A new look at atomic broadcast in the asynchronous. crash-recovery model



Similar documents
Fault tolerance in cloud technologies presented as a service

Recurrence. 1 Definitions and main statements

An Alternative Way to Measure Private Equity Performance

Luby s Alg. for Maximal Independent Sets using Pairwise Independence

What is Candidate Sampling

Proactive Secret Sharing Or: How to Cope With Perpetual Leakage

A Replication-Based and Fault Tolerant Allocation Algorithm for Cloud Computing

Forecasting the Direction and Strength of Stock Market Movement

A Secure Password-Authenticated Key Agreement Using Smart Cards

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).

On the Optimal Control of a Cascade of Hydro-Electric Power Stations

Minimal Coding Network With Combinatorial Structure For Instantaneous Recovery From Edge Failures

Borland & Tertiary Network - A Case Study in Success

Calculating the high frequency transmission line parameters of power cables

Mathematical Framework for A Novel Database Replication Algorithm

Conferencing protocols and Petri net analysis

Support Vector Machines

A Performance Analysis of View Maintenance Techniques for Data Warehouses

VRT012 User s guide V0.1. Address: Žirmūnų g. 27, Vilnius LT-09105, Phone: (370-5) , Fax: (370-5) , info@teltonika.

Project Networks With Mixed-Time Constraints

J. Parallel Distrib. Comput.

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm

The OC Curve of Attribute Acceptance Plans

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis

1 Example 1: Axis-aligned rectangles

Analysis of Energy-Conserving Access Protocols for Wireless Identification Networks

An Interest-Oriented Network Evolution Mechanism for Online Communities

Data Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network *

"Research Note" APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES *

RequIn, a tool for fast web traffic inference

Hollinger Canadian Publishing Holdings Co. ( HCPH ) proceeding under the Companies Creditors Arrangement Act ( CCAA )

Extending Probabilistic Dynamic Epistemic Logic

Power-of-Two Policies for Single- Warehouse Multi-Retailer Inventory Systems with Order Frequency Discounts

Traffic State Estimation in the Traffic Management Center of Berlin

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

IDENTIFICATION AND CORRECTION OF A COMMON ERROR IN GENERAL ANNUITY CALCULATIONS

APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT

How To Understand The Results Of The German Meris Cloud And Water Vapour Product

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol

IWFMS: An Internal Workflow Management System/Optimizer for Hadoop

The University of Texas at Austin. Austin, Texas December Abstract. programs in which operations of dierent processes mayoverlap.

A Design Method of High-availability and Low-optical-loss Optical Aggregation Network Architecture

IMPACT ANALYSIS OF A CELLULAR PHONE

Efficient Striping Techniques for Variable Bit Rate Continuous Media File Servers æ

8 Algorithm for Binary Searching in Trees

Lecture 3: Force of Interest, Real Interest Rate, Annuity

Brigid Mullany, Ph.D University of North Carolina, Charlotte

Joint Scheduling of Processing and Shuffle Phases in MapReduce Systems

EVALUATING THE PERCEIVED QUALITY OF INFRASTRUCTURE-LESS VOIP. Kun-chan Lan and Tsung-hsun Wu

Examensarbete. Rotating Workforce Scheduling. Caroline Granfeldt

Hybrid Log-based Fault Tolerant scheme for Mobile Computing System Yongning Zhai1, a, Zhenpeng Xu1, b, Weini Zeng1, c

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S

denote the location of a node, and suppose node X . This transmission causes a successful reception by node X for any other node

Multiple-Period Attribution: Residuals and Compounding

2008/8. An integrated model for warehouse and inventory planning. Géraldine Strack and Yves Pochet

Research of concurrency control protocol based on the main memory database

) of the Cell class is created containing information about events associated with the cell. Events are added to the Cell instance

Chapter 4 ECONOMIC DISPATCH AND UNIT COMMITMENT

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy

Enabling P2P One-view Multi-party Video Conferencing

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence

A Probabilistic Theory of Coherence

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP)

Politecnico di Torino. Porto Institutional Repository

Energy Conserving Routing in Wireless Ad-hoc Networks

Testing Database Programs using Relational Symbolic Execution

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS

POLYSA: A Polynomial Algorithm for Non-binary Constraint Satisfaction Problems with and

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College

Time Value of Money Module

A Lyapunov Optimization Approach to Repeated Stochastic Games

A Hierarchical Reliability Model of Service-Based Software System

Stochastic Protocol Modeling for Anomaly Based Network Intrusion Detection

Application of Multi-Agents for Fault Detection and Reconfiguration of Power Distribution Systems

SUPPLIER FINANCING AND STOCK MANAGEMENT. A JOINT VIEW.

Auditing Cloud Service Level Agreement on VM CPU Speed

All Roads Lead to Rome: Optimistic Recovery for Distributed Iterative Data Processing

Checkng and Testng in Nokia RMS Process

Answer: A). There is a flatter IS curve in the high MPC economy. Original LM LM after increase in M. IS curve for low MPC economy

Using Series to Analyze Financial Situations: Present Value

Reliable State Monitoring in Cloud Datacenters

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services

A Dynamic Energy-Efficiency Mechanism for Data Center Networks

Implementation of Deutsch's Algorithm Using Mathcad

A Self-Organized, Fault-Tolerant and Scalable Replication Scheme for Cloud Storage

= (2) T a,2 a,2. T a,3 a,3. T a,1 a,1

DEFINING %COMPLETE IN MICROSOFT PROJECT

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12

A Cost-Effective Strategy for Intermediate Data Storage in Scientific Cloud Workflow Systems

A Cluster Based Replication Architecture for Load Balancing in Peer-to-Peer Content Distribution

Dynamic Pricing for Smart Grid with Reinforcement Learning

Availability-Based Path Selection and Network Vulnerability Assessment

SPEE Recommended Evaluation Practice #6 Definition of Decline Curve Parameters Background:

An agent architecture for network support of distributed simulation systems

A role based access in a hierarchical sensor network architecture to provide multilevel security

Towards Specialization of the Contract-Aware Software Development Process

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION

Cloud Auto-Scaling with Deadline and Budget Constraints

Open Access A Load Balancing Strategy with Bandwidth Constraint in Cloud Computing. Jing Deng 1,*, Ping Guo 2, Qi Li 3, Haizhu Chen 1

Transcription:

A new look at atomc broadcast n the asynchronous crash-recovery model Sergo Mena André Schper École Polytechnque Fédérale de Lausanne (EPFL) Dstrbuted Systems Laboratory CH-1015 Lausanne, Swtzerland Tel.: +41-21-693-5354, Fax.: +41-21-693-6770 {sergo.mena andre.schper}@epfl.ch Techncal Report IC/2004/101 Abstract Atomc broadcast n partcular, and group communcaton n general, have manly been specfed and mplemented n a system model where processes do not recover after a crash. The model s called crash-stop. The drawback of ths model s ts nablty to express algorthms that tolerate the crash of a majorty of processes. Ths has led to extend the crash-stop model to the so-called crash-recovery model, n whch processes have access to stable storage, to log ther state perodcally. Ths allows them to recover a prevous state after a crash. However, the exstng specfcatons of atomc broadcast n the crash-recovery model are not satsfactory, and the paper explans why. The paper also proposes a new specfcaton of atomc broadcast n the crash-recovery model that addresses these ssues. Specfcally, our new specfcaton allows to dstngush between a unform and a non-unform verson of atomc broadcast. The non-unform verson logs less nformaton, and s thus more effcent. The unform and non-unform atomc broadcast have been mplemented and compared wth a publshed atomc broadcast algorthm. Performance results are presented. Keywords: Dstrbuted systems, atomc broadcast, crash-recovery model, group communcaton, fault tolerance

1 Introducton Atomc broadcast (also called total order broadcast) s an mportant abstracton n fault tolerant dstrbuted computng. Atomc broadcast ensures that messages broadcast by dfferent processes are delvered by all destnaton processes n the same order [8]. Many atomc broadcast algorthms have been publshed n the last twenty years [6]. Almost all of these algorthms have been developed n a model where processes do not have access to stable storage, a model that has been called crash-stop (or crash no-recovery). In such a model, a process that crashes loses all ts state; upon recovery t cannot be dstngushed from a newly startng process. The crash-stop model s attractve from an effcency pont of vew: snce loggng to stable storage s a costly operaton, atomc broadcast algorthms that do not log any nformaton are sgnfcantly more effcent than atomc broadcast algorthms that access stable storage. However, atomc broadcast algorthms n the crash-stop model also have drawbacks: they tolerate only the crash of a mnorty of processes. Moreover, there are contexts where access to stable storage s natural, e.g., database systems. It has been shown that replcated database systems can beneft from atomc broadcast [16, 11, 2], but atomc broadcast n the crash-stop model does not sut ths context [17]. For ths reason, there s a strong motvaton to consder atomc broadcast n a model where processes have access to stable storage, a model that has been called crash-recovery. In ths model, processes have access to stable storage to save part of ther state: a process that recovers after a crash can retreve ts latest saved state, and restart computaton from there on. Because of the strong lnk between consensus and atomc broadcast (f one problem s solvable, the other s also solvable), the more basc of these two problems, namely consensus, needs to be addressed frst. Among the papers that address crash-recovery consensus [13, 10, 1], we hghlght the work by Agulera et al [1]. They defne a new falure detector for the crash-recovery model and propose two algorthms for solvng consensus n that model. Based on ths result, Rodrgues and Raynal address the problem of atomc broadcast n the crash-recovery model [14]. Whle ths paper advances the state-of-the-art, t has a few weaknesses. From our pont of vew, the man problem n [14] s the specfcaton of atomc broadcast. Classcally, atomc broadcast s specfed n terms of the two prmtves abcast (to broadcast a message) and adelver (to delver a message). No adelver prmtve appears n [14], where adelver s a predcate. The value true / false of the predcate depends on a sequence of messages called adelversequence. The applcaton has to poll ths sequence for newly adelvered messages. Ths shows a problem n the specfcaton. A more mportant mplcaton s that the specfcaton n [14] does not 1

reduce to the classcal specfcaton of atomc broadcast n the crash-stop model [8] when crashed processes do not recover. We pont out another lmtaton of the work n [14]. The work only addresses unform atomc broadcast. Non-unform atomc broadcast n the crash-recovery model s an alternatve that may be very nterestng from a practcal pont of vew. Non-unform atomc broadcast can be seen as an ntermedate soluton, between (1) an atomc broadcast algorthm n the crash-stop model that does not access stable storage at all, and (2) a unform atomc broadcast algorthm n the crash-recovery model that s expensve due to frequent accesses to stable storage. In contrast to [14], we propose both a unform and a non-unform verson of atomc broadcast. The non-unform atomc broadcast algorthm does not requre frequent access to the stable storage. Interestngly, our two specfcatons reduce to the classcal specfcaton of atomc broadcast n the crash-stop model [8] when crashed processes do not recover. We also explan why atomc broadcast n the crash-recovery model s trcker than n the crash-stop model. Atomc broadcast s most of the tme used wthn an applcaton that has a state. The atomc broadcast algorthm also has a state. Upon recovery both states must be consstent. However, wth no recovery ths s not a problem! Ths becomes a problem n the crash-recovery model. We show how the consstency ssue s addressed both n our specfcaton and n our mplementaton. Fnally, we have run experments that show the gan n performance of the non-unform verson of our atomc broadcast algorthm wth respect to the unform verson. The rest of the paper s organzed as follows. Secton 2 s devoted to the specfcaton of unform and non-unform atomc broadcast n the crash-recovery model. Secton 3 dscusses the problem of keepng the applcaton state consstent wth the state of the atomc broadcast algorthm. Secton 4 presents the two algorthms that satsfy our unform and non-unform specfcaton of atomc broadcast. Secton 5 s devoted to performance evaluaton. Fnally, Secton 6 concludes the paper. 2 Specfcaton of Atomc Broadcast n the crash-recovery model 2.1 The crash-recovery system model We consder a system wth a fnte set of processes Π = {p 1,p 2,...,p n }. The system s asynchronous, whch means that there s no assumpton on message transmsson delays or relatve speed of processes. The system s statc, whch means that the set Π of processes never changes after system start-up tme. Durng system lfetme, processes can take nternal steps or communcate by means of 2

message exchange. Crash and recovery: Processes can crash and may subsequently recover. We consder system startup tme as an mplct recover event. In any process hstory, a recover event happens always mmedately after a crash event, except for system start-up tme. Moreover, the only event that can happen mmedately after a crash event (f any) s a recover event. Up and down: A process q s up wthn the segments of ts hstory between a recover event and the followng crash event. If no crash event occurs after the last recover event n q s hstory, then q s up forever from ts last recover event on. In ths case, we say q s eventually always up. A process q s down wthn the segments of ts hstory after a crash event untl the next recover event (f such an event exsts). Good and bad processes: A process s good f t s eventually always up. A process s bad f t s not good. In other words, a process s bad f t (a) eventually crashes and never recovers or (b) crashes and recovers nfntely often. 2.2 Defntons As usual, we defne atomc broadcast wth an abcast and an adelver prmtve. We say that process q abcasts message m f q executes abcast(m); we say that process q adelvers message m f q executes adelver(m). To these two prmtves, we add a thrd commt prmtve. Roughly speakng, the commt prmtve executed by q marks the pont at whch q s executon wll resume after a crash. When commt s executed by q, all messages prevously adelvered at q wll never be adelvered agan at q, evenfq crashes and recovers. The commt prmtve addresses the fundamental process state problem n the crash-recovery model. The state of each process q s splt nto two parts: (1) the applcaton state, and (2) the atomc broadcast protocol state. The dstncton between these two states can be gnored when processes do not recover after a crash, but not here. We wll come back to ths ssue later. Wth the commt prmtve, we ntroduce the followng termnology: 1. Process q ab-commts message m f (1) q abcasts m, (2) q executes the prmtve commt() later on, and (3) q does not crash n-between. 2. Process q del-commts message m f (1) q adelvers m, (2) q executes the prmtve commt() later on, and (3) q does not crash n-between. 3

We also ntroduce the noton of permanent and volatle event. In our model, events are crash, recover, and the executon of the prmtves presented above. If process q crashes, ts volatle events are those that may be lost; q s permanent events are those that are not lost even f q crashes. So f q never crashes, ts whole hstory s permanent. More formally, the set V q of volatle events and the set P q of permanent events partton q s hstory, denoted by h q. An event e h q belongs to V q f (1) a crash event e c occurs after e n h q, and (2) no commt event occurs n h q between e and e c. An event e h q belongs to P q f t does not belong to V q. We ntroduce some addtonal defntons that wll be used n the specfcaton of atomc broadcast: Non-recovery runs: Let R Π be the set of all possble runs allowed n the crash-recovery model for process set Π. We defne non-recovery runs to be the set N Π R Π of runs that do not contan any commt event or any recover event other than system start-up tme. N Π s the set of all possble runs n the well-known crash-stop model for process set Π. Permanent broadcast: We say that process q permanently abcasts (or smply q p-abcasts) message m f (a) q ab-commts m, or (b) q abcasts m and does not crash later. In other words, q permanently abcasts message m f event abcast(m) belongs to set P q of q s permanent events. Permanent delvery: Lkewse, we say that process q permanently adelvers (or smply q p- adelvers) message m f (a) q del-commts m, or (b) q adelvers m and does not crash later. In other words, q permanently adelvers message m f event adelver(m) belongs to set P q of q s permanent events. Delvery order: We say that process q adelvers message m before m f (a) q adelvers m and later m and does not crash n-between, or (b) q adelvers m after havng del-commted m. Notaton: m q m. As a result, f q crashes between the adelvery of m and m, these two messages may not be ordered. Permanent delvery order: We say that process q p-adelvers message m before m f (1) m q m holds, and (2) q p-adelvers m. Notaton: m q m. 1 Multple delvery: We say that process q adelvers message m more than once f we have m q m. As a result, f q adelvers m twce but crashes n-between, then m s not necessarly consdered as adelvered more than once. 1 If m qm holds, t s easy to see that q also p-adelvers m. 4

2.3 Specfcaton of atomc broadcast We can now formally defne atomc broadcast. As n the crash-stop model, we dstngush between unform and non-unform atomc broadcast, a dstncton that could not be made n the specfcaton proposed by Rodrgues and Raynal n [14]. A frst attempt to ntroduce ths dstncton was made n [4] n the context of Relable Broadcast, 2 but the lack of a prmtve lke commt does not lead to a convncng specfcaton. In our specfcaton, unform atomc broadcast constrans the behavor of good and bad processes, whle non-unform atomc broadcast does not mpose any constrants on (1) bad processes, and (2) volatle events of good processes. In other words, non-unform atomc broadcast gnores volatle events, and consders only permanent events,.e., the events that good processes remember once they stop crashng. An mportant feature of our specfcaton of unform and non-unform atomc broadcast s that, n non-recovery runs (see Sect. 2.2), our new specfcaton reduces exactly to the classcal defnton of unform and non-unform atomc broadcast [8]. We defne (non-unform) atomc broadcast by the propertes Valdty (1), Unform Integrty 3 (2), Agreement (4), and Total Order (6) defned below. We defne unform atomc broadcast by the propertes: Valdty (1), Unform Integrty (2), Unform Agreement (3), and Unform Total Order (5). 1. Valdty: If a good process q p-abcasts m then q p-adelvers m. There s no unform Valdty property, snce t does not make sense to requre from a bad process, whch can crash and never recover, to delver m. So the Valdty property s the same for unform and non-unform atomc broadcast. 2. Unform Integrty: For every message m, every process q adelvers m only f some process has abroadcast m. Moreover, m q m never holds for any process q. Ths property allows a process to adelver the same message twce (under certan condtons), unlke Unform Integrty n the crash-stop model (see the defnton of multple delvery, Sect. 2.2). For nstance, f process q adelvers message m and then crashes before del-commttng m, Unform Integrty allows q to adelver m agan after recovery. 3. Unform Agreement: If a process (good or bad) adelvers message m, then every good process p-adelvers m. 2 Relable Broadcast s weaker than Atomc Broadcast: t does not enforce any order n message delvery. 3 We do not defne a Non-unform Integrty property, whch does not make much sense from a practcal pont of vew. 5

Ths property requres that all good processes permanently adelver any message that s adelvered by some process. The requred permanent delvery of m ensures that a good process q remembers havng adelvered m at the tme q stops crashng. 4. Agreement: If a good process p-adelvers message m, then every good process p-adelvers m. Non-unform Agreement only puts a constrant on messages p-adelvered by good processes. There s no constrant on a message adelvered (but not p-adelvered) by a process that later crashes. Also, there s no constrant on a message p-adelvered by a bad process. In the two cases, no process remembers m. 5. Unform Total Order: Let p and q be two processes (good or bad). If m p m holds and q adelvers m, then m q m also holds. 6. Total Order: Let p and q be two good processes. If m p m holds and q adelvers m, then m q m also holds. The ntroducton of the commt prmtve s fundamental n our specfcaton. It allows us to dstngush between volatle and permanent events. Wth ths dstncton t s farly easy to defne unformty and non-unformty n the crash-recovery model (and t would be hard to ntroduce the dstncton wthout the commt prmtve). In addton, wth the commt prmtve, t s not the mplementor of atomc broadcast who decdes when to make events permanent. Ths s left to the applcaton, whch knows better when volatle events are no more nterestng (e.g., because an applcaton checkpont was taken) and should thus become permanent. Moreover, t s easy to see that, f crashed processes never recover, then our specfcaton of unform and non-unform atomc broadcast corresponds exactly to the standard specfcaton of unform and non-unform atomc broadcast n the crash-stop model. The non-unform specfcaton can be crtcsed wth the argument that a bad process p (e.g., a process that crashes and recovers nfntely often) can behave arbtrarly, even f t executes commt a number of tmes. However, p cannot know whether t s good or bad because t may recover n the future and stay up forever. Ths s smlar to the crash-stop model, where a process that crashes n the future s faulty and can thus behave arbtrarly. The practcal relevance of non-unformty s dscussed n Secton 4.3. 6

2.4 Related work Atomc broadcast has been specfed n the crash-recovery model by Rodrgues and Raynal [14]. They defne the prmtve abcast(m) (abroadcast of m) and the sequence µ p = adelver-sequence(). Moreover, adelver(m) s a predcate that s true ff m adelver-sequence() at p. Atomc broadcast s then specfed by the followng propertes: Valdty: If a process adelvers a message m, then some process has abroadcast m. Integrty: Let µ p be the delvery sequence at process p. Any message appears at most once n µ p. Termnaton: For any message m, (1) f the process that ssues abcast(m) returns from abroadcast(m) and s a good process, or (2) f a process adelvers message m, then all good processes adelver m Total order: Let µ p = adelver-sequence() at process p. For any par of processes (p, q), ether µ p s a prefx of µ q or vceversa. Ths specfcaton has several problems. The man one s the absence of an adelver prmtve: how s the adelver-sequence defned? Ths s the trcky ssue that s not addressed. In the group communcaton lterature, specfcatons usually defne the adelver prmtve frst, and then the adelversequence as the sequence of messages adelvered. It s the opposte that s done: adelver s defned based on the adelver-sequence, and the adelver-sequence s not defned. Therefore the followng statement n the paper s a tautology: process p adelvers m f adelvered(m, adelver-sequence()) s true at p Moreover, because of the absence of an adelver prmtve, the specfcaton does not reduce to the standard specfcaton of atomc broadcast n the crash-stop model. As a result, all propertes derved from the crash-stop model have to be renvented. The authors of [14]also menton the followng problem wth the Valdty property. If the call to abcast(m) returns at a good process p, ths forces all good processes to eventually adelver m. They argue that ths s problematc to ensure f the process crashes shortly after havng called abcast(m). In contrast, our specfcaton uses the commt prmtve, whch avods the problem. Our Valdty property only forces good processes to eventually adelver message m f p permanently abcasts m (e.g., p abcasts m and then executes commt). Fnally, Rodrgues and Raynal also propose an optmzed mplementaton of atomc broadcast. In ths mplementaton, atomc broadcast checkponts the state of the applcaton from tme to tme. We clam that, usually, these checkponts should be ntated by the applcaton (whch knows best when 7

to checkpont ts own state), but ths can not be done n the mplementaton gven n [14]. The reason s that the specfcaton lacks a prmtve (lke commt) that the applcaton could use to ntate such a checkpont. 3 Keepng the process state consstent Atomc broadcast s commonly used to update the state of replcated servers. Consder a replca p. The state of p needs to be dstngushed from the state of the atomc broadcast stack local to p.we ntroduce the followng notaton: s appl denotes the applcaton state of p and s abcast of the atomc broadcast stack local to p. We assume here that s appl OS process denoted by p. The dstncton between the s abcast and s abcast state and the s appl denotes the state are part of the same state of p can be completely gnored n the crash-stop model. Ths s no more the case n the crash-recovery model, where p must recover n a state where s abcast and s appl are consstent. We now address ths problem. 3.1 Usage of commt We extend the notaton just ntroduced to denote by p appl the atomc broadcast code of p. In order to recover the state s appl s appl from tme to tme. After a crash, p appl the applcaton code of p, and by p abcast after a crash, p appl recovers n the most recently saved s appl checkponts state. From the pont of vew of p appl, the message delvery sequence should resume exactly where t was at the moment of the checkpont: the delvery must (1) not nclude any message logcally ncluded n s appl, 4 (2) but must not mss any message adelvered later n the logcal adelvery sequence. For example, consder the logcal adelvery sequence m 1,m 2,m 3.Ifp appl has checkponted ts state after the adelvery of m 1 and crashed after the handlng of m 2, then the delvery after recovery should restart wth m 2. The commt prmtve naturally fts ths requrement: process p appl checkponts s appl and then mmedately executes commt. Condton (1) above s guaranteed by the (Unform) Integrty property (whch ensures that no del-commted message wll be adelvered agan); condton (2) s ensured by the Agreement property. Ths soluton works as long as the checkpont and the commt operatons are executed atomcally, that s, a process can never crash between t1 and t2 n Fgure 1. Moreover, all events (depcted as crcles n Fg. 1) are assumed to be atomc so far. We now explan how these two assumptons can be relaxed, whle keepng s appl and s abcast consstent upon recovery. 4 A message m s logcally ncluded n the checkponted state f t led to the update of s appl. 8

checkpont commt() abcast(m1) abcast(m2) adelver(m1) t1 t2 checkpont commt() abcast(m3) adelver(m2) adelver(m2) abcast(m4) adelver(m4) ab-commted del-commted Fgure 1: Example executon. After p appl checkponts ts state, t calls commt. 3.2 Addressng the atomcty problem For a process p, we have ntroduced the dstncton between p appl and p abcast. The nteracton between p appl and p abcast s naturally expressed by means of functon calls (e.g., abcast functon, commt functon). Functon calls are synchronous: the caller blocks whle the call s beng executed. Ths yelds a useful property: the caller s sure that the callee has completely processed the call when t returns. Consder now that p crashes durng the functon call (e.g., durng abcast or commt). When p recovers, t does not know whether the functon was successfully executed or not. To address ths problem, we model the communcaton between p appl messages. When p appl and p abcast n terms of nvokes the prmtve F (PARAMETERS) on the atomc broadcast nterface, we say t sends the (local) message <F,PARAMETERS> to p abcast (see Fg. 2). Lkewse, when p abcast nvokes F (PARAMETERS ) on the applcaton nterface, we say t sends the (local) message <F, PARAMETERS > to p appl [17]. APPLICATION p appl m = <F, PARAMETERS> m' = <F', PARAMETERS'> ATOMIC BROADCAST p abcast Fgure 2: Functon calls and callbacks can be modeled as messages. When modellng ntra-process communcaton usng the message-passng model, a sngle process p becomes a dstrbuted system wth two processes p appl and p abcast. If we represent Fgure 1 usng ths message-passng model, t becomes Fgure 3. The atomcty problem now becomes the problem 9

to recover p appl and p abcast n a consstent global state. applcaton checkpont A commt() abcast(m1) abcast(m2) checkpont B commt() abcast(m3) abcast(m4) atomc broadcast checkpont A' adelver(m1) t1 t2 checkpont B' t3 adelver(m2) adelver(m2) adelver(m4) Fgure 3: Expressng Fgure 1 usng message passng communcaton. Ths modellng allows us now to apply results from the checkpontng lterature [7]. A message m becomes orphan when ts sender s rolled-back to a state before the sendng of m (m s unsent), but the state of ts recever stll reflects the recepton of m. In ths case, the recever s sad to be an orphan process. Orphan processes cannot be tolerated: the orphan process needs to be rolled-back (even f t dd not crash). A message m s n-transt when ts recever s rolled-back to a state before the recepton of m (m s unreceved), whle the sender s n a state n whch m was sent. In-transt messages are tolerated under the condton that the rollback-recovery protocol s bult on top of lossy channels [7]. In our model communcaton s relable, so we cannot recover n a state wth n-transt messages. Therefore, n-transt messages cannot be tolerated, ether. We now dscuss how to recover p appl and p abcast n a global state wth no orphan and no n-transt messages. We explan the soluton on Fgure 3. Consder the second checkpont-commt par. We only have to dstngush three cases: (1) crash at t1,.e., before the checkponts B and B, (2) crash at t2,.e., after checkpont B but before checkpont B, and (3) crash at t3, after the checkponts B and B. Cases (1) and (3) are analog: no n-transt and orphan messages n the global state (A, A ) and n the global state (B,B ). Thus, we only dscuss case (3), whch s depcted n Fgure 3. Snce there s no n-transt message n the global state (B,B ), p appl s rolled back to the checkpont B. s rolled back to the checkpont B and p abcast In case (2), the global state (B,A ) contans at least one n-transt message (the commt), and so p cannot be rolled-back to ths state. So p appl rolled-back to checkpont A. s forced to rollback to checkpont A and p abcast s So the only problem s to know whether case (2) or (3) occurs. Ths can easly be done by 10

countng the number of s appl checkponts and the number of s abcast checkponts. If the two numbers are equal we are n case (3). Otherwse, we are n case (2). Algorthm 1 shows the correspondng pseudo-code. At the atomc broadcast level,.e., p abcast, the varable nb commts counts the number of commts executed so far. Its value s logged wth the data that the commt procedure logs, so t really reflects the number of commts executed despte crashes. At the applcaton level,.e., p appl, the array st represents the sequence of checkponts of s appl. 5 The varable nb checks keeps track of the number of checkponts done so far. It s mportant that no message s adelvered durng the checkpontng phase (lnes 4 through 8). Upon recovery, st s retreved and the value nb checks s computed (lne 10). Then, p appl queres p abcast to fnd out whether (a) t can resume executon from ts very last checkpont, or (b) t has to roll back to the prevous checkpont. Note that actually p appl only keeps the two most recent checkponts. Algorthm 1 Keepng local consstency between atomc broadcast and the applcaton. 1: At applcaton level 2: Intalsaton: 3: nb checks 0; Æ : st[] 14: At atomc broadcast level 15: Intalsaton: 16:...;nb commts 0;... 4: to checkpont(state) 5: nb checks nb checks +1 6: st[nb checks] state 7: st[nb checks 2] 8: log(st); abcast.commt() 9: upon recovery do 10: retreve(st); nb checks max{ : st[] } 11: f abcast.get nb commts() <nbchecks then 12: st[nb checks] ; log(st) 13: nb checks nb checks 1 17: procedure get nb commts() 18: return(nb commts) 19: upon commt() do 20:...;nb commts nb commts +1 21: log nb commts together wth other data;... 22: upon recovery do 23:...;retreve(nb commts);... 4 Solvng unform and non-unform atomc broadcast There are several alternatves to solvng atomc broadcast n the crash-recovery model, n the same way as there are varous algorthms that have been proposed to solve t n the crash-stop model [6]. In ths secton, we have chosen to llustrate how to mplement the new atomc broadcast specfcatons of Secton 2 by reducton to a sequence of consensus. Ths technque s well accepted n the crashstop model, whch justfes our choce. We frst present an algorthm that mplements unform atomc broadcast n the crash-recovery model. Then, we dscuss how to convert ths algorthm nto a more effcent one that satsfes the weaker non-unform atomc broadcast specfcaton. Both algorthms 5 Representng ths as an nfnte sequence smplfes the presentaton. 11

solve the problem by reducton to consensus. 4.1 Buldng blocks The algorthms we present below rely on the followng buldng blocks. Loggng. Durng normal executon, processes use non-persstent memory to keep ther state. They access stable storage from tme to tme to save data from non-persstent memory. When a process crashes and later recovers, only the data saved to stable storage s avalable for retrevng. A process uses functon log(x) to log the content of varable X to stable storage, and the functon retreve(x) to retreve (upon recovery) the prevously logged value of X. These two functons are very costly and should be used as sparsely as possble. Far-lossy channels. Processes communcate usng channels. Because of the crash-recovery model, we cannot assume relable channels. Indeed, consder processes p and q: fp sends a message m to q whle q s down, the channel cannot delver m to q. So we assume far-lossy channels and the two communcaton prmtves: send(message) to destnaton and receve(message) from source. They ensure the followng property: f p sends an nfnte number of messages to q and q s good, then q receves an nfnte number of messages from p. Far-lossy channels can be mplemented wthout access to stable storage. Consensus. The algorthms below solve atomc broadcast by reducton to consensus,.e., we need a buldng block that solves consensus. In consensus, each process proposes a value, and (1) all good processes decde a value, (2) ths value s the same for all processes that decde, 6 and (3) t s the ntal value proposed by some process. Secton 4.4 dscusses how to solve consensus. 4.2 Unform atomc broadcast Overvew. Algorthm 2 mplements the unform varant of our atomc broadcast specfcaton. The algorthm reduces atomc broadcast to a sequence of consensus as n [5] for the crash-stop model. It s also nfluenced by the algorthms n [14] (whch are actually derved from [5]). The algorthm has two tasks: the sequencer task and the gossp task. The sequencer task executes a sequence of consensus to decde on the delvery order of messages, whle loggng every value proposed to stable storage. The 6 Actually, ths defnes unform consensus. In ths paper, consensus always stands for unform consensus. Note that the specfcaton of non-unform consensus n the crash-recovery model [1] s not well-adapted for ths work. 12

gossp task s responsble for dssemnatng new messages among all processes. Ths s necessary to ensure eventual message recepton wth far-lossy channels. When commt s executed, the algorthm also logs the part of ts state that s necessary n the case of a crash followed by a recovery. Upon recovery, the algorthm replays (see lnes 11 to 16) all messages adelvered beyond the most recent commt executed before the crash. Ths s needed n order to satsfy the specfcaton of unform atomc broadcast. Innovatons. Snce the basc dea of the atomc broadcast algorthm s nspred by [5] and [14], we found t napproprate to explan the detals. More nterestng s to focus on the dfferences. Thus, we now explan the man dfferences between Algorthm 2 and the algorthm presented n [14] (and we also pont out a bug n ths algorthm, see below). These dfferences are marked wth grey background (e.g., lne 6, lne 10, etc.). The rectangles surroundng part of the code should be gnored for the moment (e.g., lnes 11 to 16): they are dscussed n Secton 4.3. Algorthm 2 Solvng unform atomc broadcast. The non-unform algorthm s obtaned by removng the code nsde the whte boxes. 1: For every process p 2: Intalsaton: 3: Æ : P roposed[] 4: Unord ; A delv 5: k 0; gossp k 0 6: nb commts 0 7: procedure process decson(decson) 8: result decson \ A delv 9: A delv A delv :: result 10: adelver(result) ; k k +1 22: upon A-broadcast(m) do 23: Unord Unord {m} 24: upon commt() do 25: nb commts nb commts +1 26: log(k, A delv, Unord, nb commts) 27: upon receve(k, Unord ) from q do 28: Unord Unord Unord \ A delv 29: gossp k max(gossp k, k ) 11: procedure replay() 12: whle P roposed[k] do 13: Unord Unord P roposed[k] 14: propose(k, P roposed[k]) 15: wat untl decde(k, decson) 16: process decson(decson) 17: upon ntalzaton or recovery do 18: retreve(k, A delv,unord, nb commts) 19: fork task(gossp) 20: retreve(p roposed); replay() 21: fork task(sequencer) 30: task Gossp 31: repeat forever 32: send(k, Unord) to all 33: task Sequencer 34: repeat forever 35: wat untl Unord or gossp k>k 36: P roposed[k] Unord 37: log(p roposed[k]) 38: propose(k, P roposed[k]) 39: wat untl decde(k, decson) 40: process decson(decson) 41: Unord Unord \ A delv The only new varable s nb commts (lne 6), whch counts the number of commts locally performed snce system start-up tme (see Secton 3). Ths varable s accessed n lnes 18, 25, and 26. Lnes 24 through 26 are executed upon commt. Commt saves to stable storage all data necessary 13

to restore ts state upon recovery. These data are (1) the number of the current nstance of consensus (or the next one, f there s no consensus runnng at the local process), (2) the varable A delv contanng messages already adelvered, (3) the set Unord of messages receved but not yet adelvered, 7 and (4) the varable nb commts defned above. The rest of the state s ether not needed: varable gossp k, or logged elsewhere: the array P roposed (values proposed to consensus). Note that, unlke [14], our algorthm does nclude the prmtve adelver (see Secton 2). Adelver occurs every tme a message s added to set A delv,.e., n lne10. Upon recovery, procedure Replay proposes agan the ntal values that were proposed before the crash. It does so n lne 14. Ths lne s necessary because we assume the consensus specfcaton n [1], and thus, for each consensus k that decded before the crash, we need to propose the same value upon recovery to have the guarantee that consensus k decdes agan after the recovery. Lne 14 would not be necessary f consensus was also specfed wth the commt prmtve. Fnally, lne 19 dffers from [14]: t s ncorrect n the optmzed algorthm n [14]. 8 The correctness argument of Algorthm 2 s smlar to [14]. 4.3 Non-unform atomc broadcast Informally, the dfference between the unform and non-unform atomc broadcast algorthms s that the non-unform algorthm only needs to wrte to stable storage upon executon of commt. The unform algorthm presented n the prevous secton needs to log every value proposed to consensus (Algorthm 2, lne 37). The reason s that volatle events have to be replayed upon recovery, exactly as they occurred before the crash. The unform algorthm thus accesses the stable storage every tme an nstance of consensus s started. In contrast, the non-unform algorthm can forget volatle events at any process, whle stll fulfllng ts specfcaton. Thus, f a process crashes and recovers, t only needs to remember ts state at the tme of the last commt. Applcatons that can afford losng uncommtted parts of the executon can typcally beneft from non-unform atomc broadcast. Note that the total order and agreement propertes do hold at good processes even f processes forget volatle events when crashng, so the applcaton state does not become nconsstent at those good processes. The non-unform algorthm s easly derved from Algorthm 2 by removng the code n the whte boxes (e.g., lnes 11 to 16, lne 20, etc.). Access to stable storage s extremely expensve and should be used as sparsely as possble. Thus, 7 Ths dffers from [14], where ths nformaton s logged every tme a new message s abcast, whch s less effcent. 8 If lne 19 s placed after callng replay, the latter may block forever. 14

f the applcaton does not execute commt frequently, the performance of the non-unform algorthm s hghly mproved compared to the unform algorthm. Furthermore, f the underlyng consensus algorthm does not access the stable storage too frequently, the performance of non-unform atomc broadcast can even be close to a crash-stop atomc broadcast algorthm. We dscuss ths ssue n the next secton. 4.4 Whch consensus algorthm should be used? Both atomc broadcast algorthms presented n the prevous secton requre an algorthm solvng consensus n the crash-recovery model. Agulera et al propose two such algorthms [1]: one of them accesses stable storage, whereas the other does not. Consensus wth access to stable storage. The consensus algorthm wth access to stable storage s well suted for unform atomc broadcast. It solves consensus as long as a majorty of processes are good, but accesses the stable storage very often: (1) every tme the state changes locally, and (2) when the process decdes. The stable storage s thus accessed at least twce per consensus. Ths does not mpact performance of unform atomc broadcast as much as one could thnk, snce the unform atomc broadcast tself logs ts proposed value at the begnnng of every consensus. However, usng ths algorthm wth our non-unform atomc broadcast s overkll: t rentroduces frequent access to stable storage that we managed to suppress wth our non-unform algorthm,.e., performance of the non-unform atomc broadcast becomes poor: the performance of non-unform atomc broadcast algorthm s almost the same as the performance of the unform atomc broadcast algorthm. Consensus wthout access to stable storage. Agulera et al show that consensus can also be solved n the crash-recovery model wthout accessng stable storage. Ths consensus algorthm suts our nonunform atomc broadcast n the sense that t does not reduce performance, as t does not access stable storage. Wth ths soluton, we acheve our goal of avodng access to stable storage as long as commt s not executed. However, the algorthm requres the number of always-up processes to be larger than the number of bad processes [1]. Ths s not a bg constrant from a practcal pont of vew. Indeed, by havng commt log part of the state of the consensus algorthm, the always-up processes are only requred to stay up between two consecutve commts. 15

5 Performance evaluaton We have mplemented dfferent atomc broadcast algorthms to compare ther performance for varous group szes: n =3and n =7. 9 The algorthms mplemented are: (a) the optmzed unform atomc broadcast algorthm proposed by Raynal and Rodrgues [14], (b) the unform atomc broadcast algorthm of Secton 4, (c) the non-unform atomc broadcast algorthm of Secton 4, and (d) a well-known unform atomc broadcast algorthm n the crash-stop model [5]. All these algorthms reduce atomc broadcast to a sequence of consensus: algorthms (a) and (b) use a crash-recovery consensus algorthm that accesses stable storage (see Secton 4.4), algorthm (c) uses a crash-recovery consensus that does not access stable storage (see Secton 4.4), algorthm (d) uses a crash-stop consensus algorthm [5]. All algorthms were mplemented n Java and follow the conventons of our Fortka framework [12]. These conventons allow protocol composton wth dfferent composton frameworks. We used the Cactus [3, 9] framework for these experments. Algorthms (a), (b) and (c) use the same lbrares for stable storage and for far-lossy channels. Algorthm (d) uses TCP-based relable channels. The hardware used for the measurements was (1) a 100 Base-TX Ethernet, wth no thrd-party traffc, (2) seven PCs runnng Red Hat Lnux 7.2 (kernel verson 2.4.18-19). The PCs have a Pentum III 766Mhz processor, 128 MB of RAM, and a 40 GB (Maxtor 6L040J2) hard dsk drve. The Java Vrtual Machne was Sun s JDK 1.4.0. In our experments, the frst process n the group 10 steadly abcasts 128-byte-long messages. 11 The offered load was constant at 100 messages per second (.e., the benchmark tres to abcast 100 messages per second, but protocol flow control wll block t from tme to tme). The actual throughput was less than that, snce the sendng thread s blocked when there are too many messages n the local set Unord (Algorthm 2, lne 23). Besdes, all processes execute commt every t seconds, where t ranged from 100 mllseconds to 5 seconds. In each experment, we measured the average early latency of messages after the executon became statonary. The early latency for message m s the tme elapsed between the abcast of m and frst adelvery of m [15]. We also measured the average throughput, defned as the number of messages adelvered per second. Note that the experments where performed wth no crashes and no false crash suspcons. The man goal of these experments was to see how the performance s affected as the frequency of commts ncreases. 12 9 We dd the same tests for n =5and the results are n-between. 10 The process wth the smallest d. 11 We have also done the same experments wth multple senders; yeldng smlar results. 12 In [14], a checkpont s taken nstead of a commt, whch s the same n terms of the mplementaton. 16

The early latency results are shown n Fgure 4, wth the 95% confdence nterval. As expected, Rodrgues-Raynal and our unform algorthm perform smlarly snce both of them use the stable storage for every consensus. An mportant observaton s that the non-unform algorthm performs much better than the two unform algorthms when commts are not frequent, snce t only accesses the stable storage when executng commt. The performance of the non-unform algorthm can even compete wth the crash-stop atomc broadcast algorthm (whch does not access stable storage at all). As the commt perod reduces, the performance of all crash-recovery algorthms, ncludng the nonunform algorthm, degrades asymptotcally, snce access to stable storage becomes more and more frequent. load = 100 msgs/s; message sze = 128 bytes; n=3 load = 100 msgs/s; message sze = 128 bytes; n=7 average early latency (msec) 600 500 400 300 200 100 Crash stop RodrguesRaynal Unform NonUnform average early latency (msec) 600 500 400 300 200 100 Crasho stop RodrguesRaynal Unform NonUnform 0 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 tme between two commts (sec) 0 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 tme between two commts (sec) (a) group sze: 3 processes (b) group sze: 7 processes Fgure 4: Early latency of varous atomc broadcast algorthms average early latency (msec) 350 300 250 200 150 100 50 load = 100 msgs/s; message sze = 128 bytes; n=3 Crash stop RodrguesRaynal Unform NonUnform average early latency (msec) 350 300 250 200 150 100 50 load = 100 msgs/s; message sze = 128 bytes; n=7 Crash stop RodrguesRaynal Unform NonUnform 0 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 tme between two commts (sec) 0 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 tme between two commts (sec) (a) group sze: 3 processes (b) group sze: 7 processes Fgure 5: 1 / throughput of varous atomc broadcast algorthms The throughput results are shown n Fgure 5, also wth the 95% confdence nterval. Actually, 17

n order to compare the curves of Fgures 4 and 5, we have plotted the values of 1 / throughput n Fgure 5. We can observe that the results n the two fgures are very smlar. latency (msec) commt perod = 50 sec; load = 100 msgs/sec; sze = 128 bytes; n = 5 500 450 400 350 300 250 p1 p2 p3 p4 p5 200 0 20 40 60 80 100 tme at whch abcast s executed (sec) Fgure 6: Latency n a sngle experment wth the non-unform atomc broadcast algorthm. Fgures 4 and 5 do not allow us to drectly spot the mpact of commt on the latency. Ths can be seen n Fgure 6. The fgure corresponds to one sngle experment wth the non-unform atomc broadcast for a group of sze n =5. The fgure shows the latency, once the steady state s reached, as a functon of the tme at whch the abcast s ssued. The fgure shows all latences, not only the early latency: f the abcast(m) s ssued at tme t and m s adelvered at p 1 at tme t + 1,atp 2 at tme t + 2 at p 2, etc., we plot fve dots wth coordnates (t, 1 ), (t, 2 ),...,(t, 5 ). In Fgure 6, commt was executed approxmately at t =50and t = 100. We can clearly observe how latency s affected by the executon of commt. The hgh latences around t =50and t = 100 come from messages that were already abcast but not yet adelvered when the commt operaton started: the latency of these messages was affected by the commt operaton. 6 Concluson We have proposed two novel specfcatons of atomc broadcast n the crash-recovery model, for unform and non-unform atomc broadcast. The key pont n these two specfcatons s the dstncton between permanent and volatle events. Ths dstncton allows us to properly defne the concept of non-unformty n the crash-recovery model. Despte some attempts n the lterature [1, 4], the concept of non-unformty n the crash-recovery model dd not have so far a satsfactory defnton. We have also ponted out the problem of process recovery after a crash, where the applcaton state needs to 18

be consstent wth the state of the atomc broadcast algorthm. We have shown how ths problem can be solved. It s mportant to understand that ths consstency problem does not arse n the crash-stop model, whch explans that t was overlooked up to now. Fnally, we have run experments to compare the performance of the two new atomc broadcast algorthms wth two publshed algorthms, one based on the crash-stop model, the other based on the crash-recovery model. The specfcatons and algorthms gven here are for statc groups,.e., for groups wthout membershp change. In the future we plan to extend ths work to dynamc groups, where processes can be added to and removed from the group durng the computaton. Dynamc groups have been consdered n the crash-stop model, but not n the crash-recovery model. References [1] Marcos Kawazoe Agulera, We Chen, and Sam Toueg. Falure detecton and consensus n the crash-recovery model. Dstrbuted Computng, 13(2):99 125, 2000. [2] Y. Amr, C. Danlov, M. Mskn-Amr, J. Stanton, and C. Tutu. On the performance of wde area synchronous database replcaton. Techncal report, CNDS-2003-3. John Hopkns Unv., December 2003. [3] N. T. Bhatt, M. A. Hltunen, and D. Schlchtng. Coyote: A system for constructng fnegran confgurable communcaton servces. ACM Trans. on Computer Systems, 16(4):321 366, November 1998. [4] R. Bochat and R. Guerraou. Relable broadcast n the crash-recovery model. In Proc. of 19th IEEE Symposum on Relable Dstrbuted Systems (SRDS 00), Nuremberg, Germany, oct 2000. [5] T. D. Chandra and S. Toueg. Unrelable falure detectors for relable dstrbuted systems. Journal of ACM, 43(2):225 267, March 1996. [6] X. Défago, A. Schper, and P. Urbán. Total order broadcast and multcast algorthms: Taxonomy and survey. ACM Computng Surveys, 36(4):372 421, 2005. [7] E. N. (Mootaz) Elnozahy, Lorenzo Alvs, Y-Mn Wang, and Davd B. Johnson. A survey of rollback-recovery protocols n message-passng systems. ACM Comput. Surv., 34(3):375 408, 2002. 19

[8] V. Hadzlacos and S. Toueg. A modular approach to fault-tolerant broadcasts and related problems. TR 94-1425, Dept. of Computer Scence, Cornell Unversty, Ithaca, NY, USA, May 1994. [9] M.A. Hltunen and R.D. Schlchtng. A confgurable membershp servce. IEEE Transactons on Computers, 47(5):573 586, 1998. [10] M. Hurfn, A. Mostéfaou, and M. Raynal. Consensus n asynchronous systems where processes can crash and recover. In Proceedngs of the 17th Symposum on Relable Dstrbuted Systems (SRDS), pages 280 286, West Lafayette, IN, USA, October 1998. [11] B. Kemme. Database Replcaton for Clusters of Workstatons. PhD thess, Swss Federal Insttute of Technology Zürch, Swtzerland, August 2000. No. 13864. [12] S. Mena, X. Cuveller, C. Grégore, and A. Schper. Appa vs. Cactus, comparng protocol composton frameworks. In Proc. of 22th IEEE Symposum on Relable Dstrbuted Systems (SRDS 03), Florence, Italy, October 2003. [13] R. Olvera, R. Guerraou, and A. Schper. Consensus n the crash-recover model. Techncal Report 97/239, École Polytechnque Fédérale de Lausanne, Swtzerland, August 1997. [14] L. Rodrgues and M. Raynal. Atomc broadcast n asynchronous crash-recovery dstrbuted systems and ts use n quorum-based replcaton. IEEE Transactons on Knowledge and Data Engneerng, 15(4), 2003. [15] Péter Urbán. Evaluatng the Performance of Dstrbuted Agreement Algorthms: Tools, Methodology and Case Studes. PhD thess, École Polytechnque Fédérale de Lausanne, Swtzerland, August 2003. Number 2824. [16] M. Wesmann. Group Communcatons and Database Replcaton: Technques, Issues and Performance. PhD thess, École Polytechnque Fédérale de Lausanne, Swtzerland, May 2002. Number 2577. [17] M. Wesmann and A. Schper. Beyond 1-safety and 2-safety for replcated databases: Groupsafety. In Proceedngs of the 9th Internatonal Conference on Extendng Database Technology (EDBT2004), Heraklon - Crete - Greece, March 2004. 20