Minimal Obstructions for the Coordinated Attack Problem and Beyond

Tristan Fevat 1 and Emmanuel Godard 1,2
1 Laboratoire d'Informatique Fondamentale, CNRS (UMR 6166), Aix-Marseille Université
2 Pacific Institute for the Mathematical Sciences, CNRS (UMI 3069)
(tristan.fevat emmanuel.godard)@lif.univ-mrs.fr

Abstract

We consider the well-known Coordinated Attack Problem, where two generals have to decide on a common attack while their messengers can be captured by the enemy. Informally, this problem represents the difficulty of agreeing in the presence of communication faults. We consider here only omission faults (loss of messages) but, contrary to previous studies, we do not restrict the way messages can be lost, i.e. we use no specific failure metric. Our contribution is threefold. First, we introduce the study of arbitrary patterns of failure (omission schemes), proposing notions and notations that prove very convenient to handle. In the large subclass of omission schemes where a double simultaneous omission can never happen, we characterize which ones are obstructions for the Coordinated Attack Problem. We then present some interesting applications. We show for the first time that the well-studied omission scheme where at most one message can be lost at each round is a kind of least worst-case environment for the Coordinated Attack Problem. We also extend our study to networks of arbitrary size. In particular, we solve an open question of Santoro and Widmayer about the Consensus Problem in communication networks with omission faults.

Keywords: Distributed Systems, Fault-Tolerance, Message-Passing, Synchronous Systems, Coordinated Attack Problem, Two Generals Problem, Two Armies Problem, Consensus, Communication Schemes, Omission Faults

I. INTRODUCTION

A. Motivations

The Coordinated Attack Problem (also known in the literature as the two generals or the two armies problem) is a long-standing problem in the area of distributed computing.
It is a fictitious situation where two armies have to agree whether or not to attack a common enemy that lies between them and might capture any of their messengers. Informally, it represents the difficulty of agreeing in the presence of communication faults. The design of a solution is difficult, sometimes impossible, as it may have to cope with an infinite sequence of lost mutual acknowledgments. It has important applications to the commit problem in Distributed Databases for two processes, see [Gra78]. It was one of the first impossibility results in the area of fault tolerance and distributed computing [AEH75], [Gra78].

In the vocabulary of more recent years, this problem can be stated as the Uniform Consensus Problem for 2 synchronous processes communicating by message passing in the presence of omission faults. It is then a simple instance of a problem that has been very widely studied [AT99], [CBGS00], [MR98]. See for example [Ray02] for a recent survey about Consensus on synchronous systems with some emphasis on the omission fault model. Moreover, given that the impossibility of reaching an agreement is obvious if any message can be lost, one can wonder why the problem is important enough to have a name of its own, and maybe why it has been studied in the first place. We will discuss this in the Related Works section.

The idea is that one usually has to restrict the way the messages are lost in order to keep this problem relevant. We call an arbitrary pattern of failure by loss of messages an omission scheme. It formally describes the fault environment in which the system evolves. For example, the omission scheme where any message can be lost at any round, except that all messages cannot be lost indefinitely, is a special omission scheme (any possibility of failure except one scenario) for which it is still impossible to solve the Coordinated Attack Problem, but the proof might be less trivial.
Given an omission scheme, a natural question is whether the Coordinated Attack Problem is solvable against this environment. More generally, the question that now arises is to describe exactly the omission schemes for which the Coordinated Attack Problem admits a solution, and those for which there is no solution. These latter omission schemes will be called obstructions. On the way to a thorough characterization of these obstructions, we address this problem for a particular subclass of failure patterns, namely the ones where no two messages can be lost at the same round. It has long been known that the Coordinated Attack Problem is unsolvable if one message might be lost at each round [CHLT00], [GKP03]. We also show how to apply the results to synchronous communication networks of arbitrary size and topology.

B. Related Works

The Coordinated Attack Problem is a kind of folklore problem for distributed systems. It seems to appear first in [AEH75], where it is a problem of gangsters plotting a big job. It is usually attributed to [Gra78], where Jim Gray coined the name Two Generals Paradox and put the emphasis on the infinite recursive need for acknowledgments in the impossibility proof. In textbooks it is often given as an example; however, the drastic conditions under which this impossibility result holds
are never really discussed, even though, for the sake of relevance, they are often slightly modified. In [Lyn96], a different problem of Consensus (with a weaker validity condition) is used. In [San06], the possibility of eternal loss is explicitly ruled out, as it would otherwise give a trivial impossibility proof. This shows that the way the messages may be lost is an important part of the problem definition. Hence it is interesting to characterize when the pattern of loss allows the consensus problem to be solved or not, i.e. whether or not the fault environment is an obstruction for the Coordinated Attack Problem.

To our knowledge, this is the first time this problem has been investigated for arbitrary patterns of omission failures, even in the simple case of only two processes. Most notably, it has been addressed for an arbitrary number of processes and for special (quite regular) patterns in [CHLT00], [GKP03], [Ray02].

C. Scope of Application and Contributions

Impossibility results in distributed computing are all the more interesting when they are tight, i.e. when they give an exact characterization of when a problem is solvable and when it is not. There are many results regarding the Distributed Consensus problem when the underlying network is a complete graph (namely in the context of Shared Memory Systems), see for example the works in [HS99], [SZ00], [BG93], where an exact topologically-based characterization is given. There are also more recent results when the underlying network can be any arbitrary graph. The results given in [SW07] by Santoro and Widmayer are almost tight. It is worth noting that the general theorems could not be directly used in that study, for the very reason that the failure model for communication networks cannot be expressed in a meaningful way in the fault model for systems with one-to-one communication. See Section II-C for a more detailed discussion.
On the way to finding exact and general characterizations, we propose new insights in the very simple setting of two synchronous processes. Although this is indeed a complete graph, the extension to non-complete graphs will prove to be rather interesting, as will be seen in Section V, where we show in particular that the Consensus Problem is solvable on a synchronous communication network G where at most f losses of messages per round can happen if and only if f < c(G), where c(G) is the connectivity of the graph G. We underline that the omission failures we are studying here encompass networks with crash failures, see Example II.10. It should also be clear that omission schemes can be studied in the context of problems other than the Consensus Problem. Moreover, as we do not endorse any pattern of failures as being, say, more realistic, our technique can be applied to any new pattern of failures.

Our contribution is threefold. First, we introduce the study of arbitrary patterns of failure (omission schemes), proposing notions and notations that prove very convenient to handle, with a very broad scope of applications. In particular, standard failure patterns are very simply expressed in this framework. In the large subclass of omission schemes where a double simultaneous omission can never happen, we characterize which ones are obstructions for the Coordinated Attack Problem. Finally, giving tight results in our context means describing the obstructions that are minimal, the ones for which removing only one possible scenario turns the problem from unsolvable to solvable. Recalling our previous discussion about trivial impossibility proofs, and therefore the need to address the problem for interesting fault environments, this intuitively implies that the minimal obstructions are the ones for which the impossibility proof is the most challenging, hence they are the most relevant environments.
Our result shows for the first time that the well-studied omission scheme where at most one message can be lost at each round is merely a minimal obstruction for the Coordinated Attack Problem. And thirdly, we extend our study to arbitrary graphs, deriving new results such as an answer to an open problem of [SW07].

The outline of the paper is the following. We describe Models and define our Problem in Section II. We present numerous examples of application of our terminology and notation in Section II-E. We then address the characterization of omission schemes without simultaneous faults that are obstructions for the Coordinated Attack Problem in Theorem III.8. We give the proofs for the necessary condition (impossibility result) in III-C and for the sufficient condition (explicit algorithm) in III-D. Applications are proposed in Section IV. An extension of both the notations and some results to arbitrary graphs is given in Section V.

II. MODELS AND DEFINITIONS

A. The Coordinated Attack Problem

1) A folklore problem: Two generals have gathered forces on top of two facing hills. In between, in the valley, their common enemy is entrenched. Every day each general sends a messenger to the other through the valley. However, this is risky as the enemy may capture them. Now they need to get the last piece of information: are they both ready to attack?

This two armies problem originated in [AEH75] and was then used by Gray [Gra78] when modeling the distributed database commit. It corresponds to binary consensus with two processes in the omission model. If there is no restriction on the fault environment, then any messenger may be captured. And if any messenger may be captured, then consensus is obviously impossible: the enemy can succeed in capturing all of them, and without communication, no distributed algorithm is possible.
2) Possible Environments: Before trying to address what can be, in some sense, the most relevant environments, we describe different environments in which the enemy is not so powerful as to be able to capture any messenger. This is a list of possible environments, using the same military analogy. The generals' names are White and Black.
1) no messenger is captured
2) messengers from General White may be captured
3) messengers from General Black may be captured
4) messengers from one general are at risk, and if one of them is captured, all the following ones will also be captured (the enemy got the secret Code of Operations for this general from the first captured messenger)
5) messengers from one general are at risk (the enemy could manage to infiltrate a spy into one of the armies)
6) at most one messenger may be captured each day (the enemy cannot closely watch both armies on the same day)
7) any messenger may be captured

Which ones are (trivial) obstructions, and which are not? Which are neither obstructions nor trivial? What about more complicated environments?

B. The Binary Consensus Problem

A set of synchronous processes wish to agree on a binary value. This problem was first identified and formalized by Lamport, Shostak and Pease [PSL80]. Given a set of processes, a consensus protocol must satisfy the following properties for any combination of initial values [Lyn96]:
Termination: every process decides some value.
Validity: if all processes initially propose the same value v, then every process decides v.
Agreement: if a process decides v, then every process decides v.

Consensus with such a termination and decision requirement for every process is more precisely referred to as Uniform Consensus, see for example [Ray02] for a discussion. Given a fault environment, the natural questions are: is Consensus solvable and, if it is, what is the minimal complexity?

C. The Communication Model

In the first part of the paper, the system we consider is a set Π of only 2 processes named white and black, Π = {, }. The processes evolve in synchronized rounds. In each round r, every process x executes the following steps: x sends a message to the other process, receives a message M from the other process, and then updates its state according to the received message. The messages sent are, or are not, delivered according to the environment in which the distributed computation takes place.
This environment is described by an omission scheme. Such a communication scheme models exactly how some messages sent by a process to the other process may be lost at some rounds. We will consider arbitrary omission schemes; they will be represented as arbitrary sets of infinite sequences of combinations of messages being possibly transmitted or not. Following [CBS09], we are not interested in the exact cause of a message not being sent. We only refer to the phenomenon: messages being transmitted or not.

The fact that the schemes are arbitrary means that we do not endorse any metric to count the number of failures in the system. There are metrics that count the number of lost messages, or that count the number of processes that can lose messages (both in send and receive actions). Other metrics count the same parameters but only during a round of the system. The drawback of this metric-centric approach is that, even when restricted to omission faults, some results obtained on complete networks may not be usable on arbitrary networks because in, say, a ring network, one is stating in some sense that every node is faulty. See e.g. [SW07] for a very similar discussion, where Santoro and Widmayer, trying to solve some generalization of agreement problems in general networks, could not directly use the known results for complete networks.

As said in the introduction, the Coordinated Attack Problem is nowadays stated as the Uniform Consensus Problem for 2 synchronous processes communicating by message passing in the presence of omission faults. Nonetheless, we will use Consensus and Uniform Consensus interchangeably in this paper. We emphasize that, because we do not assign omission faults to any process (see the previous discussion), there are only correct processes.

D. Omission Schemes

We introduce and present here our notation. In Section V-A this is naturally and, we shall say, interestingly extended to graphs of arbitrary size and topology.
Definition II.1. We denote by Σ the set of directed graphs with vertices in Π. Σ = {,,, } We denote by Γ the following subset of Σ Γ = {,, }. Those elements describe what can happen at a given round with the following obvious semantics:, no process looses messages, the message of process, if any, is not transmitted, the message of process, if any, is not transmitted, both messages, if any, are not transmitted 1. Definition II.2. An omission scheme over Π is a set of infinite sequences of elements of Σ. We will use the standard following notations in order to describe more easily our communication schemes [PP04]. A (infinite) sequence is seen as a (infinite) word over the alphabet Σ. The empty word is denoted by ε. Definition II.3. Given Φ Σ, Φ is the set of all finite sequences of elements of Φ, Φ ω is the set of all infinite ones and Φ = Φ Φ ω. A word in L Σ ω is called a communication scenario ( or scenario for short) of omission scheme L. Given a word w Σ, it is called a partial scenario and w is the length of this word. Intuitively, the letters at position r of the word describe whether there will be, or not, transmission of the message, if one is being sent at round r. A formal definition of an execution under a scenario will be given in Section II-F. 1 In order to increase the readability, we note instead of, the double-omission case. 3
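As an illustration of Definitions II.1 to II.3, the alphabet Σ can be sketched in Python as the four directed graphs on two vertices, each letter being the set of arcs actually delivered in a round. This is only a sketch: the names WHITE, BLACK, FULL, ONLY_WHITE, ONLY_BLACK, NONE and partial_scenarios are hypothetical and do not come from the paper.

```python
from itertools import product

# Hypothetical names for the two processes of Π.
WHITE, BLACK = "white", "black"

# A letter is modelled as the set of arcs (sender, receiver) delivered in a round.
FULL = frozenset({(WHITE, BLACK), (BLACK, WHITE)})  # no message is lost
ONLY_WHITE = frozenset({(WHITE, BLACK)})            # the message of black is lost
ONLY_BLACK = frozenset({(BLACK, WHITE)})            # the message of white is lost
NONE = frozenset()                                  # double omission

SIGMA = {FULL, ONLY_WHITE, ONLY_BLACK, NONE}
GAMMA = SIGMA - {NONE}  # letters without double omission


def partial_scenarios(alphabet, length):
    """Return all partial scenarios (finite words) of the given length."""
    letters = sorted(alphabet, key=sorted)  # fixed order, for reproducibility
    return list(product(letters, repeat=length))


if __name__ == "__main__":
    print(len(partial_scenarios(GAMMA, 2)), "partial scenarios of length 2 over Γ")
```

An omission scheme is then a set of infinite words over these letters; the sketch only enumerates finite prefixes, which is enough to experiment with small cases.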
We recall now the definition of the prefix of words and languages. A word u Σ is a prefix for w Σ (resp. w Σ ω ) if there exists v Σ (resp. v Σ ω ) such that w = uv (resp. w = uv ). Given w Σ ω and r N, w r is the prefix of size r of w. Definition II.4. Let w Γ, then P ref(w) = {u Γ u is a prefix of w}. Let L Γ, then P ref(l) = P ref(w). w L Remark. We do not restrict our study to regular sets, however all communication schemes we are aware of are regular, as could be seen in the following examples, where the rational expressions prove to be very convenient. E. Examples We show how standard fault environments are conveniently described in our framework. Example II.5. Consider a system where, at each round, up to 2 messages can be lost. The associated omission scheme is Σ ω. Example II.6. Consider a system where, at each round, only one message can be lost. The associated omission scheme is Γ ω. Example II.7. Consider a system where, at each round, only one message can be lost. The associated omission scheme is {,, lnoir} ω. Example II.8. A communication system is fair if given an infinity of sent messages, an infinity is also received. The associated omission scheme is the following: F = Σ ω \Σ ({, } ω {, } ω ). Example II.9. Consider a system where at most one of the processes can loose messages. The associated scheme is the following: S 1 = {, } ω {, } ω Example II.10. Consider a system where at most one of the processes can crash, For the phenomena point of view, this is equivalent to the fact that at some point, no message from a particular process will be transmitted. The associated scheme is the following: C 1 = { } ω { } ({ } ω { } ω ) Example II.11. Finally the seven simple cases exposed in Section II-A2 are described as follows: S 0 = { } ω (1) T = {, } ω (2) T = {, } ω (3) C 1 = { } ω { } { } ω { } { } ω (4) S 1 = {, } ω {, } ω = T T (5) R 1 = {,, } ω (6) S 2 = Σ ω (7) F. 
Execution of a Distributed Algorithm Given an omission scheme L, we define what is a successful run of a given algorithm A. An execution, or run, of an algorithm A under scenario w L is the following. At round r N, messages are sent (or not) by the processes. The fact that the corresponding receive action will be successful depends of a, the r-th letter of w. if a =, then any eventual message is correctly delivered, if a =, then the message of process is not transmitted (the receive call of, if any at this round, returns null), if a =, then the message of process is not transmitted (the receive call of, if any at this round, returns null), if a =, both eventual messages are not transmitted. Then, both processes updates their state according to A and the value received. Given u P ref(w), we denote by s x (u) the state of process x at the u -th round of the algorithm A under scenario w. This means in particular that s x (ε) represents the initial state of x. An execution is a (possibly infinite) sequence of such messages exchanges and corresponding states. Finally and classically, Definition II.12. A algorithm A solves the Coordinated Attacked Problem for the omission scheme L if for any scenario w L, there exists u P ref(w) such that the states of the two processes (s (u) and s (u)) satisfy the three conditions of Section II-B. Definition II.13. An omission scheme L is said to be solvable if there exists an algorithm that solves the Coordinated Attacked Problem for L. It is said to be an obstruction otherwise. An omission scheme L is a (inclusion) minimal obstruction if any L L is solvable. Remark. We emphasize that for an algorithm, knowing the omission scheme against which it runs is not tantamount to know whether a given message is actually received or not. III. CHARACTERIZATION OF SOLVABLE SCHEMES WITHOUT DOUBLE OMISSION In this part, we consider the schemes without double omission, that is the schemes L Γ ω. We characterize which one 4
are solvable or not. Where the consensus is achievable, we also give an effective algorithm that will be proved to be optimal in some particular cases.

A. Index of a Scenario

We will use the following integer function of scenarios that, in some sense, encodes all the properties of the omission schemes. By induction, we define the following integer index for w ∈ Γ*. First, we define δ on Γ by

δ( ) = −1, δ( ) = 0, δ( ) = 1.

Definition III.1. Let w ∈ Γ*. We define ind(ε) = 0. If |w| ≥ 1, then we have w = ua where u ∈ Γ* and a ∈ Γ. In this case, we define

ind(w) := 3 ind(u) + (−1)^{ind(u)} δ(a) + 1.

Lemma III.2. Let r ∈ N. The map ind is a bijection from Γ^r to [0, 3^r − 1].

Proof: The lemma is proved by a simple induction. If r = 0, then the property holds. Let r ≥ 1 and suppose the property is satisfied for r − 1. Given w ∈ Γ^r, we have w = ua with u ∈ Γ^{r−1} and a ∈ Γ. From ind(w) = 3 ind(u) + (−1)^{ind(u)} δ(a) + 1 and the induction hypothesis, we get immediately that 0 ≤ ind(w) ≤ 3^r − 1. Now, we need only prove injectivity. Suppose there are w, w′ ∈ Γ^r such that ind(w) = ind(w′). So there are u, u′ ∈ Γ^{r−1} and a, a′ ∈ Γ such that

3 ind(u) + (−1)^{ind(u)} δ(a) + 1 = 3 ind(u′) + (−1)^{ind(u′)} δ(a′) + 1.

Then

3 (ind(u) − ind(u′)) = (−1)^{ind(u′)} δ(a′) − (−1)^{ind(u)} δ(a).

Remarking that the right-hand side of this integer equality has an absolute value of at most 2, we finally get

ind(u) = ind(u′) and (−1)^{ind(u)} δ(a) = (−1)^{ind(u′)} δ(a′).

By the induction hypothesis, we get that u = u′ and a = a′. Hence w = w′, and ind is injective, therefore bijective from Γ^r onto [0, 3^r − 1].

Two easy calculations give

Proposition III.3. Let r ∈ N, ind( ^r ) = 0 and ind( ^r ) = 3^r − 1.

In Figure 1, the indexes for words of length at most 2 are given. We now describe precisely the words whose indexes differ by only 1.

Lemma III.4. Let r ∈ N, and v, v′ ∈ Γ^r.
Then ind(v ) = ind(v)+1 if and only if one of the following conditions holds: III.4.i ind(v) is even and either there exists u Γ r 1, and v = u, v = u, word of length 1 index 0 1 2 word of length 2 index 0 1 2 word of length 2 index 5 4 3 word of length 2 index 6 7 8 Fig. 1. Indexes for some short words either there exists u, u Γ r 1, and v = u, v = u, and ind(u ) = ind(u) + 1. III.4.ii ind(v) is odd and either there exists u Γ r 1, and v = u, v = u, either there exists u, u Γ r 1, and v = u, v = u, and ind(u ) = ind(u) + 1. Proof: The lemma is proved by a induction. If r = 0, then the property holds. Let r N. Suppose the property is satisfied for r 1. Suppose there are w, w Γ r such that ind(w ) = ind(w)+ 1. So there are u, u Γ r 1 and a, a Γ such that 3ind(u)+( 1) ind(u) δ(a)+2 = 3ind(u )+( 1) ind(u ) δ(a )+1. Then 3(ind(u) ind(u )) = ( 1) ind(u ) δ(a ) ( 1) ind(u) δ(a) 1. Remarking again that this is an integer equality, we then have either (ind(u) = ind(u ) and ( 1) ind(u ) δ(a ) = ( 1) ind(u) δ(a) + 1, either (ind(u ) = ind(u) + 1 and ( 1) ind(u ) δ(a ) = ( 1) ind(u) δ(a) 2. This yields the following cases: 1) ind(u) = ind(u ) is even, a = and a =, 2) ind(u) = ind(u ) is odd, a = and a =, 3) ind(u) = ind(u ) is even, a = and a =, 4) ind(u) = ind(u ) is odd, a = and a =, 5) ind(u ) = ind(u) + 1, ind(u) is even, a = = a, 6) ind(u ) = ind(u) + 1, ind(u) is odd, a = = a, Getting all the pieces together, we get the results of the lemma. Given an algorithm A, we have this fundamental corollary: Corollary III.5. Let v, v Γ r such that ind(v ) = ind(v) + 1. Then, III.5.i if ind(v) is even then s (v) = s (v ), III.5.ii if ind(v) is odd then s (v) = s (v ). Proof: We prove the result using Lemma III.4 and remarking that either a process receives a message from the other process being in the same state in the preceding configuration u; either it receives no message when the state of the other process actually differ. 5
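The inductive definition of the index (Definition III.1) and the bijection of Lemma III.2 can be checked mechanically on short words. The sketch below is illustrative only: the letter names "minus", "both" and "plus" are hypothetical labels for the three letters of Γ, chosen so that δ takes the values −1, 0 and 1 on them.

```python
from itertools import product

# Hypothetical letter names; DELTA maps each letter a of Γ to δ(a).
DELTA = {"minus": -1, "both": 0, "plus": 1}


def ind(word):
    """Index of a finite word: ind(ε) = 0 and
    ind(ua) = 3*ind(u) + (-1)**ind(u) * δ(a) + 1."""
    index = 0
    for letter in word:
        sign = 1 if index % 2 == 0 else -1
        index = 3 * index + sign * DELTA[letter] + 1
    return index


def indices(r):
    """Sorted list of the indexes of all words of length r."""
    return sorted(ind(w) for w in product(DELTA, repeat=r))


if __name__ == "__main__":
    for r in range(5):
        # Lemma III.2: ind maps the words of length r bijectively onto 0 .. 3^r - 1.
        assert indices(r) == list(range(3 ** r))
    print("bijection verified for r = 0..4")
```

On words of length 2 this reproduces the three rows of indexes 0 1 2, 5 4 3 and 6 7 8 listed in Figure 1, and the two constant words built from the letters with δ = −1 and δ = 1 reach the extreme indexes 0 and 3^r − 1 of Proposition III.3.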
B. A Characterization We prove that an omission scheme L Γ ω is solvable if and only if it does not contain a fair scenario or a special pair of unfair scenarios. We define the following set to help describe the special unfair pairs. We recall the definition of unfair scenarios. Definition III.6. A scenario w Σ ω is unfair if w Σ ({, } ω {, } ω ). The set of fair scenarios of Γ ω is denoted by F air(γ ω ). Definition III.7. We define special pairs SP air(γ ω ) = {(w, w ) Γ ω Γ ω w w, r N ind(w r ) ind(w r ) 1}. Theorem III.8. Let L Γ ω, then Consensus is solvable for omission scheme L if and only if one of the following holds III.8.i f F air(γ ω ), f / L, III.8.ii (u, u ) SP air(γ ω ), u, u / L, III.8.iii ω / L, III.8.iv ω / L. We present the proof of the Theorem III.8 in the following sections. C. The Necessary Condition: an Impossibility Proof We will use a standard bivalency proof technique. We suppose now on that there is an algorithm A to solve Consensus on L. We proceed by contradiction. So we suppose that all the conditions of Theorem III.8 are not true, ie that F air(γ ω ) { ω, ω } L. And that for all (w, w ) SP air(γ ω ), w / L = w L. Definition III.9. Given an initial configuration, let v P ref(l) and i {0, 1}. The partial scenario v is said to be i-valent if, for all scenario w L such that v P ref(w), A decides i at the end. If v is not 0 valent nor 1 valent, then it is said bivalent. By hypothesis, ω, ω L. Consider our algorithm A with input 0 on both processes running under scenario ω. The algorithm terminates and has to output 0 by Validity Property. Similarly, both processes output 1 under scenario ω with input 1 on both processes. From now on, we have this initial configuration I: 0 on process and 1 on process. For there is no difference under scenario ω for this initial configurations and the previous one. Hence, A will output 0 for scenario ω. Similarly, for there is no difference under scenario ω for both considered initial configurations. 
Hence, A will output 1 for this other scenario. Hence ε is bivalent for initial configuration I. In the following, valency will always be implicitly defined with respect to the initial configuration I. Definition III.10. Let v P ref(l), v is decisive if III.10.i v is bivalent, III.10.ii For any a Γ such that va P ref(l), va is not bivalent. Lemma III.11. There exists a decisive v P ref(l). Proof: We suppose this not true. Then, as ε is bivalent, it is possible to construct w Γ ω such that, P ref(w) P ref(l) and for any v P ref(w), v is bivalent. Bivalency for a given v means in particular that the algorithm A has not stopped yet. Therefore w / L. This means that w is unfair. Then this means that, w.l.o.g we have u Γ +, such that w = u ω and ind(u) is even. We denote by w = ind 1 (ind(u) 1) ω. The couple (w, w ) is a special pair. Therefore w belongs to L, so A halts at some round r 0 under scenario w. By Corollary III.5, s (w r0 ) = s (w r 0 ) This means that w r0 is not bivalent. As w r0 P ref(l) this gives a contradiction. We can now end the impossibility proof. Proof: Consider a decisive v P ref(l). By definition, this means that there exists a, b Γ, with a b, such that va, vb P ref(l) and va and vb are not bivalent and are of different valencies. Obviously, we can choose b to be. We terminate by a case by case analysis. Suppose that a =, and ind(v) is even. By Corollary III.5, this means is in the same state after va and vb. We consider the scenarios va ω and vb ω, they forms a special pair. So one belongs to L by hypothesis, therefore both processes should halt at some point under this scenario. However, the state of is always the same under the two scenarios so if it halts and decides some value, we get a contradiction with the different valencies. Other cases are treated similarly. D. A Consensus Algorithm Given a word w in Γ ω, we define the following algorithm A w (see Algorithm 1). It has messages always of the same type. 
They have two components, the first one is the initial bit, named init. The second is an integer named ind. Given a message msg, we note msg.init (resp. msg.ind) the first (resp. the second) component of the message. We prove that the index computed in the algorithm are almost always equal to the index of the scenario.more precisely, Proposition III.12. For any round r of an execution of Algorithm A w under scenario v Γ r, such that no process has already halted, ind r ind r = 1, sign(ind r ind r) = ( 1) ind(v), ind(v) = min{ind r, ind r}. Proof: We prove the result by induction over r N. For r = 0, the equalities are satisfied. Suppose the property is true for r 1. We consider a round of the algorithm. Let u Γ r 1 and a Γ, and consider an execution under environment w = ua. There are exactly three cases to consider. 6
Algorithm 1: Consensus Algorithm A w for Process x: Data: w Γ ω Input: init {0, 1} r=0; initother=null; if x = then ind=0; ind=1; while ind ind(w r ) 1 do msg = (init,ind); send(msg); msg = receive(); if msg == null then // message was lost ind = 3 ind; ind = 2 msg.ind + ind; initother = msg.init; r=r+1; if x = then if ind ind(w r ) then Output: init Output: initother if ind ind(w r ) then Output: init Output: initother Suppose a =. Then it means both messages are received and ind r = 2ind r 1+ind r 1 and ind r = 2ind r 1+ ind r 1. Hence ind r ind r = ind r 1 ind r 1. Thus, using the recurrence property, we get ind r ind r = 1. Moreover, by construction ind(v) = ind(ua) = 3ind(u) + ( 1) ind(u) (δ(a)) + 1 = 3ind(u) + 1 The two indices ind(u) and ind(v) are therefore of opposite parity. Hence, by induction property, sign(ind r ind r) = ( 1) ind(v). And remarking that min{ind r, ind r} = min{2ind r 1 + ind r 1, 2ind r 1 + ind r 1} = ind r 1 + ind r 1 + min{ind r 1, ind r 1} = 2 min{ind r 1, ind r 1} + ind r 1 ind r 1 + min{ind r 1, ind r 1} = 3 min{ind r 1, ind r 1} + 1 we get that the third equality is also verified in round r. Consider now the case a =. Then gets no message from and gets a message from. So we have that ind r = 3ind r 1 and ind r = 2ind r 1 + ind r 1. So we have that ind r ind r = ind r 1 ind r 1. The first equality is satisfied. We also have that ind(v) = 3ind(u)+( 1) ind(u) δ( )+ 1 = 3ind(u) + α, with α = 0 if ind(u) is even and α = 2 otherwise. Hence, ind(u) and ind(v) are of the same parity, and the second equality is also satisfied. Finally, we have that min{ind r, ind r} = {3ind r 1, 2ind r 1 + ind r 1}. If ind(u) is even, then ind(u) = ind r 1 and the equality holds. If ind(u) is odd, then ind(u) = ind r 1 and the equality also holds. The case a = is a symmetric case and is proved similarly. E. Correctness of the Algorithm Given an omission scheme L, we suppose that one of the following holds. 
f F air(γ ω ), f / L, (u, u ) SP air(γ ω ), u, u / L, ω / L, ω / L. In particular, L Γ ω and we denote by w, a scenario in Γ ω \L. If w is unfair, we assume it is either ω or ω, or it belongs to a special pair that is not included in L. We consider the algorithm A w with parameter w as defined above. Lemma III.13. Let v L. There exists r N such that ind(v r ) ind(w r ) 2. Proof: Given w / L, at some round r, w r v r. Therefore ind(v r ) ind(w r ) 1. From Lemma III.4 and Definition III.7, it can be seen that the only way to remain indefinitely at a difference of one is exactly that w and v form a special pair. Given the way we choose w, and that v L, this is is impossible. So at some point, the difference will be greater than 2. Now, we prove that under the omission scheme L, the algorithm is correct. Proof: First we show Termination. This is a corollary of Lemma III.13 and Proposition III.12. Denote by p the round when it is first terminated for one of the process. If the other process x has also terminated at round p then we are done. Otherwise, and according to Proposition III.12, this means that ind x r ind(w r ) = 1. Then, in the following round, x will receive no message (the other process has halted) and ind x r ind(w r ) 2 will hold. The validity property is also obvious, as the output values are ones of the initial values. Since the only case where initother = null is when has not received any messages from, ie. in the case where w = p. But as ind( p ) = 3 p 1 is the maximal possible index for scenario of length r, and ind(w p ) < ind ( p ), it cannot have to output initother in this case. Similarly for, this proves that null can never be output by any process. We now prove the agreement property. Given that ind r ind r = 1 by Proposition III.12, when the processes halt, the indexes are on the same side of ind(w r ). This means that one of process outputs init, the other outputting initother. By construction, they output the same value. 7
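The invariants of Proposition III.12, on which the correctness proof relies, can be checked by simulating only the index updates of Algorithm 1. The sketch below makes assumptions that are not stated in this form in the text: the process called white starts with ind = 0 and black with ind = 1, the letter with δ = −1 is the one where the message of black is lost, and halting is ignored (the while condition is not simulated). All names are hypothetical.

```python
from itertools import product

# Assumption: δ = -1 when black's message is lost, δ = +1 when white's is lost.
DELTA = {"lose_black": -1, "both": 0, "lose_white": 1}


def ind(word):
    """Index of Definition III.1."""
    index = 0
    for letter in word:
        sign = 1 if index % 2 == 0 else -1
        index = 3 * index + sign * DELTA[letter] + 1
    return index


def run_indices(word):
    """Apply the index update of Algorithm 1 round by round.

    A process that receives nothing sets ind = 3*ind; otherwise it sets
    ind = 2*msg.ind + ind. Assumption: white starts at 0, black at 1.
    """
    white, black = 0, 1
    for letter in word:
        msg_to_white = None if letter == "lose_black" else black
        msg_to_black = None if letter == "lose_white" else white
        white = 3 * white if msg_to_white is None else 2 * msg_to_white + white
        black = 3 * black if msg_to_black is None else 2 * msg_to_black + black
    return white, black


def check_proposition(max_len):
    """Check the three equalities of Proposition III.12 on all short words."""
    for r in range(max_len + 1):
        for v in product(DELTA, repeat=r):
            white, black = run_indices(v)
            expected_sign = 1 if ind(v) % 2 == 0 else -1
            if black - white != expected_sign or min(white, black) != ind(v):
                return False
    return True


if __name__ == "__main__":
    print(check_proposition(6))
```

Under these assumptions the two local indexes always differ by exactly one and the smaller one equals ind(v), which is what allows each process to locate the current scenario relative to ind(w r ) using local information only.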
F. Tightness of the Round Complexity

The time complexity (round complexity) of the algorithm is in some sense optimal. An Early Consensus Algorithm is a Consensus Algorithm that terminates early when there are fewer actual failures than the maximum possible [PRT06]. In some sense, this is what can be obtained here, even if the notion of a maximum number of possible failures has no particular meaning for us, since we are not using a particular failure metric but rather considering arbitrary failure patterns. Due to lack of space, we will not present a full discussion, but rather show how to obtain optimality in one particular case. It has to be noted that, given $L$, the $w \in \Gamma^\omega \setminus L$ that is chosen is not necessarily the one that gives an optimal algorithm. It might depend on the very structure of $L$.

We consider the particular case where $Pref(L) \neq \Gamma^*$. We denote by $p$ the smallest integer such that $\Gamma^p \not\subseteq Pref(L)$. By hypothesis, this integer is always well defined. We denote by $w_0$ a word in $\Gamma^p$ that is not in $Pref(L)$. We prove that any algorithm solving Consensus on $L$ needs at least $p$ rounds.

Corollary III.14. Let $\mathcal{A}$ be an algorithm solving Consensus on $\Pi$ under scenarios $L \subseteq \Gamma^\omega$. Then $\mathcal{A}$ runs in at least $p$ rounds in the worst case.

Proof: Suppose that $\mathcal{A}$ runs in $q$ rounds with $q < p$ under any scenario in $L$. By minimality of $p$, we have $Pref(L) \cap \Gamma^q = \Gamma^q$. Consequently, $\mathcal{A}$ is also a solution for Consensus on the scheme $\Gamma^\omega$. This is a contradiction with Theorem III.8.

If we take $w = w_0 a^\omega$ for a suitable letter $a \in \Gamma$, then the bound is tight, since we have the following.

Proposition III.15. Modifying the algorithm $\mathcal{A}_w$ so that the while loop is also exited when $r = p$, we get an algorithm that halts in at most $p$ rounds under omission scheme $L$.

Proof: The proof of the correctness of algorithm $\mathcal{A}_w$ shows that the only problem is when $\mathrm{ind}_x = ind(w_{|r})$. But given that $w_0$ is not a possibility, we can correctly conclude at round $p$.

IV. APPLICATIONS

A.
Back to the Coordinated Attack Problem

We consider now our question on the seven examples of Example II.11. The answer to possibility is obvious for the first and last cases. In the first three cases, consensus can be reached in one day provided that the generals know in which scenario they are. The fourth and fifth cases are a bit more difficult but within reach of our Theorem III.8. We remark that there is a fair scenario that belongs neither to $C_1$ nor to $S_1$. Therefore, those are solvable cases also. Note that the fourth and fifth cases correspond respectively to the crash-prone model [Lyn96] and to the 1-resilient model [Ada03], [GKP03]. In the last one [Gra78], [Lyn96], consensus cannot be achieved, as said before.

Now, using Proposition III.15 for the first five of the seven cases of Example II.11 yields the following.
    1) $S_0$ is solvable in 1 round,
    2) T is solvable in 1 round,
    3) T is solvable in 1 round,
    4) $C_1$ is solvable in exactly 2 rounds,
    5) $S_1$ is solvable in exactly 2 rounds.

B. Application to Almost Fair Scenarios

Corollary IV.1. Let $a \in \Gamma$ be one of the two letters in which a message is lost, and let $F = \Gamma^\omega \setminus \{a^\omega\}$. The scheme $F$ is not an obstruction.

The corresponding algorithm $\mathcal{A}_{a^\omega}$ solves the Consensus Problem by Theorem III.8. It is worth noting that it corresponds exactly to the following intuitive algorithm. One process sends its initial value to the other until it gets a message from it. Then it stops, outputting the other process's initial value. The other process, on the other hand, sends its initial value until it receives no message, and then outputs its own initial value. This shows that, even though it is a general meta-algorithm, $\mathcal{A}_{a^\omega}$ is exactly what one would propose intuitively for simple cases such as $F$.

C. About Minimal Obstructions

Theorem III.8 shows that, even in the simpler subclass where no double omission is permitted, simple inclusion-minimal schemes may not exist. Indeed, there exists a sequence of unfair scenarios $(u_i)_{i \in \mathbb{N}}$ such that for all $i, j$, $(u_i, u_j)$ is not a special pair.
Therefore the schemes $L_n = \Gamma^\omega \setminus \bigcup_{0 \leq i \leq n} \{u_i\}$ define an infinite decreasing sequence of obstructions for the Coordinated Attack Problem. As the set $SPair(\Gamma^\omega)$ defines a bipartite graph over the unfair scenarios, it is possible to prove that for any set $U$ of unfair scenarios that is a cover for this graph, we have, by Theorem III.8, that the scheme $\Gamma^\omega \setminus U$ is a minimal obstruction. As a partial conclusion, we shall say that the well known scheme $\Gamma^\omega$, even though it is not formally a minimal obstruction, is the nearest obstruction we have to a simple minimal obstruction.

V. NETWORKS OF ARBITRARY SIZE

In this section, we investigate the case of synchronous communication networks of arbitrary (connected) topology with omission failures. Such a study has been done in [SW07], along with the study of numerous other fault models. In the part about omission failures, it is proved that if at most $f$ losses of messages per round can happen, the Consensus Problem² is solvable on a network $G$ if $f$ is strictly less than $c(G)$, the connectivity of $G$, i.e. the minimal number of edges to remove to disconnect the graph $G$. If $f$ is greater than or equal to the minimum degree $deg(G)$ of $G$, the Consensus Problem is proved to be impossible to solve. So for graphs with $c(G) < deg(G)$, the question was still open when $c(G) \leq f < deg(G)$. Here we prove the following theorem:

² In [SW07], k-Majority Problems are actually investigated. We focus here on the results about the so-called Unanimity Problem, which is equivalent to the Consensus Problem.
Theorem V.1. The Consensus Problem is solvable on a synchronous communication network $G$ where at most $f$ losses of messages per round can happen if and only if $f < c(G)$.

A. Definitions and Notations

Before presenting the proof, we show how to extend our notations from Section II-D. First, given an undirected graph $G = (V, E)$, we denote by $\vec{G}$ its directed version, i.e. the directed graph $(V, \vec{E})$ where $\vec{E} = \{(u, v), (v, u) \mid \{u, v\} \in E\}$. A round with omission faults is then expressed as a subgraph of $\vec{G}$, with the obvious semantics of Section II. Hence, we define the alphabet $\Sigma_G$ by

$$\Sigma_G = \{(V, E') \mid E' \subseteq \vec{E}\}.$$

The notation of Section II-D is naturally extended to these (bigger) alphabets, and all possible scenarios on $G$ are described by $\Sigma_G^\omega$. With these notations, we can now describe the communication scheme involved in the work of Santoro and Widmayer. We denote by $O_f$ the following set:

$$O_f = \{(V, E') \mid E' \subseteq \vec{E},\ Card(\vec{E} \setminus E') \leq f\}.$$

Solving the Consensus Problem with $f$ omissions per round is exactly solving the Consensus Problem for the scheme $O_f^\omega$.

B. Proof

Now we present the proof of Theorem V.1.

Proof: The sufficient part being already proved in [SW07] (using a broadcast-based algorithm), we only have to prove the impossibility part. Let $G = (V, E)$ be an (undirected) connected graph, and let $f = c(G)$. By definition of the connectivity, there exists a 3-partition $A, B, C$ of $E$ such that
    1) $Card(C) = f$,
    2) the graphs $(V_A, A)$ and $(V_B, B)$ are connected, where $V_X = \{v \in V \mid \exists e \in X, v \in e\}$,
    3) the graph $(V, A \cup B)$ is disconnected.

We assume that $C = \{\{a_1, b_1\}, \ldots, \{a_f, b_f\}\}$, with $a_i$ in component $A$ and $b_i$ in component $B$ for all $i$. We define the alphabet $\Gamma_C = \{C^{\leftrightarrow}, C^{\rightarrow}, C^{\leftarrow}\}$, where the decoration indicates which messages across $C$ are delivered:

$$\begin{aligned}
C^{\leftrightarrow} &= \vec{G}, \\
C^{\rightarrow} &= \bigl(V,\ \vec{A} \cup \vec{B} \cup \{(a_1, b_1), \ldots, (a_f, b_f)\}\bigr), \\
C^{\leftarrow} &= \bigl(V,\ \vec{A} \cup \vec{B} \cup \{(b_1, a_1), \ldots, (b_f, a_f)\}\bigr).
\end{aligned}$$

The set $\Gamma_C$ corresponds to the set of rounds where either no message is lost, or all messages from $A$ to $B$ are lost, or all messages from $B$ to $A$ are lost. This is an alphabet of size 3.

We will now prove that the omission scheme $\Gamma_C^\omega$ is not solvable.
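To make the construction of the three-letter alphabet above concrete, here is a minimal sketch on a small example graph. The graph (two triangles joined by a bridge), the function names, and the representation of a round as a set of directed edges are assumptions chosen for illustration. The sketch builds the three rounds and counts how many messages each one loses compared with the full directed graph.

```python
from collections import deque


def directed(edges):
    """Directed version of a set of undirected edges: both orientations."""
    return {(u, v) for u, v in edges} | {(v, u) for u, v in edges}


def is_connected(vertices, edges):
    """Breadth-first search over an undirected edge set."""
    vertices = set(vertices)
    if not vertices:
        return True
    adj = {x: set() for x in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(vertices))
    seen, queue = {start}, deque([start])
    while queue:
        x = queue.popleft()
        for y in adj[x] - seen:
            seen.add(y)
            queue.append(y)
    return seen == vertices


def gamma_c(A, B, C):
    """The three rounds of the alphabet; C is a list of pairs (a_i, b_i)."""
    inside = directed(A) | directed(B)
    full = inside | directed(C)                # no message is lost
    a_to_b = inside | {(a, b) for a, b in C}   # messages b_i -> a_i are lost
    b_to_a = inside | {(b, a) for a, b in C}   # messages a_i -> b_i are lost
    return full, a_to_b, b_to_a


def lost_messages(all_edges, round_edges):
    """Number of directed edges of the full graph missing from a round."""
    return len(directed(all_edges) - round_edges)


# Illustrative graph (an assumption): two triangles joined by one bridge,
# so the connectivity is 1 while the minimum degree is 2.
A = [(0, 1), (1, 2), (0, 2)]
B = [(3, 4), (4, 5), (3, 5)]
C = [(2, 3)]
E = A + B + C
f = len(C)
```

On this graph, `[lost_messages(E, r) for r in gamma_c(A, B, C)]` gives the number of omissions in each of the three rounds, and none of them exceeds `f`. This graph falls in the range where the connectivity is at most `f` and `f` is below the minimum degree, which is the range left open in [SW07].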
Since $\Gamma_C^\omega \subseteq O_f^\omega$, this will end the proof. We prove the impossibility by reduction to the two-process case with omission scheme $\Gamma^\omega$.

Suppose now that we have an algorithm $\mathcal{A}$ that solves the Consensus Problem for the scheme $\Gamma_C^\omega$. We describe how to derive an algorithm $\mathcal{A}'$ that solves Consensus for $\Gamma^\omega$. The idea is that $\circ$ will emulate $\mathcal{A}$ on sub-graph $A$, and $\bullet$ will emulate it on sub-graph $B$. See Algorithms 2 and 3.

Algorithm 2: Algorithm $\mathcal{A}'$ for process $\circ$
    Input: init in {0, 1}
    H = A;
    Label each node of H by init;
    while $\mathcal{A}$ is not finished on H do
        Emulate $\mathcal{A}$ on H except for nodes a_1, ..., a_f;
        Send message M = {(i, m_i), i = 1..f}, where m_i is the message from a_i to b_i in this round for $\mathcal{A}$, or an empty value if there is no such message;
        Receive message N;
        Apply $\mathcal{A}$ to a_1, ..., a_f by extracting messages n_i from N;
    Output: value decided by $\mathcal{A}$ on H.

Algorithm 3: Algorithm $\mathcal{A}'$ for process $\bullet$
    Input: init in {0, 1}
    H = B;
    Label each node of H by init;
    while $\mathcal{A}$ is not finished on H do
        Emulate $\mathcal{A}$ on H except for nodes b_1, ..., b_f;
        Send message M = {(i, m_i), i = 1..f}, where m_i is the message from b_i to a_i in this round for $\mathcal{A}$, or an empty value if there is no such message;
        Receive message N;
        Apply $\mathcal{A}$ to b_1, ..., b_f by extracting messages n_i from N;
    Output: value decided by $\mathcal{A}$ on H.

We now define a bijection $\rho$ from $\Gamma_C^\omega$ to $\Gamma^\omega$. We first define $\rho$ on the letters of the alphabet: the round where no message is lost is mapped to the letter of $\Gamma$ without omission, the round where all messages from $B$ to $A$ are lost is mapped to the letter where the message from $\bullet$ to $\circ$ is lost, and the round where all messages from $A$ to $B$ are lost is mapped to the letter where the message from $\circ$ to $\bullet$ is lost. The function $\rho$ is then extended to infinite words of $\Gamma_C^\omega$. This is a bijection by construction.

It is then straightforward to check that any run of $\mathcal{A}'$ with scenario $w$ from $\Gamma^\omega$ yields a valid run of $\mathcal{A}$ with scenario $\rho^{-1}(w)$ from $\Gamma_C^\omega$. It is then straightforward to derive the three properties of Section II-B for $\mathcal{A}'$ from the fact that they are satisfied by $\mathcal{A}$. Thus $\mathcal{A}'$ solves the Consensus Problem and $\Gamma^\omega$ is solvable.
This is a contradiction with Theorem III.8.

C. About the Minimality of Scheme $\Gamma_C^\omega$

The fact that the sub-graphs induced by $A$ and $B$ are actually connected can be used to prove that $\Gamma_C^\omega$ is also close to a minimal obstruction. By extension of Definition III.7, we define the C-special pairs of $\Gamma_C^\omega$ to be the pairs of elements $w, w' \in \Gamma_C^\omega$ such that $(\rho(w), \rho(w'))$ is a special pair of $\Gamma^\omega$.

Proposition V.2. A scheme $L \subseteq \Gamma_C^\omega$ is an obstruction if and only if one of the following holds:
    V.2.i   $L = \Gamma_C^\omega$,
    V.2.ii  for every $w \in \Gamma_C^\omega \setminus L$, there exists $w'$ such that $(w, w')$ is a C-special pair and $w' \in L$.

Proof: The sufficient condition is obtained by using Theorem III.8 and the simulation technique of the previous proof. We suppose now that $L$ is such that $\rho(L)$ is solvable. To prove the necessary condition, we will use the fact that the components $A$ and $B$ are connected to derive an algorithm $\mathcal{A}_L$. See Algorithm 4.

Algorithm 4: A Consensus Algorithm $\mathcal{A}_L$ for node v
    if v is a_1 (resp. b_1) then
        Execute the algorithm $\mathcal{A}_{\rho(w)}$ with role ◦ (resp. •), and neighbor b_1 as • (resp. a_1 as ◦);
        Broadcast value returned by $\mathcal{A}_{\rho(w)}$;
    else
        Broadcast received value.
    Output: broadcasted value

Before the broadcast part, $\mathcal{A}_L$ is simply $\mathcal{A}_{\rho(w)}$ executed by $a_1$ and $b_1$. This part will terminate, as the communication restricted to the link $(a_1, b_1)$ is exactly $\rho(u)$ for any word $u$ of $L$, and $\rho(u) \in \Gamma^\omega$. The final broadcast succeeds in sending the final value to any node of component $A$ (resp. $B$) from $a_1$ (resp. $b_1$), because the component is connected and there is no omission in it for any scenario of the scheme $\Gamma_C^\omega$, hence for any scenario in $L$. The validity and agreement properties for $\mathcal{A}_L$ are derived easily from these properties for $\mathcal{A}_{\rho(w)}$.

VI. CONCLUSION AND FUTURE WORKS

We have presented in this paper a formalism that helps handle the question: for the Coordinated Attack Problem (that is, the Uniform Consensus Problem for two processes), which patterns of omission failures are obstructions?
We also focused our study on $\Gamma^\omega$, the omission scheme where at most one message can be lost at any given round. We showed that, up to some special cases, any sub-scheme of $\Gamma^\omega$ is solvable. We also gave a partial extension to communication networks of arbitrary size. Even if the work on two processes was restricted to omission schemes without double omission, we were able to derive from this study an exact bound for the solvability of the Consensus Problem in arbitrary communication networks, tightening in the process the bounds given in [SW07].

These partial characterizations have to be extended in two directions. First, a general characterization has to be provided for the case where both messages can be lost at the same time. Moreover, this work should be fully extended to any given number of processes, and not only to the case where the possible omissions are exactly the same at each round.

As the index function has a natural topological interpretation, one way to do this would probably be to better understand the present work in relation to the now standard topological techniques that can be used in distributed computing [BG93], [HS99], [SZ00], [Ada03], [GKP03]. It is quite well understood, in the standard failure cases, how the impossibility proofs by way of bivalency are related to connected components of the global configuration space. We believe that the approach presented here can help to understand these topological relationships even in the presence of arbitrary patterns of failures.

REFERENCES

[Ada03] Giovanni Adagio. Using the topological characterization of synchronous models. Electr. Notes Theor. Comput. Sci., 81, 2003.
[AEH75] E. A. Akkoyunlu, K. Ekanadham, and R. V. Huber. Some constraints and tradeoffs in the design of network communications. In Proceedings of the fifth ACM symposium on Operating systems principles, pages 67-74, Austin, Texas, United States, 1975. ACM.
[AT99] Marcos Kawazoe Aguilera and Sam Toueg.
A simple bivalency proof that t-resilient consensus requires t + 1 rounds. Inf. Process. Lett., 71(3-4):155-158, 1999.
[BG93] Elizabeth Borowsky and Eli Gafni. Generalized FLP impossibility result for t-resilient asynchronous computations. In STOC '93: Proceedings of the twenty-fifth annual ACM symposium on Theory of computing, pages 91-100, New York, NY, USA, 1993. ACM Press.
[CBGS00] Bernadette Charron-Bost, Rachid Guerraoui, and André Schiper. Synchronous system and perfect failure detector: Solvability and efficiency issue. In DSN, pages 523-532. IEEE Computer Society, 2000.
[CBS09] Bernadette Charron-Bost and André Schiper. The heard-of model: computing in distributed systems with benign faults. Distributed Computing, 22(1):49-71, 2009.
[CHLT00] Soma Chaudhuri, Maurice Herlihy, Nancy A. Lynch, and Mark R. Tuttle. Tight bounds for k-set agreement. J. ACM, 47(5):912-943, 2000.
[GKP03] Rachid Guerraoui, Petr Kouznetsov, and Bastian Pochon. A note on set agreement with omission failures. Electr. Notes Theor. Comput. Sci., 81, 2003.
[Gra78] Jim Gray. Notes on data base operating systems. In Operating Systems, An Advanced Course, pages 393-481, London, UK, 1978. Springer-Verlag.
[HS99] Maurice Herlihy and Nir Shavit. The topological structure of asynchronous computability. J. ACM, 46(6):858-923, 1999.
[Lyn96] Nancy A. Lynch. Distributed Algorithms. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1996.
[MR98] Yoram Moses and Sergio Rajsbaum. The unified structure of consensus: A layered analysis approach. In PODC, pages 123-132, 1998.
[PP04] J.E. Pin and D. Perrin. Infinite Words, volume 141 of Pure and Applied Mathematics. Elsevier, 2004.
[PRT06] Philippe Parvédy, Michel Raynal, and Corentin Travers. Strongly terminating early-stopping k-set agreement in synchronous systems with general omission failures. In Proc. of Sirocco 06, 2006.
[PSL80] M. Pease, R. Shostak, and L. Lamport. Reaching agreement in the presence of faults. Journal of the ACM, 27(2):228-234, 1980.
[Ray02] Michel Raynal. Consensus in synchronous systems: a concise guided tour. Pacific Rim International Symposium on Dependable Computing, IEEE, 0:221, 2002.
[San06] N. Santoro. Design and Analysis of Distributed Algorithms. Wiley, 2006.
[SW07] Nicola Santoro and Peter Widmayer. Agreement in synchronous networks with ubiquitous faults. Theor. Comput. Sci., 384(2-3):232-249, 2007.
[SZ00] M. Saks and F. Zaharoglou. Wait-free k-set agreement is impossible: The topology of public knowledge. SIAM J. on Computing, 29:1449-1483, 2000.