N-Party BAR Transfer

Xavier Vilaça

Dissertation submitted to obtain the Master's Degree in Information Systems and Computer Engineering

Jury
Chairperson: Prof. Pedro Manuel M. Antunes de Sousa
Supervisor: Prof. Luís Rodrigues
Member: Prof. Alysson Neves Bessani

October 2011
Acknowledgements

First of all, I would like to thank my advisor, Prof. Luís Rodrigues, for the unique opportunity to work with him, and for continuously providing the right motivation and advice. Without them, I would not have been able to overcome the main obstacles of this work. I would also like to thank João Leitão for his fruitful suggestions and for the very enlightening discussions we had. I would like to express my gratitude to Oksana Denysyuk and Miguel Correia for their help. Their theoretical insights were essential to the success of this work.

These acknowledgements should not be concluded without recognizing the importance of my colleagues Nuno Machado, Pedro Louro, João Fernandes and Pedro Ruivo, and other members of the GSD group at INESC-ID, for their useful comments and most precious support.

Finally, I would like to thank my parents and sister for their emotional support.

This work was partially supported by the FCT (INESC-ID multi annual funding through the PIDDAC Program fund grant and by the project PTDC/EIA-EIA/102212/2008).

Lisboa, October 2011
Xavier Vilaça
For my parents and sister, Francisco, Maria Margarida and Marta
Resumo (Summary)

Peer-to-peer networks have gained prominence for executing scientific projects using resources made available by volunteers. This type of architecture raises several challenges, since not all nodes of the network follow the expected behaviour. In fact, some nodes behave arbitrarily (Byzantine) and others behave selfishly (Rational), as specified in the Byzantine-Altruistic-Rational (BAR) model. Moreover, if computations are carried out in a model such as MapReduce, there must be some mechanism that allows tasks to communicate directly with each other.

This thesis introduces the N-party BAR Transfer (NBART) problem, which consists of reliably transferring data from a set of producers to a set of consumers in the BAR model. It is a relevant primitive for volunteer computing, since it enables reliable communication between tasks and spares volunteers from storing the results of computations for long periods of time: they may transfer the data to other volunteers after storing it for a minimum period of time.

Two algorithms that solve the NBART problem are proposed. The first executes in a constant number of rounds but incurs higher communication costs. The second reduces communication costs at the expense of a longer execution time. Both algorithms are proved correct provided that a majority of volunteers are non-Byzantine and no volunteer is Rational. Using Game Theory, both algorithms are shown to provide Nash equilibria.
Abstract

Peer-to-peer networks have emerged as a relevant architecture for executing scientific computations using resources provided by volunteers. However, this architecture poses several challenges, since volunteers may fail arbitrarily (Byzantine behaviour) or even behave selfishly (Rational behaviour), as captured by the Byzantine-Altruistic-Rational (BAR) model. Also, if such distributed computations are to be performed in a computational model such as MapReduce, there must be some mechanism for reliably transferring data among tasks.

This thesis introduces the N-party BAR Transfer (NBART) problem, which is the problem of reliably transferring data from a set of producers to a set of consumers in the BAR model. This is an important building block for volunteer computing, since it allows tasks to communicate reliably with each other, and volunteers are not obliged to store the data for long periods of time. Instead, each volunteer may transfer the data to another volunteer after storing it for a minimum amount of time.

This thesis also proposes two alternative algorithms for solving the NBART problem. One executes in a constant number of rounds and has greater communication complexity, while the other has lower communication complexity but a greater execution time. It is proved that, if no volunteer follows a Rational behaviour, then both algorithms ensure a reliable transfer of data when a majority of producers and a majority of consumers are non-Byzantine. In addition, a theoretical analysis based on Game Theory proves that both algorithms provide a Nash equilibrium.
Keywords

BAR Model
Data Storage
Data Transfer
Byzantine Fault Tolerance
Rational Behaviour
Game Theory
Nash Equilibrium
Contents

1 Introduction
    1.1 Motivation
    1.2 Contributions
    1.3 Results
    1.4 Research History
    1.5 Structure of the Document

2 Related Work
    2.1 Introduction
    2.2 System Models
        Synchrony
        Types of Faults
        Causes of Faults
        Natural Causes
        Malicious Behaviour
        Rational Behaviour
        Strategies for Dealing with Faults
        Fault Models
        Crash-stop Fault Model
        Byzantine Fault Model
        BAR Fault Model
    2.3 Data Transfer without Rational Behaviour
        Reliable Broadcast
        (Regular) Reliable Broadcast
        Probabilistic Reliable Broadcast
        Consensus and Related Problems
        Consensus
        Interactive Consistency
        Byzantine Generals
        Total Order Broadcast
        Terminating Reliable Broadcast
        Theoretical Results
        Consensus According to Paxos
        Byzantine Registers
        Byzantine Quorums
        State Machine Replication
        Hybrid Approaches
    2.4 Previous Work that Considers Rational Behaviour
        Incentive-Based Protocols
        Direct Reciprocity
        Reputation
        Non-Repudiation and Fair-Exchange
        Two-Party Non-Repudiation and Fair-Exchange
        Multi-Party Non-Repudiation and Fair-Exchange
    2.5 Previous Work that Considers Byzantine and Rational Behaviour
        Previous Work in the BAR Model
        Other Works
    2.6 Existing Solutions
        Agreement
        Terminating Reliable Broadcast
        Free-Riding
        Bittorrent
        Samsara
        BAR Model
        BAR for Cooperative Services
        BAR Primer
        BAR Gossip
        Flightpath
        FireSpam
    Summary

3 NBART Problem
    Informal Definition
    Challenges
    Byzantine-Altruistic Solution
    Dealing with Rational Behaviour
    Incentives Provided in Multiple Instances
    Incentives Provided in a Single Instance
    Properties
    3.4 Related Problems
    Byzantine Fault Tolerance
    Agreement among Processes
    Byzantine Replication
    Problems Derived from Rational Behaviour
    Free-Riding
    Non-Repudiation
    Related Problems in the BAR Model
    BAR Agreement and Byzantine Replication
    Data Dissemination
    k-fault-tolerant Nash Equilibrium
    Discussion
    Use Cases
    Use Case: BARRAGE
    Use Case: BARCKUP
    Summary

4 Synchronous Risk-Averse Solutions
    System Model
    ERA-NBART
    Overview of the Algorithm
    Algorithm in Detail
    Analysis
    Correctness
    Game Theoretical Analysis
    Complexity Analysis
    LRA-NBART
    Algorithm
    Analysis
    Correctness
    Game Theoretical Analysis
    Complexity
    ERA-NBART vs LRA-NBART
    Bit Complexity
    Processing Complexity
    Packet Transmission Complexity
    Storage Complexity
    Conclusions
    Solutions for a Model where N P N C
    Summary

5 Conclusions and Future Work
    Conclusions
    Future Work

Bibliography
List of Figures

3.1 NBART: Reliable transfer of data
Data preservation through successive transfers
Problem of applying non-repudiation, for f = 1 and N =
List of Tables

3.1 Comparison between previous solutions and possible solutions to NBART
Bit complexity parameters
Processing, storage, and packet transmission complexity parameters
ERA-NBART: bit complexity excluding headers
ERA-NBART: producer processing, packet transmission, and storage complexity
ERA-NBART: consumer processing, packet transmission, and storage complexity
ERA-NBART: TO processing, packet transmission, and storage complexity
LRA-NBART: bit complexity excluding headers
LRA-NBART: producer processing, packet transmission, and storage complexity
LRA-NBART: consumer processing, packet transmission, and storage complexity
LRA-NBART: TO processing, packet transmission, and storage complexity
Summary of the bit complexity of ERA-NBART and LRA-NBART
Summary of the processing complexity of ERA-NBART and LRA-NBART
Summary of the packet transmission complexity of ERA-NBART and LRA-NBART
Summary of the storage complexity of ERA-NBART and LRA-NBART
Comparison of the bit complexity
Comparison of the processing complexity
Comparison of the packet transmission complexity
Comparison of the storage complexity
4.19 ERA-NBART: Definition of producerset and consumerset for N P N C
LRA-NBART: Definition of producerseq and consumerseq for N P N C
Acronyms

BAR: Byzantine Altruistic Rational
NBART: N-party BAR Transfer
P2P: Peer-to-Peer
ERA-NBART: Eager Risk-Averse NBART
LRA-NBART: Lazy Risk-Averse NBART
1 Introduction

This thesis addresses the problem of transferring data in a peer-to-peer network. We study mechanisms for ensuring a reliable transfer of information even if some nodes of the network adopt a rational or arbitrary behaviour.

1.1 Motivation

Peer-to-peer systems may be used to provide temporary or long-term storage services. Such services are useful in a number of settings. For instance, peer-to-peer systems can be used to process large volumes of data using volunteer computation, as illustrated by projects such as SETI@home (Anderson, Cobb, Korpela, Lebofsky, & Werthimer 2002) and, more recently, by the BOINC infrastructure that supports several computationally intensive research projects (Anderson 2004). If such computations are performed using MapReduce (Dean & Ghemawat 2004), information produced by mappers needs to be transferred to the reducers or to intermediate storage. Volunteer storage nodes may not be willing to store data indefinitely, so they have to transfer data to other nodes after serving the system for some time. In any case, volunteers expect to be recognized for their contribution, for instance by being awarded credits that make them appear in a chart with the top contributors of the project.

In scenarios such as the ones listed above, a reliable protocol to transfer data from a set of producers to a set of consumers is an important building block. Any realistic service for this environment has to consider the existence of both Byzantine and Rational nodes, i.e., nodes that deviate from the protocol in an arbitrary way (Byzantine) or with the purpose of gaining some measurable benefit (Rational). We use the words processes and participants interchangeably to designate these entities.

Because Byzantine participants may behave arbitrarily, they may compromise the safety of the system. Thus, it is imperative to deal with this kind of behaviour. Furthermore,
systems not designed to cope with Rational behaviour may fall into the Tragedy of the Commons (Hardin 1968): the job is not done because all participants are Rational and aim for profit by not performing (part of) their role.

Previous work that studied the problem of transferring data among sets of processes focused either only on Byzantine or only on Rational behaviour (Lamport, Shostak, & Pease 1982; Cohen 2003; Cox & Noble 2003; Castro & Liskov 2002; Malkhi & Reiter 1997). Only recently have solutions been proposed that take into account both Rational and Byzantine behaviour (Aiyer, Alvisi, Clement, Dahlin, Martin, & Porth 2005; Li, Clement, Wong, Napper, Roy, Alvisi, & Dahlin 2006; Li, Clement, Marchetti, Kapritsos, Robison, Alvisi, & Dahlin 2008; Clement, Li, Napper, Martin, Alvisi, & Dahlin 2008). The system model used for modelling the behaviour of participants in those solutions has been coined the Byzantine-Altruistic-Rational (BAR) model (Aiyer, Alvisi, Clement, Dahlin, Martin, & Porth 2005).

Game Theory (Martin & Ariel 1994) is an interesting approach for modelling Rational behaviour, since it provides a formal view of the interactions among Rational participants, which allows us to reach conclusions regarding their expected behaviour. In this approach, the protocols executed by the processes are modelled as a game, in which each player (i.e., process) follows a strategy to maximize its utility. The concept of Nash equilibrium (Nash 1951) may be applied to prove that no player has any incentive to deviate from the protocol, given that all the remaining players follow the protocol. Therefore, if a protocol provides a Nash equilibrium and it is a dominant strategy (no other Nash equilibrium has a greater utility), then all Rational players should follow it (Mailath 1998).
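The Nash equilibrium condition described above can be illustrated with a minimal sketch. The two-player game, the action names `follow` and `deviate`, and all payoff values below are hypothetical and chosen only for illustration; they are not taken from the NBART analysis. The check simply tests whether any single player could gain by changing its own action while the other player keeps its action.

```python
from itertools import product

# Hypothetical two-player game: each player either follows the protocol or deviates.
ACTIONS = ("follow", "deviate")

# PAYOFFS[(a0, a1)] = (utility of player 0, utility of player 1).
# The numbers are illustrative assumptions, not results from the thesis.
PAYOFFS = {
    ("follow", "follow"): (3, 3),
    ("follow", "deviate"): (1, 2),
    ("deviate", "follow"): (2, 1),
    ("deviate", "deviate"): (0, 0),
}


def is_nash_equilibrium(profile, payoffs=PAYOFFS, actions=ACTIONS):
    """Return True if no single player gains by unilaterally changing its action."""
    for player in range(len(profile)):
        current = payoffs[profile][player]
        for alternative in actions:
            deviated = list(profile)
            deviated[player] = alternative
            if payoffs[tuple(deviated)][player] > current:
                return False
    return True


def nash_equilibria(payoffs=PAYOFFS, actions=ACTIONS):
    """Enumerate every pure-strategy profile that is a Nash equilibrium."""
    return [
        profile
        for profile in product(actions, repeat=2)
        if is_nash_equilibrium(profile, payoffs, actions)
    ]
```

In this toy game, following the protocol is each player's best response to the other player following it, so the profile in which both follow is an equilibrium. A protocol analysis in this style has to show the same property for every process and every possible deviation.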
Therefore, this work focuses on creating mechanisms for reliably transferring data between two sets of processes of a peer-to-peer network, where the behaviour of participants is characterized according to the BAR model. Hence, these mechanisms should tolerate Byzantine behaviour and should provide incentives for Rational participants to follow a behaviour that does not compromise the safety of the system. We use concepts from Game Theory to analyse Rational behaviour.
1.2 Contributions

This work studies the problem of transferring data from a set of $N_P$ producers to a set of $N_C$ consumers, where the behaviour of participants is modelled by the BAR model. We name this problem N-party BAR Transfer (NBART). Regarding Rational behaviour, we distinguish between risk-averse and risk-seeking processes. Risk-averse processes are not willing to deviate from the protocol if that may risk their utility. On the other hand, risk-seeking processes follow the behaviour that maximizes their expected utility, even if that behaviour may risk their utility. More precisely, the thesis makes the following contributions:

- It introduces the NBART problem, defining its main properties and challenges.
- It proposes two alternative algorithms that solve the NBART problem in a synchronous environment where processes are risk-averse:
    - The first algorithm, named Eager Risk-Averse NBART (ERA-NBART), has low execution time and high bit complexity.
    - The second algorithm, named Lazy Risk-Averse NBART (LRA-NBART), has low bit complexity and high time complexity.

1.3 Results

The results of this work can be enumerated as follows:

- A proof that the ERA-NBART and LRA-NBART algorithms are correct when up to $f$ processes of each of the sets of producers and consumers are Byzantine, as long as $N_P, N_C \geq 2f + 1$.
- A Game Theoretical analysis of ERA-NBART and LRA-NBART that proves that both algorithms provide a Nash equilibrium. We also claim that the algorithms provide a dominant strategy, therefore all Rational processes should follow them.
- A complexity analysis and a comparison of the proposed solutions.
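The distinction between risk-averse and risk-seeking processes drawn in Section 1.2 can be sketched as follows. This is one possible reading, stated as an assumption: a deviation is modelled as a lottery of (probability, utility) outcomes, a risk-seeking process deviates when the expected utility of the lottery exceeds the utility of following the protocol, and a risk-averse process deviates only when no possible outcome leaves it worse off. The function names and the numbers in the usage note are hypothetical.

```python
def expected_utility(lottery):
    """Expected utility of a lottery given as (probability, utility) pairs."""
    return sum(probability * utility for probability, utility in lottery)


def risk_seeking_deviates(follow_utility, deviation_lottery):
    """A risk-seeking process deviates if deviating has higher expected utility."""
    return expected_utility(deviation_lottery) > follow_utility


def risk_averse_deviates(follow_utility, deviation_lottery):
    """A risk-averse process deviates only if no possible outcome is worse than
    following the protocol and at least one outcome is strictly better
    (an assumed formalization of "not willing to risk their utility")."""
    outcomes = [utility for probability, utility in deviation_lottery if probability > 0]
    return min(outcomes) >= follow_utility and max(outcomes) > follow_utility
```

For example, suppose following the protocol yields utility 5, while deviating yields 8 with probability 0.9 and 0 with probability 0.1 (say, because the deviation is detected). Under these assumed numbers the risk-seeking process deviates, since the expected utility of deviating is above 5, whereas the risk-averse process does not, since it could end up with 0.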
1.4 Research History

The initial aim of this work was to design, implement, and evaluate a subsystem for data storage in the BAR model, named BARRAGE. The main goal of this system was to support distributed computations over a peer-to-peer network of volunteers, by allowing producers to store intermediate results in processes named storers that are neither producers nor consumers. This way, it would be possible to preserve the data for long periods of time without requiring producers to remain connected until the final consumption of the produced data. The main challenges were how to ensure a reliable transfer of the data from the producers to the storers, and how to preserve the data until its final consumption, since storers may not remain connected to the network indefinitely.

Eventually, it was realized that a mechanism for reliably transferring data from producers to storers could also be used to transfer the data between different sets of storers and from the final set of storers to the consumers. Therefore, the focus shifted to solving this particular problem.

This work was performed in the context of the HPCI project: High-Performance Computing over the Large-Scale Internet (PTDC/EIA-EIA/102212/2008). During this work, we benefited from the fruitful collaboration with the remaining members of the GSD team working on HPCI, namely João Leitão, Miguel Correia, and Oksana Denysyuk.

Parts of this work were published by Vilaça, Leitão, & Rodrigues (2011a), by Vilaça, Leitão, & Rodrigues (2011b), and by Vilaça, Leitão, Correia, & Rodrigues (2011).

1.5 Structure of the Document

The remainder of this document is organized as follows. Chapter 2 describes previous work related to the problems that were introduced. Chapter 3 provides a definition of the NBART problem. The proposed solutions are presented in Chapter 4. Finally, Chapter 5 concludes the report and provides directions for future work.
2 Related Work

2.1 Introduction

This chapter provides an overview of the previous work related to the transfer of information among processes in different fault models. A special emphasis is given to the BAR model and solutions that use it to portray the behaviour of processes.

First, we introduce in Section 2.2 the different system models used in the literature, which are relevant to understand and compare the described solutions and to motivate the use of the BAR model. In Section 2.3, we survey the traditional work on data transfer among processes. In Section 2.4, we describe the main existing mechanisms for providing incentives to Rational processes to obey the protocols. Then, in Section 2.5, we survey the most important work that deals with Rational and Byzantine behaviour simultaneously. Finally, in Section 2.6, we describe the most important solutions, which are later used as a basis for comparison with the work presented in this thesis.

2.2 System Models

A distributed system can be defined as a set of processes that exchange messages over communication channels (Lamport 1978). In this context, there are several different models that capture the behaviour of a distributed system. The most important characteristics to consider in a system model are its synchrony and the types of faults of its components (processes and communication channels).

In this section, we characterize the different timing assumptions that define the level of synchrony of a system and we identify the main types of faults of its components. Then, we describe the possible causes of faults, strategies that may be used to deal with faults, and the fault models that are most relevant to this work.
Synchrony

The synchrony of a system model can be modelled using two parameters that capture the timing assumptions about the behaviour of components. The first parameter is the process speed interval, which represents the difference in the processing speed of the fastest and the slowest process of the system. The other parameter is the communication delay, which is the latency of communication steps, i.e., the time elapsed between the emission and reception of messages.

The level of synchronism of a model ranges from purely synchronous, if there is a known upper bound on both parameters that always holds, to purely asynchronous, if neither parameter has an upper bound. Many intermediate levels of synchrony may be considered. For instance, there may be an unknown upper bound on the parameters which holds forever; or there may be known or unknown upper bounds which eventually hold forever or during a given period of time.

Types of Faults

In a distributed system where processes exchange messages, components may be faulty if they fail to perform some work, or non-faulty if they always perform as expected. Eventually, a fault may lead to a failure if its effects become externally visible (Avizienis, Laprie, Randell, & Landwehr 2004). Several types of faults may occur. We distinguish between crash faults, omission faults, and arbitrary faults:

Crash Faults: A crash occurs when a component stops executing operations it should perform. However, a crash never results in a process performing erroneous operations or in a communication channel producing duplicated or spurious messages. For instance, crash faults may occur when a cable is unplugged, when there is a power loss, or when a software bug or a hardware malfunction causes a computer to disconnect from the network.

Omission Faults: Omission faults occur whenever a component fails to perform a simple operation.
More precisely, processes may omit messages or certain execution steps, and communication channels may omit the transmission of a message. For instance, a message may be silently discarded due to a buffer overflow.
Arbitrary Faults: Arbitrary faults are a broader class of faults that may include crash faults, malicious faults, and faults due to Rational behaviour, among others. For instance, malicious processes may deliberately produce wrong information, omit messages, or discard stored data; Rational processes may omit certain messages or execution steps; malicious communication channels may drop or tamper with messages; or honest components may crash due to arbitrary reasons.

Causes of Faults

Faults may occur for many possible reasons. To better identify the appropriate strategy to deal with faults, it is useful to classify the causes of faults into three main classes: natural causes, malicious behaviour, and rational behaviour.

Natural Causes

A common source of faults is related to events not intentionally caused by humans. For instance, a power loss or a user who inadvertently disconnects a computer from its power source may cause a process to stop performing operations; a disconnected or damaged cable may prevent communication channels from delivering messages; a software or hardware bug may cause a process to produce erroneous data; among others.

Malicious Behaviour

A malicious fault always occurs due to the intentional actions of an adversary: a malicious user such as a hacker; a component infected by malicious code (worms, viruses, trojan horses, bots, ...); a network of controlled components (botnets); etc. Malicious components may act with many distinct goals: destroy or jeopardize the services provided by a particular company; reveal secrets held by an organization; increase their account balance or access other accounts; among others. Their actions may be arbitrary, therefore they may cause any type of fault. For instance, malicious components may selectively switch between correct and malicious behaviour.
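To make the three fault types introduced earlier in this section concrete, the following minimal sketch models a component as something that should emit a fixed sequence of messages, and shows what an observer may see under each fault type. The function names and the `behaviour` callback are hypothetical and introduced only for illustration. The point of the sketch is that crash and omission faults can only remove messages, so the observed output is always a subsequence of the expected one, whereas an arbitrary fault may also tamper with or duplicate messages.

```python
def crash(messages, crash_at):
    """Crash fault: the component stops for good at step `crash_at`."""
    return list(messages[:crash_at])


def omission(messages, omitted_steps):
    """Omission fault: individual steps are skipped, but execution continues."""
    return [m for step, m in enumerate(messages) if step not in omitted_steps]


def arbitrary(messages, behaviour):
    """Arbitrary fault: `behaviour(step, message)` returns whatever the faulty
    component chooses to emit at that step (nothing, the message, or anything else)."""
    emitted = []
    for step, message in enumerate(messages):
        emitted.extend(behaviour(step, message))
    return emitted


def is_subsequence(observed, expected):
    """True if `observed` can be obtained from `expected` by deleting messages only."""
    remaining = iter(expected)
    return all(message in remaining for message in observed)
```

For instance, with the expected sequence `["a", "b", "c"]`, `crash(..., 1)` yields `["a"]` and `omission(..., {1})` yields `["a", "c"]`, both subsequences of the expected output. A tampering behaviour such as emitting an altered copy of the first message twice and nothing afterwards produces output that is not a subsequence, which is why arbitrary faults require stronger mechanisms than crash or omission faults.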
Rational Behaviour

A process follows a Rational behaviour if it aims at maximizing a given, known utility function. That is, it aims at maximizing the benefits obtained from the system while minimizing the costs incurred for performing operations, such as sending messages, performing computations, or storing data for long periods of time. A fault due to Rational behaviour occurs whenever the behaviour that maximizes the utility of a component is not compatible with the expected behaviour. We can identify the following main problems derived from Rational behaviour:

Free-Riding: The problem of free-riding in the context of P2P has been discussed by Hughes, Coulson, & Walkerdine (2005). Free-riders exploit the altruism of other participants to take advantage of the system without reciprocating in a reasonable manner, i.e., without providing at least as many resources as they consume. Experience with several practical systems (Liang, Kumar, & Ross 2005) has shown that this problem can ultimately prevent the system from working. This leads to a situation called the Tragedy of the Commons (Hardin 1968), where every process expects others to provide services, but that never happens and the system stops working.

Alternative Behaviour: Most protocols provide different behavioural options for each step of an interaction. A Rational process can take advantage of this and choose a particular option that minimizes its cost or allows it to gain unfair advantages, instead of making the adequate choice. For instance, in a partner selection mechanism that is based on a random number generator, a Rational component may always generate the same number in order to always pick the best peer (Li, Clement, Wong, Napper, Roy, Alvisi, & Dahlin 2006).

Collusion: This problem occurs both with malicious and Rational processes.
Collusion happens when several processes coordinate their actions to defeat the mechanisms in place that aim at preventing or detecting incorrect behaviour (Hayrapetyan, Tardos, & Wexler 2006).
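The notion of utility used in the definition of Rational behaviour above (benefits obtained from the system minus the costs of sending messages, computing, and storing data) can be sketched as follows. The `Costs` values, the function name `utility`, and the numbers in the usage note are assumptions made for illustration; the thesis does not fix these quantities here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    """Hypothetical unit costs incurred by a process (illustrative values)."""
    send: float = 1.0     # cost per message sent
    compute: float = 2.0  # cost per computation performed
    store: float = 0.5    # cost per unit of time that data is stored


def utility(benefit, messages_sent, computations, storage_time, costs=Costs()):
    """Utility of a process: benefit obtained minus the costs it incurred."""
    incurred = (
        messages_sent * costs.send
        + computations * costs.compute
        + storage_time * costs.store
    )
    return benefit - incurred
```

Under these assumed numbers, a process that cooperates (3 messages, 1 computation, 4 time units of storage) and receives a benefit of 10 ends with utility 3, while a free-rider that receives the same benefit without contributing ends with utility 10. Free-riding is therefore the utility-maximizing choice unless the benefit depends on the contribution: if a non-contributing process receives no benefit, its utility drops to 0 and cooperating becomes the better option.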
Strategies for Dealing with Faults

Several distinct techniques may be used to address the problem of faults in systems. In this thesis, we focus on fault prevention and fault tolerance techniques (Avizienis, Laprie, Randell, & Landwehr 2004).

Fault Prevention: Techniques for preventing faults increase the dependability of the system by reducing the likelihood of fault occurrence. Good programming practices can prevent software errors and avoid vulnerabilities that may lead to security attacks. If security is taken into account, it may be possible to prevent malicious processes from disclosing or disrupting information, and from forging identities. Incentive mechanisms can provide enough reasons for rational participants to follow the specified protocols, if this is the alternative that maximizes their profit.

Fault Tolerance: Fault tolerance consists in adding mechanisms that allow the system to continue to provide the intended service despite the occurrence of faults. Fault tolerance may be achieved by detecting the faults and applying corrective measures, or simply by masking the fault using redundancy. In the context of data transfer, it aims at avoiding data loss or corruption.

In this thesis, we are focused on systems characterized by the BAR model. Hence, faults may occur as a result of any possible cause. Faults due to Rational behaviour may be prevented by providing incentives for Rational processes to not misbehave. However, faults caused by natural events or malicious behaviour cannot be avoided, thus fault tolerance mechanisms must be applied.

Fault Models

As we have seen, components may fail due to many reasons and exhibit different behaviours when faults occur. Fault models are abstractions that describe the types of faults that may occur in real systems. For the purpose of this work, we assume that communication channels do not fail. Hence, we will focus on fault models that capture the possible faults of processes.
More precisely, we will describe the main fault models of the literature that identify the possible