
Applying Coloured Petri Nets to Analyze Fail Silent Nodes in Distributed Systems

Lívia M. R. Sampaio, Jorge C. A. de Figueiredo and Francisco V. Brasileiro
Department of Computer Science, Federal University of Paraíba
Caixa Postal Campina Grande, PB, Brazil

The authors are partially supported by CAPES (Coordenação de Aperfeiçoamento de Pessoal de Ensino Superior) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico).

Abstract

A fail-silent node is a self-checking node composed of a number of conventional fail-uncontrolled processors that work together to provide the following fail-controlled behavior: the node either functions correctly or stops functioning after an internal failure is detected. In a software implemented fail-silent node, the non-faulty processors of the node need to execute message order and comparison protocols to keep in step and to check each other, respectively. In this paper we present a Petri net model for a software implemented fail-silent node specification. Formal analysis by means of occurrence graphs is also shown.

KEYWORDS: Fault-tolerance, fail-silence, replicated processing, Coloured Petri Nets, Formal Analysis.

1 Introduction

Experiences in constructing distributed systems which continue to provide specified services in the presence of processing site and communication failures have shown that designing and implementing such systems is a difficult task. In a perfect world, one would like to construct a distributed system using hardware components which are guaranteed to be either failure-free or to have well defined failure modes. However, all hardware components must fail eventually, possibly in an unpredictable manner. A sensible approach, taken by the designers of a considerable number of dependable distributed systems reported in the literature, is to build their systems assuming that the underlying hardware components are fail-controlled, i.e. present a well defined failure mode, and then build
processing sites or nodes and communication infrastructure that do indeed present the fail-controlled behavior assumed. Replicated processing on distinct processors, whereby outputs from faulty processors can be prevented from appearing at the application level (by employing means such as comparing or voting the outputs produced by the processors), provides a practical means of constructing fail-controlled nodes capable of tolerating Byzantine (also referred to as fail-uncontrolled) processor failures. A particular case of a fail-controlled node is a k + 1 processor fail-silent node that either works correctly, or stops functioning (becomes silent) soon after an internal failure is detected. This behavior of a node is guaranteed so long as no more than k processors in the node fail. A two processor fail-silent node (k = 1) offers a practical and economical solution to the problem of constructing fail-controlled nodes.

Practical designs of software implemented two-processor fail-silent nodes suitable for use in distributed systems are described in [1]. They meet the abstraction of fail-silence in the following sense: a node produces either correct messages which can be verified as such by destination nodes, or it ceases to produce new correct messages, in which case destination nodes can detect any messages it may produce as unwanted. However, no formal technique is employed in [1] to analyze this solution.

Due to their characteristics, Petri nets [6] are among the most suitable formalisms for modeling and analyzing distributed systems. Their graphical notation, solid mathematical basis and ability to describe synchronization, communication and sharing between concurrent processes contribute to this applicability. However, the models obtained in practice often become excessively large. The development of high-level Petri nets and hierarchical Petri nets solved this problem. The Coloured Petri net (CPN) [4] is one of the best-known high-level Petri nets.

In this paper we use CPN to model a two processor fail-silent node. The Design/CPN tool [5] is used to analyze the models. The paper is structured as follows. We begin by describing the basic principles that underpin our fail-silent nodes. In Section 3 we present and detail the CPN model for the fail-silent node. In Section 4 we address some analysis issues. Conclusions from our work are presented in the final section of the paper.

2 Node Description

System Model and Assumptions

We assume that a failed processor (and therefore the processes running on that processor) can exhibit Byzantine behavior, but we do make the assumption that each non-faulty processor in a node is able to sign a message it sends by affixing to the message a message-dependent unforgeable signature; a non-faulty processor is also assumed to be able to authenticate any signed message it receives. Digital signature based techniques [7] provide a very comprehensive way of meeting this functionality.

We assume that non-replicated distributed computations are composed of a number of processes that interact only via messages. As an example, the function of a typical 'server' process is to cycle by selecting an input message from any one of its input ports, processing it and, if necessary, outputting one or more messages on its output ports. It is necessary to assume that the computation performed by a process on a selected message is deterministic. This is the well known state machine model (where a state machine is a process), for which the precise requirements for supporting replicated processing are known [9]. Basically, in the replicated version of a process, the multiple input ports of the non-replicated process are merged into a single port and the replica selects the message at the head of its port queue for processing.
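The state machine model described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the class name `Replica`, the helper `run` and the toy computation `state * 2 + message` are assumptions chosen only to show that a deterministic process fed from a single merged port produces the same outputs whenever the initial state and the input order are the same.

```python
from collections import deque


class Replica:
    """Hypothetical deterministic server process with one merged input port."""

    def __init__(self, initial_state=0):
        self.state = initial_state
        self.port = deque()   # single port that merges all input ports
        self.next_seq = 0     # sequence number given to the next output

    def deliver(self, message):
        """Append an input message to the tail of the port queue."""
        self.port.append(message)

    def step(self):
        """Process the message at the head of the port queue."""
        message = self.port.popleft()
        # Toy deterministic computation (an assumption, not the paper's service).
        self.state = self.state * 2 + message
        output = (self.next_seq, self.state)
        self.next_seq += 1
        return output


def run(replica, messages):
    """Deliver all messages in the given order, then process them one by one."""
    for message in messages:
        replica.deliver(message)
    return [replica.step() for _ in messages]


same_order_a = run(Replica(), [3, 5, 7])
same_order_b = run(Replica(), [3, 5, 7])
other_order = run(Replica(), [5, 3, 7])
```

Here `same_order_a` and `same_order_b` are equal, while `other_order` differs from both because the same messages were processed in a different order.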
So, if all the non-faulty replicas have identical initial states, then they will produce identical output messages, provided the queues of all correct replicas can be guaranteed to contain identical messages in an identical order. Thus, replication of a process requires the following two conditions to be met:

1. Agreement: all the non-faulty replicas of a process receive identical input messages.
2. Order: all the non-faulty replicas process the messages in an identical order.

We assume that each processor of a fail-silent node has network interfaces for inter-node communication over (possibly redundant) networks. In addition, the processors of a node are internally connected by communication links for the intra-node communication needed for the execution of the redundancy management protocols (e.g., message ordering and comparison).

[Figure 1: Software architecture of a processor in a fail-silent node]

We assume that the maximum intra-node communication delay over a link is known and bounded: if a non-faulty process sends a message over a non-faulty link to a non-faulty process of a neighbor processor, then the message will be received within δ time units. For simplicity, we will assume that the lower bound on the actual transmission delay, a, is zero: 0 ≤ a ≤ δ (so δ also represents the maximum variation in message transmission delays over a link). Link failures will be categorized as processor failures. A link failure that prevents a message sent from a processor from being received by its neighbor in the node will be considered a failure of the sender processor.

Basic Software Architecture

We now describe the basic software architecture of a two-processor fail-silent node. In addition to application level processes (server processes), each processor of a node executes five system processes described below (see Figure 1):

1. Sender Process: this process takes the messages produced by the server processes of that processor, signs them and sends them via the link to the neighbor processor of the node for comparison.
2. Comparator Process: this process compares authentic messages sent by the neighbor processor with their counterparts produced locally. If a message comparison succeeds, the singly signed authentic message received from the neighbor is counter-signed (by considering the first signature as a part of the message) and this double signed message, termed a valid message, is handed over to the local Transmitter process for network delivery to destination nodes (clients). A comparison that detects a disagreement indicates a failure. Similarly, the absence of a message for comparison (after a node-specific timeout interval) also indicates a failure. Once a failure is detected, the Comparator process stops, and so does the Sender process. No new valid messages can be produced by the node.
3. Transmitter Process: this process is responsible for sending the double signed messages over the network to destination nodes. As each processor has a Transmitter process, a correct node will generate two copies of its output messages.
4. Receiver Process: this process authenticates messages received from the network or from the link and discards any inauthentic or duplicate messages. Authenticated messages from the network (valid messages) are sent to the local Order process. Authenticated singly signed messages from the link are sent to the Comparator.
5. Order Process: this process executes an order protocol with its counterpart in the other processor of the node in order to construct identical queues of valid messages for processing by the server processes. Since such a protocol requires the Order process to relay valid messages to its counterpart, it is sufficient for a message to be received from the network by any one of the processors of a node for it to be ordered at both processors.

The architecture can be adapted for the more general case of a k + 1 processor fail-silent node; such a node will produce k + 1 signed valid messages.

Node Failure Semantics

We assume that server processes of correctly functioning nodes assign monotonically increasing sequence numbers to new messages they produce; this property enables correctly functioning destination nodes to discard replicas of any previously received messages.
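How a destination node might use double signatures and sequence numbers can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: HMAC with per-processor secret keys stands in for the unforgeable digital signatures of [7] (a real node would verify with public keys), and the names `KEYS`, `make_valid_message`, `is_valid` and `Destination` are hypothetical. A single source process is assumed, so one counter is enough.

```python
import hashlib
import hmac

# Hypothetical per-processor keys; HMAC is only a stand-in for real signatures.
KEYS = {"p1": b"key-of-processor-1", "p2": b"key-of-processor-2"}


def sign(processor, data):
    return hmac.new(KEYS[processor], data, hashlib.sha256).digest()


def make_valid_message(seq, payload, first, second):
    """Build a double signed message: the counter-signature of `second`
    covers the body together with the first signature."""
    body = seq.to_bytes(8, "big") + payload
    sig1 = sign(first, body)
    sig2 = sign(second, body + sig1)
    return {"seq": seq, "payload": payload, "signers": (first, second),
            "sig1": sig1, "sig2": sig2}


def is_valid(msg):
    """Check that two distinct known processors signed the message."""
    first, second = msg["signers"]
    if first == second or first not in KEYS or second not in KEYS:
        return False
    body = msg["seq"].to_bytes(8, "big") + msg["payload"]
    ok1 = hmac.compare_digest(msg["sig1"], sign(first, body))
    ok2 = hmac.compare_digest(msg["sig2"], sign(second, body + msg["sig1"]))
    return ok1 and ok2


class Destination:
    """Accepts each valid message once, using increasing sequence numbers."""

    def __init__(self):
        self.last_seq = -1
        self.accepted = []

    def receive(self, msg):
        if not is_valid(msg):
            return "invalid"
        if msg["seq"] <= self.last_seq:
            return "duplicate"   # replica of a previously received message
        self.last_seq = msg["seq"]
        self.accepted.append(msg["payload"])
        return "accepted"


dest = Destination()
copy_a = make_valid_message(0, b"reply", "p1", "p2")  # copy sent by processor 2
copy_b = make_valid_message(0, b"reply", "p2", "p1")  # copy sent by processor 1
forged = dict(copy_a, seq=1)                          # altered after signing
results = [dest.receive(copy_a), dest.receive(copy_b), dest.receive(forged)]
```

The two copies produced by a correct node carry the same sequence number, so the second one is dropped as a duplicate, and the altered message fails signature verification.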
Let an application process running on a correctly functioning unreplicated node take t units of time to compute the response to an input message. The corresponding correct output from a fail-silent node will take at most t' = t + t_delay units of time, where t_delay, t_delay > 0, is the bounded worst-case delay introduced by the redundancy management protocols. If the output from the fail-silent node is produced later than t', then the node will be said to have suffered a performance failure [3]. A fail-silent node can be in one of the following three states (see Figure 2):

[Figure 2: Fail-silent node states (Normal, Failing, Silent)]

1. Normal State: In this state, a node produces correct outputs. Detection of an internal failure (by a comparator) causes the node to irreversibly enter either the failing state or the silent state.
2. Failing State: This is an intermediate state in which the node can suffer at most one performance failure. From this state the node eventually enters the terminal silent state.
3. Silent State: No new valid messages are produced by the node. Any messages produced by the node can only be invalid or copies of previously produced valid messages: any functioning destination node can detect these messages as unwanted.

The reason for the existence of the intermediate failing state is as follows. A faulty processor can hold a message from the correct processor sent for comparison (a message that was sent before the correct processor stopped). The faulty processor can output this as a valid double signed message at any future time. The Sender and Comparator processes of each processor must therefore incorporate intra-node message synchronization measures to ensure that each processor of a node at any time contains no more than one message from the neighbor for comparison; in this way, the number of performance failures in the failing state can be limited to at most one.
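The three node states and the bound on performance failures can be summarized in a small Python sketch. It is a simplification under assumptions, not the authors' model: the names `NodeState`, `FailSilentNode`, `detect_failure`, `emit_late_output` and `is_performance_failure` are hypothetical, and the choice between the failing and silent states is reduced to a single flag saying whether the faulty processor still holds one message sent for comparison.

```python
from enum import Enum


class NodeState(Enum):
    NORMAL = "normal"
    FAILING = "failing"
    SILENT = "silent"


def is_performance_failure(elapsed, t, t_delay):
    """An output is late if it takes longer than t' = t + t_delay."""
    return elapsed > t + t_delay


class FailSilentNode:
    """Hypothetical abstraction of the node states described in the text."""

    def __init__(self):
        self.state = NodeState.NORMAL
        self.late_outputs = 0

    def detect_failure(self, neighbour_holds_pending_message):
        """Leave the normal state irreversibly once a failure is detected."""
        if self.state is NodeState.NORMAL:
            if neighbour_holds_pending_message:
                self.state = NodeState.FAILING
            else:
                self.state = NodeState.SILENT

    def emit_late_output(self):
        """The faulty processor releases the single message it still holds."""
        if self.state is not NodeState.FAILING:
            raise RuntimeError("no pending message can be output in this state")
        self.late_outputs += 1
        self.state = NodeState.SILENT   # terminal state


node = FailSilentNode()
node.detect_failure(neighbour_holds_pending_message=True)
state_after_detection = node.state
node.emit_late_output()
state_after_output = node.state
```

Because at most one message from the neighbor is held for comparison at any time, `late_outputs` can never exceed one in this sketch; a second call to `emit_late_output` raises an error.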
The fact that a fail-silent node can suffer a single performance failure in the intermediate state is not a cause for concern. Consider "fail-crash" nodes without an intermediate state. Applications with timing constraints running over these nodes will still be expected to contain timeliness checks for detecting late or absent messages. The same checks will be adequate in the case of fail-silent nodes for filtering out late responses. If application programs have no timing constraints, then a performance failure suffered by a fail-silent node in the failing state will not cause any inconsistencies. Thus, a system of software implemented fail-silent nodes can be regarded as capable of implementing the abstraction of fail-silence in the following sense: a node produces either correct messages which can be verified as such by destination fail-silent nodes, or it ceases to produce new correct messages,

in which case destination nodes can detect any messages it may produce as unwanted.

It is possible to design specialized fault-tolerant network interfaces that could prevent further messages from being output by a node once one of the processors detects a failure. Minimally, we need to provide a network interface with a single switch that can unilaterally and irreversibly be switched off by a control signal sent by either of the processors in the node. Any software solution to the design of a node that has no intermediate failing state will require additional redundancy. For example, one could delegate the responsibility of message comparison and output to a separate node that does not fail. A failure-masking node (capable of masking processor failures within a node) could provide the services of message comparison and output to a collection of k + 1 processor nodes. Indeed, the failure-masking node can provide other services, such as recording the status of fail-silent nodes. This design very much resembles that of a system of fail-stop nodes [8] that can switch from the functioning to the halted state, and can provide failure-status indication.

3 CPN Model

In this section we present the CPN model for a two processor fail-silent node, representing its components (processors, processes, input/output buffers, etc.) and the system for both intra-node and inter-node communication. The model is a hierarchical CPN composed of 9 pages: 1 for global declarations (GlobalDecl) and the others for net structure (ReplicatedNode, Node, Network, Processor, Receiver, Comparator, Sender and Order). Due to space limitations we detail only two pages: the Processor page and the Comparator page.

[Figure 3: CPN hierarchy page, showing the pages ReplicatedNode, Node, Network, Processor, Receiver, Comparator, Sender, Order and GlobalDecl]
The page Processor gives a general view of the entire model, whereas the page Comparator shows the silent behavior of the node when a failure is detected. As can be seen from the CPN hierarchy page in Figure 3, ReplicatedNode is the prime page; it represents inter-node communication (node ↔ net). The communication system modeled in Network is very simple: there are input and output buffers through which processors can receive messages from and send messages to the physical network. These buffers represent the network interfaces and communication links for inter-node and intra-node communication mentioned in Section 2. Page Node describes the arrival of messages (service requests) at processors, and the sending of outputs produced by these processors (double signed messages) to the clients.

Each processor of a two-processor fail-silent node is represented by an instance of Processor, which is the most important page of the model. It contains the main flow of control in the node, from reception to validation of a message (see Figure 4). This page describes all the processes and buffers that form a processor, as illustrated in Figure 1. The exception is the Transmitter process which, due to its simplicity, is represented in page Node. The Receiver process authenticates messages for that processor and puts them into the appropriate buffer according to the type of the message. These buffers are used as input places for the Order and Comparator processes. We modeled an Order process that implements a protocol based on logical clocks, as described in [1]. Ordered messages are put into buffer dmq, indexed by the identifiers of the server processes running on the processor. The execution of services is represented by the transition attribute_id, which assigns a sequence number to each output message. These messages will be sent to the Comparator process of the neighbor processor for validation.
When a failure is detected in the Comparator process (explained later), a token is put in place silent, meaning that the comparator has stopped functioning. A token in this place also forces the Sender process to stop (detailed in page Sender, not shown in this paper). If the validation is OK (a token in place buffer_msg), a message is put in the output buffer of the processor. Every time a message is sent through the network (for ordering, for comparison or as the output of a service request), information about the receiver must be updated; this is represented by place info_receiver. Each process corresponds to one substitution transition. The processes described above are represented by pages with the same name. All declarations of color sets (lists, records, tuples, etc.), variables and functions used in the model are defined in the page GlobalDecl. The model was built using Design/CPN [5], and the time features of the model were supplied by the Design/CPN timing support.

The page that models the comparator describes the structure of the Comparator process, that is, all the elements that contribute to the execution of this process (see Figure 5). The buffers icl and ecl are lists of messages from the local processor, where the comparator is running, and from the neighbor processor, respectively. Messages in

icl must have a counterpart in ecl with the same sequence number (identifier). These numbers are compared and, if they match, the validation is OK (a token in place ok); otherwise, a failure is characterized (this check is represented by transitions compare1_msg and compare2_msg).

[Figure 4: Processor page]

The other possible cause of a failure is the absence of a message to be compared, that is, there is a message in ecl with no counterpart in icl, or vice versa. To model this situation we used the time support of Design/CPN. Every message put in icl is given a timestamp, assigned automatically by the simulator during execution, according to the inscription on the output arc from transition sign_msg, in page Sender, to icl, in page Comparator.
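The two failure conditions checked by the comparator (a mismatch, or a missing counterpart once a timeout has passed) can be sketched in Python. This is not the CPN model itself: the field names `mensid` and `numsign` are loosely borrowed from the model's inscriptions, while the class `Comparator`, its `compare` method, the `waited` parameter and the timeout value are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Msg:
    mensid: int    # sequence number (identifier) of the message
    numsign: int   # number of signatures the message carries


class Comparator:
    """Hypothetical comparator: validates one local/neighbor pair at a time."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.silent = False   # plays the role of a token in place silent
        self.validated = []   # counter-signed (valid) messages

    def compare(self, local: Optional[Msg], neighbour: Optional[Msg], waited=0):
        if self.silent:
            return None                     # a stopped comparator does nothing
        if local is None or neighbour is None:
            if waited >= self.timeout:      # counterpart never arrived
                self.silent = True
            return None
        if (local.mensid == neighbour.mensid
                and local.numsign == 0 and neighbour.numsign == 1):
            # Comparison succeeded: counter-sign the neighbor's message.
            valid = replace(neighbour, numsign=neighbour.numsign + 1)
            self.validated.append(valid)
            return valid
        self.silent = True                  # disagreement detected
        return None


comparator = Comparator(timeout=4)
first = comparator.compare(Msg(mensid=7, numsign=0), Msg(mensid=7, numsign=1))
still_waiting = comparator.compare(Msg(mensid=8, numsign=0), None, waited=2)
silent_before_timeout = comparator.silent
comparator.compare(Msg(mensid=8, numsign=0), None, waited=4)
silent_after_timeout = comparator.silent
```

After the timeout expires without a counterpart, the comparator becomes silent and every later call returns `None`, so no further valid message can be produced.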
This timestamp indicates that the message will become available in icl only after a given number of time units (the timeout). So, if after this time there is no message in ecl to be compared, we can assume that a failure has occurred. In both cases, when a failure occurs the comparator stops functioning. This is modeled by a token in place silent. Place ecl_empty controls the number of messages in ecl, allowing just one token at any time. This guarantees the existence of the failing state.

[Figure 5: CPN page Comparator]

4 Simulation and Analysis of the Model

The behavior of a two processor fail-silent node can be perceived through the three states discussed in Section 2, namely: normal, silent and failing. It is important to guarantee that, in the presence of a failure, the node will produce either no valid message or at most one; that is, when a failure occurs there are just two possible states for the node: silent or failing, respectively.

Simulation is a very useful way to debug a model and can be used to gain additional understanding as well. It is similar to the testing and execution of a program. During simulation we detected a number of errors in the model, which led us to remodel some of its parts. Through simulation we observed the failing state when we considered one message on the network and the existence of a failure condition.

Occurrence graphs (O-graphs) [2] were also used to investigate the dynamic behavior of the models. As pointed out in [4], this kind of analysis is an indispensable complement to the more straightforward and intuitive simulation possibilities. We constructed the O-graph for one message on the network, representing a service request from a client node. Two situations were considered. In the first one, both processors received messages to be compared; as a result, we observed the normal state. In the second situation, a failure condition was introduced and we observed the silent state. For one message on the network it is not possible to reach the failing state. We tried to generate the O-graph for two messages on the network. However, for this situation it was only possible to obtain a partial occurrence graph, which restricted the verification of the model. To obtain O-graphs of a manageable size, we believe that it is necessary to simplify the model.

In order to gain some additional understanding of the system, as well as to verify the existence of the failing state discussed earlier, we used small configurations (scenarios). The occurrence graph analysis then served as an extended simulation, allowing the investigation of all possible sequences for the considered scenario. It was also possible to observe that some parts of the model were not specified in sufficient detail.

5 Conclusion

In this paper we have shown the modeling and analysis of a two processor fail-silent node for distributed systems by means of Coloured Petri Nets. The model describes important aspects of the node that contribute to providing the specified behavior. Simulation and occurrence graph analysis allowed the detection and correction of errors in the original model.
Although for some kinds of configurations we did not succeed in generating the full O-graph, we could detect and correct some errors. Moreover, it was possible to observe the three states specified for the node. Currently, we are working on the simplification of the model in order to overcome this analysis limitation.

6 REFERENCES

[1] F. V. Brasileiro, P. D. Ezhilchelvan, S. K. Shrivastava, N. A. Speirs, and S. Tao. Implementing Fail-Silent Nodes for Distributed Systems. IEEE Transactions on Computers, 45(11):1226-1238, November.
[2] S. Christensen and K. Jensen. The Design/CPN Occurrence Graph Tool - user's manual version 1.0. Technical report, Computer Science Department, Aarhus University, Aarhus, Denmark.
[3] F. Cristian. Understanding Fault-Tolerant Distributed Systems. Communications of the ACM, 34(2):57-78, February.
[4] K. Jensen. Coloured Petri Nets: Basic Concepts, Analysis, Methods and Practical Use, volume 1 of EATCS Monographs on Theoretical Computer Science. Springer-Verlag.
[5] K. Jensen, S. Christensen, P. Huber, and M. Holla. Design/CPN. A Reference Manual. Technical report, Meta Software Corporation, Cambridge Park Drive, USA.
[6] T. Murata. Petri Nets: Properties, Analysis and Applications. Proc. of the IEEE, 77(4):541-580, April.
[7] R. Rivest, A. Shamir, and L. Adleman. A Method of Obtaining Digital Signatures and Public-key Cryptosystems. Communications of the ACM, 21(2):120-126, February.
[8] F. Schneider. Byzantine Generals in Action: Implementing Fail-Stop Processors. ACM Transactions on Computer Systems, 2(2):145-154, May.
[9] F. Schneider. Implementing Fault Tolerant Services Using the State Machine Approach: a Tutorial. ACM Computing Surveys, 22(4):299-319, December 1990.