Agreement under faulty interfaces


Information Processing Letters 65 (1998) 125-129

Pallab Dasgupta *

Department of Computer Science and Engineering, Indian Institute of Technology, 721302 Kharagpur, India

Received 23 September 1997; revised 10 November 1997
Communicated by T. Asano

Abstract

In this paper we study the problem of achieving Byzantine agreement among a set of processors, where the processors are computationally sound but their interfaces with the communication channels may be faulty. We consider three types of fault, namely message corruption, message loss, and spurious message generation. We present the following results for this model: (i) If all three types of faults are present, then the problem is equivalent to the classical Byzantine generals problem. (ii) If only message corruption can occur, agreement becomes trivial and can be achieved in one round. (iii) If spurious message generation is ruled out, that is, when interfaces may fault only when sensitized, agreement is possible irrespective of the ratio of the number of processors having faulty interfaces to the total number of processors. © 1998 Published by Elsevier Science B.V. All rights reserved.

Keywords: Algorithms; Byzantine agreement; Byzantine generals problem; Distributed systems; Message corruption; Message loss; Spurious message generation

* Email: [email protected].

0020-0190/98/$19.00. PII S0020-0190(97)00202-0

1. Introduction

The Byzantine generals problem [5] is one of the fundamental problems of reaching mutual agreement in a distributed system. The problem is defined as follows. A set of generals (processors) is camped outside an enemy city. The generals are located at geographically distant positions and can communicate through reliable messengers (network channels). The generals must come to a common decision on whether to attack or to retreat. Some of the generals are traitors (faulty processors) and may communicate arbitrary decisions to different generals.

Several protocols are known for solving the problem in its core form and also for solving different variants of the problem [1-6]. In this paper we look at a variant of the Byzantine agreement problem where the processors representing the traitors are sound, but their interfaces with the communication network may be faulty, causing them to send erroneous messages through the network. Our model may be briefly described as follows:
• Each processor has one or more interfaces to the network. These interfaces are analogous to the network interface cards of computers.
• In order to communicate a message, the source processor passes the message to the appropriate network interface of the source processor. The destination processor receives the message from the appropriate network interface of the destination processor.
• A network channel receives a message from the network interface of the sender processor and delivers it to the network interface of the destination processor. As in the original Byzantine generals problem, we assume that all network channels are reliable.
• One or more interfaces of a processor may be faulty. An interface faults if it receives a message (say 1) from its processor and sends a different message (say 0) through the channel, or loses the message and sends none. It also faults if it receives some message (or no message) from the channel and reports a different message to its processor.
• The processors are themselves reliable. A processor with one or more faulty interfaces is called a traitor.

We categorize the types of faults that are possible in this model as follows:
m/m' fault: The interface receives a message m (either from the channel or from the processor) and communicates a different message m' to the other side.
m/φ fault: The interface receives a message m (either from the channel or from the processor) and loses it.
φ/m' fault: The interface generates a spurious message m' without receiving any message.

In this paper, we analyze the Byzantine agreement problem under these types of faults and present the following results, where k denotes the number of faulty processors.²
• If all three types of faults are possible, then the agreement problem reduces to the classical Byzantine generals problem, and therefore at least 3k + 1 processors are required for agreement. This result is fairly straightforward.
• Curiously, if only m/m' faults are possible, then the agreement problem becomes trivial. We present a protocol that achieves agreement in one round. This shows that the main difficulty in agreement is caused by the loss of messages or the generation of spurious messages.
• If only m/m' and m/φ faults are possible, then agreement is possible irrespective of the ratio of the number of processors having faulty interfaces to the total number of processors. We present a protocol for this model that achieves agreement in k + 1 rounds. The practicality of this model lies in the fact that network interfaces often fault only when they are sensitized, and therefore may not generate messages on their own.

The paper is organized as follows. In Section 2 we observe that the problem of agreement under all three types of interface faults reduces to the classical Byzantine generals problem. Section 3 shows that in the absence of m/φ and φ/m' faults, the agreement problem becomes trivial. In Section 4 we study the problem of agreement under m/m' and m/φ faults and present an agreement protocol for that model.

² Throughout this paper we assume that interprocessor communication is synchronous. All protocols presented in this paper are synchronous.
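The three fault categories can be made concrete with a minimal Python sketch. It assumes binary messages and exactly one interface at each end of a channel; both are simplifications of ours, not the paper's, and the names `Fault`, `pass_through`, `deliver` and `fault_needed` are hypothetical. `fault_needed` anticipates the argument of Section 2: it reports which fault type makes a sound processor appear to send something other than what it intended.

```python
from enum import Enum
from typing import Optional


class Fault(Enum):
    NONE = "correct"
    CORRUPT = "m/m'"      # m is handed on as a different message m'
    LOSE = "m/phi"        # m is lost
    SPURIOUS = "phi/m'"   # a message m' appears although none was handed in


def pass_through(fault: Fault, msg: Optional[int]) -> Optional[int]:
    """What one interface hands to the other side (None means 'no message').

    Messages are binary (0/1) purely for illustration.
    """
    if msg is None:
        # Only a phi/m' fault can turn silence into a message.
        return 1 if fault is Fault.SPURIOUS else None
    if fault is Fault.CORRUPT:
        return 1 - msg
    if fault is Fault.LOSE:
        return None
    # Assumption: a correct or phi/m'-only interface forwards a real message unchanged.
    return msg


def deliver(sender_if: Fault, receiver_if: Fault, msg: Optional[int]) -> Optional[int]:
    """Sender interface -> reliable channel -> receiver interface."""
    return pass_through(receiver_if, pass_through(sender_if, msg))


def fault_needed(intended: Optional[int], observed: Optional[int]) -> Fault:
    """Fault type that makes a sound processor intending to send `intended`
    (None = stay silent) appear to send `observed` instead."""
    if intended == observed:
        return Fault.NONE
    if intended is None:
        return Fault.SPURIOUS   # sends although it meant to stay silent
    if observed is None:
        return Fault.LOSE       # stays silent although it meant to send
    return Fault.CORRUPT        # sends something other than intended
```

In this sketch `deliver(a, b, None)` is `None` for every combination of correct, m/m' and m/φ interfaces; only a φ/m' fault can make silence look like a message. The protocols of Sections 3 and 4 rely on exactly this property by encoding "retreat" as silence.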

2. Agreement under m/m', m/φ and φ/m' faults

It is fairly easy to see that if m/m', m/φ and φ/m' faults are all possible, then the agreement problem becomes equivalent to the original Byzantine generals problem. Let us examine the ways in which a processor may fault in the original Byzantine problem, and observe possible equivalent situations in our model: (1) A traitor receives a message and communicates some other message. A similar situation can occur in our model if, say, the interface used by the traitor while communicating the message has an m/m' fault. (2) A traitor receives a message and communicates nothing. This may happen if the interface through which the traitor intended to send has an m/φ fault. (3) A traitor receives no message in a round, but communicates a message to some other processor. This is possible if, say, an interface of the traitor has a φ/m' fault. Through these observations it is easy to see that agreement under m/m', m/φ and φ/m' faults is as difficult as the original Byzantine agreement problem. The converse, that every fault possible in our model is covered by the original model, is even easier to see, since in the original model the processor itself can be faulty. We do not elaborate on this any further, but conclude with the following theorem, whose proof is now obvious.


Theorem 1. If m/m', m/φ and φ/m' faults are all possible, then agreement is possible in k + 1 rounds among at least 3k + 1 processors, where k denotes the number of processors with one or more faulty interfaces.

Proof. Follows from the equivalence with the original Byzantine agreement problem. Agreement can be reached in k + 1 rounds using the oral message protocol of Lamport et al. [5]. □

3. The importance of m/φ and φ/m' faults

In this section we observe that in our model the agreement problem becomes trivial if we rule out m/φ and φ/m' faults. The absence of these two types of faults implies that whenever a processor A attempts to send a message m to a processor B, the processor B is certain to receive some message m', where m' may be the same as m, or may be a corrupted version of m if an m/m' fault has occurred in A's or B's interface. This feature allows us to develop the following protocol, which achieves agreement in one round.

Protocol for the m/m'-only model.
1. One general (the initiator) decides whether to attack or retreat.
1.1. If the decision is to retreat, the general remains silent.
1.2. If the decision is to attack, the general sends a message to all other generals.
2. If a general (other than the initiator) receives any message in the first round, it decides to attack; otherwise it decides to retreat.

Theorem 2. The protocol for the m/m'-only model achieves agreement in one round.

Proof. Suppose the initiator decides to retreat. Since the processors (including the initiator) are correct, the initiator does not attempt to send any message to any other processor. Since φ/m' faults are ruled out, none of the other processors receives any message from the initiator, and therefore all of them decide to retreat. Now suppose the initiator decides to attack. Since the processors (including the initiator) are correct, the initiator attempts to send a message to every other processor. Since m/φ faults are ruled out, each of these other processors receives some message (possibly a corrupted one) from the initiator, and decides to attack. □

The above result shows that if the processors are themselves correct, then the main difficulty in achieving agreement lies in the presence of m/φ and φ/m' faults. It is also interesting to observe that in the absence of φ/m' and m/φ faults, the processors with faulty interfaces also reach the same consensus.
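The one-round protocol and Theorem 2 can be exercised with a small Python simulation. This is a sketch under our own assumptions: processor 0 is the initiator, the hypothetical function `one_round` flips the payload at random whenever either endpoint of a channel has an m/m' fault, and, as in this model, no message is ever lost or invented. Receivers look only at whether a message arrived, never at its content.

```python
import random
from typing import Dict, Optional, Set


def one_round(n: int, initiator_attacks: bool, faulty: Set[int],
              rng: random.Random) -> Dict[int, str]:
    """Simulate the m/m'-only protocol; processor 0 is the initiator."""
    decisions = {0: "attack" if initiator_attacks else "retreat"}
    for g in range(1, n):
        received: Optional[int] = None
        if initiator_attacks:  # attack = send; retreat = stay silent
            msg = 1
            # An m/m' fault at either end may flip the payload but cannot drop it.
            if 0 in faulty and rng.random() < 0.5:
                msg = 1 - msg
            if g in faulty and rng.random() < 0.5:
                msg = 1 - msg
            received = msg
        # With phi/m' ruled out, silence stays silence; content is ignored.
        decisions[g] = "attack" if received is not None else "retreat"
    return decisions
```

Because the decision depends only on the presence of a message, every processor, including those with faulty interfaces, ends with the initiator's decision, which matches the closing remark of this section.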

4. Agreement under m/m' and m/φ faults

In this section, we study the agreement problem under m/m' and m/φ faults, that is, we consider cases where φ/m' faults are not possible. We feel that this model is worth investigating, since network interfaces often fault only when sensitized, that is, when an attempt is made to send messages through them. We present a protocol that achieves agreement in at most k + 1 rounds, where k denotes the number of processors with faulty interfaces. In our protocol, the decision to retreat is modeled by silence and the decision to attack is communicated by sending a message. Early stopping conditions are also incorporated. The protocol among n generals is recursively described by the following algorithm.

Algorithm M(0, n).
1. One general (we call him the commander) communicates a message to every other general if it has decided to attack. Otherwise it remains silent.
2. Each of the other generals, Gi, acts as follows. If Gi has already decided, then it ignores all messages. If Gi has not yet decided, then it decides to attack if it receives any message from the commander, and decides to retreat otherwise.

Algorithm M(k, n), k > 0.
1. One general (we call him the commander) communicates a message to every other general if it has decided to attack. Otherwise it remains silent.
2. Each of the other generals, Gi, acts as follows. If Gi has already decided, then it ignores all messages. If Gi has not yet decided, then it decides to attack if it receives any message from the commander, and remains undecided otherwise. General Gi now acts as the commander in Algorithm M(k - 1, n - 1) among the other n - 2 generals.

As in Lamport's algorithm [5], the protocol starts when the initiator takes a decision on whether to attack or retreat, and initiates the protocol by acting as the commander in Algorithm M(k, n). The following results establish that in the presence of m/m' and m/φ faults only, Algorithm M(k, n) achieves Byzantine agreement in a cluster of n processors among which at most k processors have faulty interfaces. In other words, Byzantine agreement is possible in this model irrespective of the fraction of processors that have faulty interfaces.

Lemma 3. If the initiator of Algorithm M(k, n) decides to retreat, then all other processors (including those having faulty interfaces) agree to retreat in round k + 1.

Proof. If the initiator decides to retreat, then it sends no message in Algorithm M(k, n). Since φ/m' faults are ruled out, none of the processors receives any message, and therefore none of them sends any. As a result, in round k + 1 (when M(0, n - k) is executed), all processors (including the ones with faulty interfaces) decide to retreat. □
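Algorithm M(k, n) can be unrolled into rounds and checked by simulation. The Python sketch below rests on our own reading of the recursion, stated as assumptions: processor 0 is the initiator; a general that first receives a message in round r ≤ k decides to attack and, in round r + 1, forwards to every general outside that message's sender chain; a general that first receives a message in round k + 1 (that is, in M(0, ·)) decides to attack without forwarding; a general still undecided after round k + 1 retreats. The chain is tracked by the simulator rather than carried in the payload, each message with a faulty endpoint is lost with probability `drop_p` (an m/φ fault), and m/m' corruption is not modelled because receivers ignore content. The name `run_m` is hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple

Chain = Tuple[int, ...]  # senders of a message and of its ancestors, in order


def run_m(n: int, k: int, faulty: Set[int], initiator_attacks: bool,
          rng: random.Random, drop_p: float = 0.5) -> Dict[int, str]:
    """Round-by-round sketch of Algorithm M(k, n); processor 0 is the initiator."""
    decided: Dict[int, str] = {0: "attack" if initiator_attacks else "retreat"}
    # Messages to send this round: (sender chain, recipients); chain[-1] is the sender.
    outbox: List[Tuple[Chain, List[int]]] = []
    if initiator_attacks:  # attack = send; retreat = stay silent
        outbox.append(((0,), list(range(1, n))))
    for rnd in range(1, k + 2):  # rounds 1 .. k+1
        inbox: Dict[int, Chain] = {}
        for chain, recipients in outbox:
            sender = chain[-1]
            for r in recipients:
                # m/phi: a faulty interface at either end may lose the message.
                lost = (sender in faulty or r in faulty) and rng.random() < drop_p
                if not lost and r not in decided:  # decided generals ignore messages
                    inbox.setdefault(r, chain)     # keep the first message that arrives
        outbox = []
        for r, chain in inbox.items():
            decided[r] = "attack"
            if rnd <= k:  # recipients in M(0, .) do not forward
                new_chain = chain + (r,)
                outbox.append((new_chain, [p for p in range(n) if p not in new_chain]))
    for p in range(n):
        decided.setdefault(p, "retreat")  # still undecided after round k + 1
    return decided
```

In randomized runs of this sketch, the processors with all correct interfaces always reach a common decision, equal to the initiator's whenever the initiator is correct, and every processor retreats when the initiator retreats (Lemma 3), for any number of faulty processors up to k.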

In the proposed algorithm, except for the initiator, a processor sends out messages only if it received a message in the previous round. Thus, except for the messages sent out by the initiator, each message sent out by a processor is causally preceded by the receipt of some message by that processor.

Definition 4 (Causal precedence). If a processor sends out a message m' on receiving a message m, then we say that m' is causally preceded by m, and denote the relation by m → m'. We further take causal precedence to be transitive, and call all messages that causally precede a message m the ancestors of m. We call the set of processors consisting of the sender of m and the senders of all its ancestors the sender-set of m.

Lemma 5. If the first processor with all correct interfaces to reach a decision to attack reaches this decision in round j, where j ≤ k, then by the end of round j + 1, all processors with correct interfaces agree to attack.

Proof. As soon as a processor with correct interfaces receives a message m, it decides to attack, and in the next round it communicates messages to each processor that is not in the sender-set of m. If P is the first processor with correct interfaces to receive a message m (and decide to attack), then none of the processors in the sender-set of m has all correct interfaces, since any such processor would have received a message before P. Therefore, in the next round P sends messages to all processors with correct interfaces, and each of them decides to attack. □

Lemma 6. If no processor with all correct interfaces reaches a decision to attack by round k, then each processor with all correct interfaces will decide to retreat in round k + 1.

Proof. We will show that if none of the processors with all correct interfaces receives a message (and decides to attack) by round k, then none of them can receive a message in round k + 1, and therefore all of them decide to retreat. The sender-set of a message received in round k + 1 has k + 1 distinct processors (a processor never sends to members of the sender-set), and since at most k processors have faulty interfaces, at least one of them must have all correct interfaces. That processor must have received a message by round k. This is a contradiction, since we are given that none of the processors with all correct interfaces has received a message by round k. □

Theorem 7. If only m/m' and m/φ faults are possible, then it is possible to reach Byzantine agreement in a cluster of n processors of which at most k are faulty, irrespective of the ratio of k and n. Agreement can be reached in at most k + 1 rounds.

Proof. We will show that Algorithm M(k, n) achieves this agreement. If n - k < 1, the proof is obvious. Otherwise, let us first consider the case where the initiator decides to retreat. Then by Lemma 3, all processors agree to retreat in round k + 1. Now let us consider the case where the initiator decides to attack. Two cases are possible, depending on whether the interfaces of the initiator are all correct or not. We treat each of these cases separately.


Case 1: Initiator is correct. If all the interfaces of the initiator are correct and the initiator decides to attack, then it successfully sends a message to all other processors in the first round. As a result, all processors with correct interfaces receive the message and decide to attack. Thus all loyal generals agree to attack in the first round.

Case 2: Initiator has faulty interfaces. If the initiator has one or more faulty interfaces and decides to attack, then it may succeed in sending messages to some processors and fail to reach others. In this case we need to prove that by the end of the last round (that is, round k + 1), processors with all correct interfaces reach a common decision. By Lemma 5, if a processor with all correct interfaces receives a message by round j (j ≤ k), then by the end of round j + 1, processors with all correct interfaces reach a common decision to attack. On the other hand, by Lemma 6, if no processor with all correct interfaces receives any message by round k, then processors with all correct interfaces reach a common decision to retreat. Therefore, even if the initiator has one or more faulty interfaces, processors with all correct interfaces reach a common decision. □
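The bookkeeping of Definition 4 and the counting step in Lemma 6 can be made concrete in a few lines of Python. In this sketch (our own representation, with hypothetical names `Msg`, `sender_set`, `round_of` and `has_correct_sender`) each message points to the message whose receipt caused it to be sent. Because a processor never sends to members of the sender-set, the senders along a chain are distinct, so a message received in round r has a sender-set of exactly r processors; with r = k + 1 and at most k faulty processors, some sender must be correct.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set


@dataclass(frozen=True)
class Msg:
    sender: int
    # Message whose receipt triggered this one; None for the initiator's messages.
    parent: Optional["Msg"] = None


def sender_set(m: Msg) -> FrozenSet[int]:
    """Sender of m together with the senders of all its ancestors (Definition 4)."""
    senders = set()
    cur: Optional[Msg] = m
    while cur is not None:
        senders.add(cur.sender)
        cur = cur.parent
    return frozenset(senders)


def round_of(m: Msg) -> int:
    """Round in which m is sent and received; the initiator's messages use round 1."""
    r = 1
    while m.parent is not None:
        m = m.parent
        r += 1
    return r


def has_correct_sender(m: Msg, faulty: Set[int]) -> bool:
    """True if some processor in the sender-set of m has all correct interfaces."""
    return bool(sender_set(m) - faulty)
```

For example, if the initiator 0 sends m1 = Msg(0), processor 3 forwards it as m2 = Msg(3, m1), and processor 5 forwards that as m3 = Msg(5, m2), then m3 arrives in round 3 with sender-set {0, 3, 5}. With k = 2 and faulty processors {0, 3}, a round-3 (that is, round k + 1) message necessarily has a correct sender, here processor 5.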

References

[1] D. Dolev, The Byzantine generals strike again, J. Algorithms 3 (1982) 14-30.
[2] D. Dolev et al., An efficient algorithm for Byzantine agreement without authentication, Inform. and Control 3 (1983) 257-274.
[3] D. Dolev, H.R. Strong, Authenticated algorithms for Byzantine agreement, SIAM J. Comput. 12 (4) (1983) 656-666.
[4] D. Dolev, R. Reischuk, H.R. Strong, Early stopping in Byzantine agreement, J. ACM 37 (4) (1990) 720-741.
[5] L. Lamport, R. Shostak, M. Pease, The Byzantine generals problem, ACM Trans. Programming Languages Systems 4 (3) (1982) 382-401.
[6] L. Lamport, The weak Byzantine generals problem, J. ACM 30 (4) (1983) 668-676.
[7] M. Pease, R. Shostak, L. Lamport, Reaching agreement in the presence of faults, J. ACM (April 1980).