Information Processing Letters 29 (1988) 149-153
North-Holland
26 October 1988

RING BASED TERMINATION DETECTION ALGORITHM FOR DISTRIBUTED COMPUTATIONS

S. HALDAR and D.K. SUBRAMANIAN
Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560 012, India

Communicated by W.L. Van der Poel
Received 6 November 1987
Keywords: Distributed program, distributed termination, process communication, ring

1. Introduction
A distributed program P should terminate soon after performing the task for which it was written. For a sequential program termination is a trivial issue, but a distributed program needs an additional mechanism to ensure it. Many algorithms for this problem have appeared in the literature in recent years [1,2,4,5,6]. Most of them (e.g., [4,6]) assign the responsibility to a unique centralised process, which can become a performance bottleneck. A few algorithms belong to the distributed category [1,2,5]. In this paper we present a distributed and fully symmetric algorithm, in which all processes follow an identical protocol. It is similar to, and improves on, the algorithms described in [2,5]. After introducing the termination detection problem in Section 2, we develop the improved algorithm together with a correctness argument in Section 3. Section 4 concludes with comments on the performance of the algorithm.
0020-0190/88/$3.50 © 1988, Elsevier Science Publishers B.V. (North-Holland)

2. Distributed termination problem
Let a distributed program P consist of n communicating sequential processes $p_0, p_1, \ldots, p_{n-1}$. The processes communicate among themselves by exchanging messages. Each process $p_i$ has a local predicate $B_i$, $0 \le i < n$. Let $B = B_0 \wedge B_1 \wedge \cdots \wedge B_{n-1}$ be the conjunction of the local predicates, where all the $B_i$ are evaluated at the same time instant. B is the Global Termination Condition (GTC) of P. The program P can terminate only when all of its processes satisfy their local predicates simultaneously, i.e., when the GTC holds.

3. Algorithm for distributed termination
The processes of a distributed program P are assumed to be connected by a Hamiltonian ring, as in [1,2,5] (see Fig. 1), in which the control communication used to detect termination takes place in a single direction, say anti-clockwise. Each process knows its successor in the ring. When the local condition $B_i$ of a process $p_i$ is true, $p_i$ is said to be in the passive state; otherwise it is in the active state. Apart from control communication, basic communication may take place among the processes while they are active. A passive process never initiates a basic communication, but a process in either state can initiate or take part in control communication.

Fig. 1. Processes $\ldots, p_{i-1}, p_i, p_{i+1}, \ldots$ of P connected in a ring.

Distributed termination consists of two phases: the termination detection phase, which detects the global termination condition, followed by the termination of each process. We now introduce some terminology used in the rest of the discussion.
(a) The control section of each process contains three boolean variables PASV, RCTM and SNDTM, as in [1,2], each initially false. PASV becomes true when the process changes from the active to the passive state. SNDTM is set to true when the process sends a termination message to its successor. RCTM is set to true when the process receives a termination message from its predecessor.
(b) An active process stores the identifications of all processes (a subset of $\{p_0, \ldots, p_{n-1}\}$) with which it communicates, as in [2].
(c) A process $p_i$ also keeps a process-identification variable whose value is the farthest process [3] down the ring with which $p_i$ communicated while it was active.
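The control-section terminology (a)-(c) can be pictured as a small record. The Python sketch below is only an illustration under assumptions: the class name `ControlSection`, the field `ids` (standing for the stored identifications of item (b)) and the method names are hypothetical, and processes are represented by their ring indices. Its main point is item (c): the farthest process is the one with the largest ring distance from $p_i$, measured in the direction in which control messages travel.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ControlSection:
    """Control section of process p_idx on a ring of n processes (illustrative names)."""
    idx: int
    n: int
    PASV: bool = False   # (a) set when the process changes from active to passive
    RCTM: bool = False   # (a) set on receiving a termination message from the predecessor
    SNDTM: bool = False  # (a) set on sending a termination message to the successor
    ids: Set[int] = field(default_factory=set)  # (b) processes communicated with while active
    farthest: Optional[int] = None              # (c) farthest of those, measured down the ring

    def hops_to(self, k: int) -> int:
        """Ring distance from this process to p_k in the control-message direction."""
        return (k - self.idx) % self.n

    def record_communication(self, k: int) -> None:
        """Called while the process is active, each time it communicates with p_k."""
        if k == self.idx:
            return  # assumption of this sketch: communication with itself is ignored
        self.ids.add(k)
        if self.farthest is None or self.hops_to(k) > self.hops_to(self.farthest):
            self.farthest = k

    def become_passive(self) -> None:
        self.PASV = True
```

For instance, with `idx=2` and `n=6`, communicating with $p_3$, $p_0$ and $p_5$ (1, 4 and 3 hops away) leaves `farthest == 0`.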
Fig. 2. Possible configurations (a)-(d) of the control section of $p_i$ (fields: FARTHEST, PASV, RCTM, SNDTM and current sequence numbers).

Define $<:$ to be a relation among processes: $p_i <: p_k$ means that $p_k$ is ahead of $p_i$ in the ring. If $p_i$ communicates with all other processes, then
$$\mathrm{farthest}(p_i) := p_{((i+n-1) \bmod n)}.$$
If $p_i$ communicates with a subset of processes, say $\{p_{((i+j) \bmod n)}, p_{((i+k) \bmod n)}, \ldots, p_{((i+m) \bmod n)}\}$, where $1 \le j < k < \cdots < m \le n-1$, then
$$\mathrm{farthest}(p_i) := p_{((i+m) \bmod n)}.$$
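As a small worked instance of the definition of farthest (the numbers are chosen only for illustration), take $n = 8$ and $i = 5$, and suppose $p_5$ has communicated with $p_7$, $p_1$ and $p_2$. Writing each partner as an offset from $p_5$ identifies $j$, $k$ and $m$; the last line checks the "all other processes" case, where $m = n - 1$.

$$
\begin{aligned}
p_7 &= p_{(5+2) \bmod 8}, \quad p_1 = p_{(5+4) \bmod 8}, \quad p_2 = p_{(5+5) \bmod 8},\\
&\Rightarrow\ j = 2 < k = 4 < m = 5, \qquad p_7 <: p_1 <: p_2,\\
\mathrm{farthest}(p_5) &= p_{((5+m) \bmod 8)} = p_{(10 \bmod 8)} = p_2,\\
\text{(all others: } m = 7\text{)}\quad \mathrm{farthest}(p_5) &= p_{((5+8-1) \bmod 8)} = p_{(12 \bmod 8)} = p_4 .
\end{aligned}
$$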
3.1. Termination detection phase
Process $p_j$, upon becoming passive, issues a control message for detecting the termination condition. Suppose this message reaches $p_i$ without any negative response (i.e., with KEY = 0). At that time the control section of $p_i$ is in one of the configurations shown in Fig. 2: $p_i$ generates a negative response if its control section is in configuration (a) or (b) of Fig. 2, and a positive response if it is in configuration (c) or (d), as in [2]. When a process receives a termination detection message, it does the following before forwarding the message to its successor:
(a) As soon as the detection message is falsified, the bit flag KEY is set to 1, and it remains 1 until the message is purged [5].
(b) The unique identification carried with the detection message is removed from the control section of the visited process, if present.
(c) If KEY becomes 1 before the control message generated by $p_j$ reaches FARTHEST($p_j$), the message is purged by FARTHEST($p_j$).
(d) Once the control message generated by $p_j$ has crossed FARTHEST($p_j$), it is purged if it is received by an active process or by a process with a nonempty control section.

The algorithm (for $p_i$) of the termination detection phase is described fully below.

    (1) Upon p_i becoming passive:
        begin
          PASV(p_i) := true; KEY := 0;
          send DM(p_i, FARTHEST(p_i), KEY) to succ(p_i)
          (* DM stands for Detection Message *)
          (* succ(p_i) denotes the successor of p_i on the ring *)
        end;

    (2) Upon receiving a message DM(p_j, FARTHEST(p_j), KEY) from some predecessor:
        begin
          if KEY = 1 then
            (* either some processes are active or their control list of
               process identifiers is not empty *)
            begin
              if p_i = p_j then
                begin
                  KEY := 0;
                  forward DM(p_i, p((i+1) mod n), KEY) to succ(p_i)
                end
              else if p_j <: p_i <: FARTHEST(p_j) then
                begin
                  if p_j ∈ ID(p_i) then remove p_j from ID(p_i);
                  (* ID denotes the set of identifications stored in p_i *)
                  forward DM(p_j, FARTHEST(p_j), KEY) to succ(p_i)
                end
              else
                begin
                  if p_j ∈ ID(p_i) then remove p_j from ID(p_i);
                  if p_i is active or not(empty(ID(p_i))) then
                    purge DM(p_j, FARTHEST(p_j), KEY)
                  else forward DM(p_j, FARTHEST(p_j), KEY) to succ(p_i)
                end
            end
          else if p_i = p_j then
            enter termination phase  (* DM has returned to its initiator *)
          else if p_i = FARTHEST(p_j) then
            begin
              if p_j ∈ ID(p_i) then remove p_j from ID(p_i);
              if p_i is passive and empty(ID(p_i)) then
                forward DM(p_j, FARTHEST(p_j), KEY) to succ(p_i)
              else purge DM(p_j, FARTHEST(p_j), KEY)
            end
          else if p_j <: p_i <: FARTHEST(p_j) then
            begin  (* p_i lies on the path from p_j to FARTHEST(p_j) *)
              if p_j ∈ ID(p_i) then remove p_j from ID(p_i);
              if p_i is active or not(empty(ID(p_i))) then KEY := 1;
              forward DM(p_j, FARTHEST(p_j), KEY) to succ(p_i)
            end
          else if p_i is passive and empty(ID(p_i)) then
            forward DM(p_j, FARTHEST(p_j), KEY) to succ(p_i)
          else purge DM(p_j, FARTHEST(p_j), KEY)
        end;
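To make the message handling concrete, here is a small Python sketch of handlers (1) and (2) for a single detection message travelling round the ring. It is an illustration under assumptions, not the authors' implementation: processes are ring indices; `Proc`, `DM`, `between`, `on_receive` and `run_detection` are hypothetical names; channels are abstracted into a hop-by-hop loop; `p_j <: p_i <: FARTHEST(p_j)` is read as "p_i lies strictly between p_j and FARTHEST(p_j)"; and, following rule (b), the initiator's identification is removed at every visited process (the last branch of handler (2) does not mention this removal).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Proc:
    """Detection-relevant state of one process (illustrative names)."""
    idx: int
    active: bool = False
    ids: Set[int] = field(default_factory=set)  # ID(p_i)
    farthest: int = 0                           # FARTHEST(p_i) as a ring index


@dataclass
class DM:
    """Detection message DM(p_j, FARTHEST(p_j), KEY)."""
    initiator: int
    farthest: int
    key: int = 0


def between(j: int, i: int, f: int, n: int) -> bool:
    """p_j <: p_i <: p_f, read as: p_i lies strictly inside the path p_j -> p_f."""
    return 0 < (i - j) % n < (f - j) % n


def on_receive(p: Proc, m: DM, n: int) -> Tuple[str, Optional[DM]]:
    """Handler (2); returns ('forward', msg), ('purge', None) or ('terminate', None)."""
    i, j, f = p.idx, m.initiator, m.farthest
    if i == j:
        if m.key == 1:
            # Falsified message is back home: reset KEY, retry with FARTHEST = succ(p_i).
            return "forward", DM(i, (i + 1) % n, 0)
        return "terminate", None  # unfalsified message is back: GTC detected
    p.ids.discard(j)  # rule (b): drop p_j's identification if present
    busy = p.active or bool(p.ids)
    if m.key == 1:
        if between(j, i, f, n):
            return "forward", m
        return ("purge", None) if busy else ("forward", m)
    if between(j, i, f, n):
        # On the path to FARTHEST(p_j): falsify if busy (rule (a)), but keep going.
        return "forward", (DM(j, f, 1) if busy else m)
    # p_i is FARTHEST(p_j) or lies beyond it: rule (d).
    return ("purge", None) if busy else ("forward", m)


def run_detection(procs: List[Proc], initiator: int, max_hops: int = 10_000):
    """Handler (1), then handler (2) hop by hop; returns (outcome, process index)."""
    n = len(procs)
    procs[initiator].active = False  # PASV(p_i) := true
    msg = DM(initiator, procs[initiator].farthest, 0)
    pos = (initiator + 1) % n
    for _ in range(max_hops):
        action, out = on_receive(procs[pos], msg, n)
        if action != "forward":
            return action, pos
        msg, pos = out, (pos + 1) % n
    return "undecided", pos
```

On a four-process ring with FARTHEST($p_0$) = $p_2$, `run_detection` returns `("terminate", 0)` when every other process is passive with an empty control section. If $p_1$ still stores the identification of $p_3$, the first message is falsified at $p_1$, returns to $p_0$, is reissued with FARTHEST = $p_1$, and is purged at $p_1$, giving `("purge", 1)`.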
3.2. Termination phase
This phase is similar to that of Arora et al. [2], so we only reproduce their algorithm here.
(a) Upon determining the GTC, a process, say $p_i$, does the following:

    begin SNDTM := true; send termination message to succ(p_i) end;

(b) Upon receipt of a termination message by $p_i$:

    begin
      RCTM := true;
      if SNDTM then terminate
      else begin SNDTM := true; send termination message to succ(p_i); terminate end
    end;

3.3. Correctness of the algorithm
To establish the correctness of the algorithm, we have to prove the following assertions:
(1) At least one process is able to detect the Global Termination Condition when it holds.
(2) There is no possibility of detecting false termination.

3.3.1. Proof of assertion (1)
Case 1. Let $p_i$ be the last process to become passive, and suppose that at that time all other processes have already become passive and none of their control messages is in transit in the ring. Hence the control sections of the other processes contain at most the identification of $p_i$, and each visited process removes this identification before testing its control section (rule (b)). By the algorithm, the detection message of $p_i$ therefore remains unfalsified (KEY = 0) until it returns to $p_i$, and $p_i$ eventually enters the termination phase.
Case 2. Let $p_i$ be the last process to become passive, and suppose that at that time some detection messages issued by other processes are still in transit in the ring. Either all these messages, or at least the one sent by $p_i$, will return to their respective issuers. Let $p_k$ be a process that receives the message issued by $p_i$. Before forwarding it, $p_k$ sets KEY to 1 (i.e., falsifies the message) if its control section contains identifications of processes whose control messages are still in transit in the ring. Eventually the message reaches $p_i$ (if $p_{((i+n-1) \bmod n)}$ is its farthest process, or if the control sections beyond its farthest process are empty), and $p_i$ starts the same protocol a second time. Since messages are served in FCFS order, by the time the second message issued by $p_i$ reaches $p_k$, the control section of $p_k$ is empty. So $p_k$ forwards the control message of $p_i$ with a positive response, and finally $p_i$ enters the termination phase.
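The proof ends with a process entering the termination phase. The following Python sketch simulates that phase, i.e. rules (a) and (b) of Section 3.2, on a ring of n processes. `TermState`, `termination_phase` and the `detectors` argument are hypothetical names, and a single FIFO queue standing in for the ring channels is an assumption of the sketch. It lets one check that, in this model, every process terminates and exactly n termination messages are sent, even when two processes detect the GTC concurrently.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


@dataclass
class TermState:
    """Termination-phase flags of one process (names follow Section 3)."""
    RCTM: bool = False
    SNDTM: bool = False
    terminated: bool = False


def termination_phase(n: int, detectors: List[int]) -> Tuple[List[TermState], int]:
    """Run rules (a) and (b) on a ring of n processes.

    `detectors` (hypothetical input) are the processes that determined the GTC.
    Returns the final per-process states and the number of termination messages sent.
    """
    procs = [TermState() for _ in range(n)]
    in_transit: Deque[int] = deque()  # destinations of messages in flight, served FIFO
    sent = 0
    for p in detectors:               # rule (a): SNDTM := true; send to succ(p)
        procs[p].SNDTM = True
        in_transit.append((p + 1) % n)
        sent += 1
    while in_transit:                 # rule (b), applied at each receiver in turn
        i = in_transit.popleft()
        st = procs[i]
        st.RCTM = True
        if not st.SNDTM:              # not yet sent: pass the message on, then terminate
            st.SNDTM = True
            in_transit.append((i + 1) % n)
            sent += 1
        st.terminated = True
    return procs, sent
```

For example, `termination_phase(5, [0])` sends 5 messages and leaves all five processes terminated, and `termination_phase(4, [0, 2])` sends exactly 4.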
3.3.2. Proof of assertion (2)
Detecting a false Global Termination Condition would mean that a process gets back its own unfalsified detection message while some processes are active or have a nonempty control section. This is impossible: as the message moves around the ring, it is either purged or falsified whenever it meets an active process or a process with a nonempty control section, and a falsified message can never make a process enter the termination phase.

4. Concluding remarks
We have presented a fully distributed and symmetric algorithm for the distributed termination problem. It uses neither time-stamps and clock synchronization (as in [5]) nor sequence numbers (as in [2]). The detection message issued by a process $p_i$ may be purged by FARTHEST($p_i$), or, beyond FARTHEST($p_i$), by an active process or by a process with a nonempty control section (similar to [5]). We use the concept of a farthest process in the ring (like [3]), but, unlike [3], an active process never holds up a message. The solution is simple, and intuitively it should need fewer messages than [2] to detect the Global Termination Condition in the average case; in the worst case both methods use the same number of messages.

References
[1] R.K. Arora, S.P. Rana and M.N. Gupta, Distributed termination detection algorithm for distributed computations, Inform. Process. Lett. 22 (6) (1986) 311-314 (see also: R.K. Arora et al., Letter to the Editor, Inform. Process. Lett. 29 (1) (1988) 53-55).
[2] R.K. Arora, S.P. Rana and M.N. Gupta, Ring based detection algorithm for distributed computations, Microprocessing & Microprogramming 19 (3) (1987) 219-226.
[3] R.K. Arora and N.K. Sharma, A methodology to solve distributed termination problem, Inform. Systems 8 (1) (1983) 37-39.
[4] E.W. Dijkstra and C.S. Scholten, Termination detection for diffusing computations, Inform. Process. Lett. 11 (1) (1980) 1-4.
[5] S.P. Rana, A distributed solution of the distributed termination problem, Inform. Process. Lett. 17 (1) (1983) 43-46.
[6] R.W. Topor, Termination detection for distributed computations, Inform. Process. Lett. 18 (1) (1984) 33-36.