J. SYSTEMS SOFTWARE 1993; 21:103-112
A Quorum-Based Algorithm for Parameterized Semaphore Operations

Mitchell L. Neilsen and Masaaki Mizuno
Department of Computing and Information Sciences, Kansas State University, Manhattan, Kansas
In a centralized operating system, semaphores are commonly used to coordinate processes. To solve more complex problems, several extensions of semaphore operations have been proposed. One extension allows semaphore operations to be parameterized with various test and decrement/increment parameters. Using parameterized semaphore operations, complex problems, such as the coordination of prioritized processes and reader/writer problems, may be easily solved. Many efficient distributed mutual exclusion algorithms have been proposed. However, none of these algorithms is as powerful as parameterized semaphore operations. In this article, we present an algorithm to implement parameterized semaphore operations in a distributed system.

1. INTRODUCTION
Coordination of processes is an important issue in computer systems. In a centralized operating system, semaphores can be used to directly solve process coordination problems such as mutual exclusion and management of reusable and consumable resources. To easily control more complex process coordination, several extensions to the basic semaphore operations have been proposed. These extensions were presented in a review by Presser [1]. The most powerful extensions presented are parameterized semaphore operations. Using parameterized semaphore operations, complex problems such as the coordination of prioritized processes and reader/writer problems can be solved easily.

A distributed system consists of a network of N nodes which communicate by exchanging messages. Many distributed mutual exclusion algorithms have been proposed for such a system [2, 3]. Since these algorithms only control mutually exclusive access to a single resource, they are not as powerful as semaphore operations. For example, general reusable or consumable resource management problems cannot be solved by direct application of such mutual exclusion algorithms. As an extension, several distributed multiple mutual exclusion algorithms have been proposed [4-6]. In multiple mutual exclusion, a process accesses a set of resources instead of a single resource. Thus, several different processes may be able to enter their critical sections at the same time. Unlike a sequence of requests for (separate) single mutual exclusion, these algorithms prevent deadlock.

All of the above distributed primitives force processes to behave in a strict order: processes must first request resources (P operation) and then release them (V operation). Thus, they cannot be directly applied to solve a consumable resource management problem in which producers issue only V operations and consumers issue only P operations. Maddi and Raynal [7] recently proposed a distributed implementation of counting semaphore operations. However, more complex process coordination problems, such as reader/writer problems, cannot be easily solved by using only counting semaphores.

In this article, we present a distributed implementation of parameterized semaphore operations. The operations are as powerful as centralized parameterized semaphore operations.

The article is organized as follows. Section 2 presents an overview. First, the definitions of parameterized semaphore operations are reviewed. As an example, parameterized semaphore operations are used in a solution to the reader/writer problem with writer preference. Then, an overview of our distributed algorithm for parameterized semaphore operations is presented. Section 3 describes the algorithm in more detail. Section 4 presents a complete example to show how the algorithm works. Finally, performance and fault tolerance of the algorithm are discussed in Section 5.

Address correspondence to Professor Masaaki Mizuno, Dept. of Computing and Information Sciences, Kansas State University, 234 Nichols Hall, Manhattan, KS 66506.
© Elsevier Science Publishing Co., Inc. 655 Avenue of the Americas, New York, NY 10010
0164-1212/93/$6.00
2. OVERVIEW

In this section, we present an overview of a distributed algorithm for parameterized semaphores. We start by reviewing the definitions of parameterized semaphores proposed by Presser [1].

2.1 Parameterized Semaphores

Let E_1, ..., E_m be a set of semaphores (event variables). Each semaphore E_i is a data structure consisting of two parts: a resource counter (a nonnegative integer) E_i.count and a queue of waiting process identifiers E_i.queue. The parameterized semaphore operations PP and PV are defined as follows:

PP(E_{e_1}, t_{e_1}, d_{e_1}; ...; E_{e_j}, t_{e_j}, d_{e_j}; ...; E_{e_k}, t_{e_k}, d_{e_k}):
    if E_{e_j}.count ≥ t_{e_j} for 1 ≤ j ≤ k
    then for j := 1 to k do
        E_{e_j}.count := E_{e_j}.count - d_{e_j}
    else
        Place the calling process in the waiting queue of the first E_{e_j} such that E_{e_j}.count < t_{e_j}, and set its program counter to the beginning of the PP operation so that all conditions will be reexamined when the process is reactivated by a PV operation.

PV(E_{e_1}, i_{e_1}; ...; E_{e_j}, i_{e_j}; ...; E_{e_k}, i_{e_k}):
    for j := 1 to k do
    begin
        E_{e_j}.count := E_{e_j}.count + i_{e_j};
        Remove waiting processes in E_{e_j}.queue and place them into the ready queue.
    end

The process that has requested the PV operation continues executing or passes control to the CPU scheduler after placing itself into the ready queue. We assume that $t_{e_j} \ge d_{e_j} \ge 0$ and $i_{e_j} \ge 0$ for $1 \le j \le k$. Furthermore, both operations are executed in mutual exclusion. Note that the PV operation wakes up all of the processes whose conditions may be satisfied; then they compete again. Thus, the above implementation does not guarantee starvation freedom. This is not a fault of the implementation, but rather a necessary feature of the operations in order to solve problems which are inherently starvation prone, such as various reader/writer problems. In particular, such starvation-prone problems cannot be solved using any starvation-free synchronization primitives, such as sequencers and eventcounts [8]. Parameterized semaphore operations are upper level compatible with all of the following:
1. binary and counting semaphore operations
2. multiple mutual exclusion primitives
3. k-out-of-m synchronization primitives

The real strength of parameterized semaphore operations is that they allow the specification of a test value and an increment/decrement value for each of the semaphores within a primitive. Therefore, the test value can be different from the decrement value.
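To make the definitions concrete, the following is a minimal single-threaded Python sketch of the centralized PP and PV semantics. It is an illustration under stated assumptions, not Presser's implementation: the class name ParamSemaphores, 0-based semaphore indices, and modelling "block" as appending the caller to a queue and returning False are all hypothetical. A blocked caller is expected to reissue PP after a PV moves it to the ready list, which mirrors the program-counter reset in the definition.

```python
from dataclasses import dataclass, field


@dataclass
class Semaphore:
    count: int                                  # resource counter (nonnegative)
    queue: list = field(default_factory=list)   # ids of waiting processes


class ParamSemaphores:
    """Hypothetical single-threaded model of centralized PP/PV."""

    def __init__(self, initial_counts):
        self.sems = [Semaphore(c) for c in initial_counts]
        self.ready = []  # processes woken by PV, in wake-up order

    def pp(self, pid, triples):
        """triples: (index, t, d) per semaphore. Returns True if the caller passed."""
        for _, t, d in triples:
            assert t >= d >= 0, "the definition assumes t >= d >= 0"
        # Every test must hold before any counter is touched (all-or-nothing).
        for idx, t, _ in triples:
            if self.sems[idx].count < t:
                # Queue on the first semaphore whose test fails; the caller
                # re-executes PP from the beginning once a PV wakes it.
                self.sems[idx].queue.append(pid)
                return False
        for idx, _, d in triples:
            self.sems[idx].count -= d
        return True

    def pv(self, pairs):
        """pairs: (index, i) per semaphore. Increments and wakes all waiters."""
        for idx, inc in pairs:
            assert inc >= 0
            sem = self.sems[idx]
            sem.count += inc
            self.ready.extend(sem.queue)  # woken processes compete again
            sem.queue.clear()
```

For example, with counters [1, 0], a PP that tests index 0 against 1 (decrement 0) and index 1 against 1 (decrement 1) blocks on index 1; after a PV of 1 on index 1, the caller is on the ready list and its retried PP succeeds, leaving index 0 at 1 because its test value differs from its decrement value.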
2.2 Example

As an example, consider the reader/writer problem with writer preference. Suppose that there are R readers and W writers sharing a single data object. Any number of readers may have simultaneous access to the data, but only one writer at a time may access the data (readers and writers may not have simultaneous access to the data). There is no preemption, and a reader may begin reading only when no writer is writing or waiting; that is, when an active writer completes, waiting writers have priority over waiting readers. The above problem is fairly complex to solve using only binary and counting semaphores [9]. However, it is easily solved using parameterized semaphore operations, as follows [8]:

var E1: semaphore;   (* initialize to W, the number of writers *)
    E2: semaphore;   (* initialize to R, the number of readers *)
    E3: semaphore;   (* initialize to 1 *)

Reader loop
    r1: PP(E1, W, 0; E2, 1, 1);
        perform read
    r2: PV(E2, 1);
endloop

Writer loop
    w1: PP(E1, 1, 1);
    w2: PP(E2, R, 0; E3, 1, 1);
        perform write
    w3: PV(E1, 1; E3, 1);
endloop
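The counters in this program can be exercised without any process machinery. The Python sketch below is a hypothetical illustration: try_pp is a non-blocking stand-in for PP that reports failure instead of blocking, and the interleaving (writer A writing while writer B and a reader wait) is an assumed scenario chosen to show writer preference with W = 3 and R = 2.

```python
W, R = 3, 2  # number of writers and readers (assumed values)


def try_pp(count, triples):
    """Non-blocking PP: succeed only if every test holds, then apply all decrements."""
    if all(count[s] >= t for s, t, _ in triples):
        for s, _, d in triples:
            count[s] -= d
        return True
    return False  # a real PP would block here and be retried after a PV


def pv(count, pairs):
    for s, inc in pairs:
        count[s] += inc


# E1: W minus writers waiting or writing; E2: R minus active readers;
# E3: 1 when no writer is writing.
count = {"E1": W, "E2": R, "E3": 1}

READ_R1 = [("E1", W, 0), ("E2", 1, 1)]
READ_R2 = [("E2", 1)]
WRITE_W1 = [("E1", 1, 1)]
WRITE_W2 = [("E2", R, 0), ("E3", 1, 1)]
WRITE_W3 = [("E1", 1), ("E3", 1)]

steps = [
    try_pp(count, WRITE_W1),  # writer A announces itself
    try_pp(count, WRITE_W2),  # writer A starts writing
    try_pp(count, WRITE_W1),  # writer B announces itself
    try_pp(count, WRITE_W2),  # writer B must wait for A
    try_pp(count, READ_R1),   # reader waits: a writer is waiting
]
pv(count, WRITE_W3)                     # writer A finishes
steps.append(try_pp(count, WRITE_W2))   # writer B goes before the reader
steps.append(try_pp(count, READ_R1))    # reader still waits
pv(count, WRITE_W3)                     # writer B finishes
steps.append(try_pp(count, READ_R1))    # now the reader may enter
pv(count, READ_R2)                      # reader leaves
```

The recorded outcomes are True, True, True, False, False, True, False, True, and the counters return to their initial values. The reader is refused twice only because E1.count stays below W while any writer is waiting or writing.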
The purpose of line w1 is simply to decrement E1.count to indicate that one more writer is waiting or writing. On line w2, a check is made to see if no other writer is writing and no reader is reading. If so, E3.count is decremented to indicate that this writer is writing. Counter E2.count is not affected by the operation. On line r1, a check is made to see if no writers are currently writing or waiting. If so, E2.count is decremented to indicate that one more reader is reading.

The above example demonstrates the power of parameterized semaphore operations. Presser [1] gives solutions to other complex process coordination problems such as prioritized process coordination and reader/writer problems with both strong and weak reader preference. Next, we present an overview of our distributed implementation of parameterized semaphore operations.

2.3 Distributed Implementation
Semaphores are maintained in the system in a distributed manner. All semaphores are included in a single data object; a copy of the data object is stored at each node. Any replica control protocol can be used to control access to the copies to ensure that the different copies of the data object appear as a single, nonreplicated copy, that is, one-copy equivalence [10]. However, in both parameterized semaphore operations PP and PV, it is possible that the semaphores will need to be updated; thus, any two semaphore operations may conflict. Therefore, in addition to ensuring one-copy equivalence, the execution of semaphore operations must be totally ordered to guarantee the consistency of the semaphore values [10]. Any distributed mutual exclusion algorithm can be used to totally order semaphore operations. In this implementation, a quorum-based mutual exclusion algorithm is simply modified to control access to the semaphores.

A quorum is a set of nodes. A set of quorums is called a coterie if any two quorums in the set have a node in common and neither is properly contained in the other [11]. A mutual exclusion protocol based on a coterie is modified to incorporate replica control as follows. Each copy of the semaphore values includes a version number. The version number is used to determine which copies of the semaphore values are the most up to date. When a node wants to issue a PP or PV operation, it sends request messages to all of the nodes in a quorum of the coterie. Each member of the quorum may return permission to at most one node at a time. In addition to permission, the semaphore values and version number are also returned. If a node receives permission from all of the nodes in a quorum, the node may read and update the semaphore values in mutual exclusion. A copy of the semaphore values with the highest version number is the most up to date. At the end of the semaphore operation, the new semaphore values and the new version number, which is one greater than the highest version number encountered, are returned to all of the nodes in the quorum. When a node in the quorum receives the new values and version number, it updates its own copy. Now, the node may send permission to another node. Because of the intersection property between quorums, mutual exclusion and one-copy equivalence are guaranteed.

2.4 Other Possible Implementations

In this section, we discuss other possible implementations and explain why they are not as efficient. The first design consideration was how to maintain copies of the set of semaphores in the system. There are two different approaches:
1. Each semaphore is maintained as a separate data object.
2. The set of all semaphores is maintained as a single data object; that is, there is only one data object in the system.
If the first approach is taken, concurrent executions within PP and PV operations may be possible. However, the management of waiting requests becomes more complex. Considering the increase in complexity caused by managing waiting requests and the fact that PP and PV operations are usually very short compared with critical sections, we chose the second approach. Furthermore, in the second approach, a node executing a PV operation can easily determine which blocked nodes can proceed because the node has access to all current semaphore values.

The next design consideration was how to store blocked requests. There are two approaches:
1. A queue of blocked requests is associated with each semaphore. This is how the blocked requests are stored in the centralized implementation.
2. A single queue is used to store all of the blocked requests.
Even though the first approach may save some computation time in a PV operation, for simplicity we chose the second approach.

3. ALGORITHM
In this section we present the details of our implementation of parameterized semaphore operations.

3.1 Data Structures

We assume that the semaphores E consist of:
• an array of integers, E.count, denoting the semaphore values;
• an integer variable, E.age, denoting the version number; and
• a set, E.blocked, denoting requests which have been blocked.

Each node maintains a copy of E. Let t, d, and i be arrays which store values to be tested against the semaphore values, values to be decremented from the semaphore values, and values by which the semaphores are to be incremented, respectively. That is, comparing with the notation in Section 2, E.count[j] = E_j.count, t[j] = t_j, d[j] = d_j, and i[j] = i_j for 1 ≤ j ≤ m. A PP operation uses t and d, and a PV operation uses i, as parameters. Each blocked request stored in E.blocked is denoted by a pair (I, t), where I is the identifier of the node which issued the request and t is the array of test values of the request. In this way, requests in E.blocked can be tested against the semaphore values in E.count.

Maekawa [3] developed a quorum-based protocol for mutual exclusion that incorporates deadlock prevention. In Maekawa's algorithm, quorums are constructed based on finite projective planes. Our parameterized semaphore implementation is based on Maekawa's algorithm. However, quorums do not need to be based on finite projective planes. They can be constructed using various other methods, including weighted voting [12], finite projective planes [3], grid protocols [13], and tree protocols [2, 13].

Six types of messages are exchanged among the nodes: REQUEST, GRANT, RELEASE, INQUIRE, YIELD, and PROCEED. All messages contain timestamps. A timestamp attached to a message is larger than the timestamp of any message observed by the node initiating the message. Any set of timestamps can be totally ordered. Lamport's algorithm can be used to generate such timestamps, in which node identifiers are used for conflict resolution [14]. In addition to a copy of the semaphore variable E, each node maintains two other data structures, QUEUE and GRANTED, which are used to store requests. Requests are stored in QUEUE in timestamp order. A request which has been granted permission is stored in GRANTED.
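The replica-control rule of Section 2.3, with E.age serving as the version number, can be sketched in Python as follows. The sketch assumes that permission from every member of the chosen quorum has already been obtained (the REQUEST/GRANT exchange is elided), and it uses the seven lines of the finite projective plane of order 2 as a hypothetical coterie over nodes 1 to 7; quorum_update and dec_first are illustrative names only.

```python
# Hypothetical coterie: the lines of the finite projective plane of order 2.
QUORUMS = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6},
           {2, 5, 7}, {3, 4, 7}, {3, 5, 6}]

# Every node starts with the same copy of E, written as (age, counts).
copies = {node: (0, (3, 2, 1)) for node in range(1, 8)}


def quorum_update(quorum, update):
    """Read the highest-age copy in `quorum`, apply `update`, write back age + 1.

    Assumes GRANT messages from all members of `quorum` were already received.
    """
    age, counts = max((copies[n] for n in quorum), key=lambda c: c[0])
    new_copy = (age + 1, update(counts))
    for n in quorum:          # the RELEASE step: members adopt the new copy
        copies[n] = new_copy
    return new_copy


def dec_first(counts):
    return (counts[0] - 1,) + counts[1:]


first = quorum_update(QUORUMS[0], dec_first)   # nodes 1, 2, 3 now hold age 1
second = quorum_update(QUORUMS[6], dec_first)  # node 3 is shared, so age 1 is seen
```

Because {1, 2, 3} and {3, 5, 6} share node 3, the second operation starts from the age-1 copy rather than the stale age-0 copies at nodes 5 and 6, and produces (2, (1, 2, 1)). This is the intersection property that yields one-copy equivalence.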
3.2 Algorithm

When a node issues a PP or PV operation, it sends REQUEST messages to all of the nodes in a quorum. When a node in the quorum receives a REQUEST message, it places the request in QUEUE. Then the node returns a GRANT message, containing a copy of E, to the requesting node at the front of QUEUE. To ensure mutual exclusion, a node may send only one GRANT message at a time. When the requesting node receives GRANT messages from all of the nodes in the quorum, it determines which copy of E is the most up to date by comparing E.age. In the following description, E denotes the most up-to-date copy. Depending on the operation, E is used as follows:
1. If the operation is PP(t, d), the node compares E.count with t, componentwise. If E.count ≥ t, then d is subtracted from E.count componentwise; otherwise, the pair (I, t), denoting the request, is added to the set E.blocked.
2. If the operation is PV(i), the node adds i to E.count, componentwise. If there is a request (J, t) in E.blocked such that E.count ≥ t, a PROCEED message is sent to node J. After node J completes its PP operation, it checks to see if another blocked node can complete its PP operation. If such a node exists, node J sends a PROCEED message to that node. In this way, nodes that are able to complete their PP operations are allowed to proceed, one at a time.
Finally, the node sends RELEASE messages, which contain the new values of E, to all of the nodes in the quorum. When a node receives a RELEASE message, it updates its copy of E. Then, if other requests are waiting in QUEUE, the node sends a new GRANT message to the requesting node at the front of QUEUE. The INQUIRE and YIELD messages are used to prevent deadlock. A detailed algorithm is presented in the Appendix.

4. EXAMPLE

In this section, a solution to the reader/writer problem with writer preference (see Section 2) is presented. Suppose that there are three writer processes, w1, w2, and w3, and two reader processes, r1 and r2. Thus, W = 3 and R = 2. Assume that the nodes are assigned the following quorums:
Q_w1 = {w1, w2, r2}
Q_w2 = {w2, w3, r1}
Q_w3 = {w1, w3, r1}
Q_r1 = {r1, r2}
Q_r2 = {r2, w3}
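The quorum assignment above can be checked mechanically against the coterie definition of Section 2.3. The following Python sketch is illustrative; coterie_violations is a hypothetical helper that reports every pair of quorums that either fails to intersect or where one properly contains the other.

```python
from itertools import combinations

# Quorum assignment from the example
QUORUMS = {
    "w1": {"w1", "w2", "r2"},
    "w2": {"w2", "w3", "r1"},
    "w3": {"w1", "w3", "r1"},
    "r1": {"r1", "r2"},
    "r2": {"r2", "w3"},
}


def coterie_violations(quorums):
    """Return pairs of owners whose quorums break the coterie conditions."""
    bad = []
    for (a, qa), (b, qb) in combinations(quorums.items(), 2):
        if not (qa & qb) or qa < qb or qb < qa:
            bad.append((a, b))
    return bad


print(coterie_violations(QUORUMS))  # prints []
```

Every pair intersects (w2 and w3 even share two nodes, w3 and r1), and no two-node quorum is contained in a three-node one, so the assignment is a coterie.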
A copy of the semaphore at each node is initialized to:
E.age = 0
E.count[1] = 3
E.count[2] = 2
E.count[3] = 1
E.blocked = { }

We will denote the semaphore values by a triple (E.age, [E.count[1], E.count[2], E.count[3]], E.blocked). The initial value of E is denoted by (0, [3, 2, 1], { }). The other data structures at each node are initialized to:
GRANTED.seq_no = 0
GRANTED.INQ_sent = false
GRANTED.node = 0
QUEUE = { }

The parameterized semaphore operations use the following parameters:

Reader loop
    r1: PP(E1, 3, 0; E2, 1, 1);      Parameters t = [3, 1, 0], d = [0, 1, 0]
        perform read
    r2: PV(E2, 1);                   Parameters i = [0, 1, 0]
endloop

Writer loop
    w1: PP(E1, 1, 1);                Parameters t = [1, 0, 0], d = [1, 0, 0]
    w2: PP(E2, 2, 0; E3, 1, 1);      Parameters t = [0, 2, 1], d = [0, 0, 1]
        perform write
    w3: PV(E1, 1; E3, 1);            Parameters i = [1, 0, 1]
endloop

First, we will show in detail how the protocol works by considering the invocation of a single operation. Suppose that writer w1 performs a PP(E1, 1, 1) operation:
1. Writer w1 sends REQUEST(1, w1) messages to w1, w2, and r2, with sequence number 1 and node identifier w1. Actually, writer w1 only pretends to send (receive) messages to (from) writer w1.
2. Since no other requests have been granted, w1, w2, and r2 set:
   • GRANTED.seq_no = 1
   • GRANTED.node = w1
   • GRANTED.INQ_sent = false
   Then, GRANT messages, with E = (0, [3, 2, 1], { }), are sent to w1.
3. Writer w1 receives the GRANT messages and compares E.count = [3, 2, 1] with t = [1, 0, 0]. Since E.count ≥ t, writer w1 updates E by incrementing E.age (E.age = 1) and by subtracting d = [1, 0, 0] from E.count. Thus, the new value of E is (1, [2, 2, 1], { }).
4. Writer w1 sends RELEASE messages to w1, w2, and r2.
5. Upon receipt of the RELEASE messages, w1, w2, and r2 update their copies of E to the new value (1, [2, 2, 1], { }).

Next, we will show how the protocol implements the reader/writer problem with writer preference correctly. Assume that after successfully completing statement w1, writer w1 executes statement w2: PP(E2, 2, 0; E3, 1, 1). Writer w1 sends REQUEST messages to w1, w2, and r2. All nodes reply with GRANT messages containing E = (1, [2, 2, 1], { }). Writer w1 updates the value of E to (2, [2, 2, 0], { }) and sends RELEASE messages with the updated value to w1, w2, and r2.

Suppose that reader r1 executes statement r1: PP(E1, 3, 0; E2, 1, 1).
Reader r1 sends REQUEST messages to r1 and r2. Readers r1 and r2 reply with the semaphore values (0, [3, 2, 1], { }) and (2, [2, 2, 0], { }), respectively. Since the copy from r2 has the larger age, r1 selects it as the most up-to-date value of E. Since E.count ≱ t, where t = [3, 1, 0], r1 is blocked, so reader r1 sends the updated value E = (3, [2, 2, 0], {(r1, [3, 1, 0])}) to r1 and r2 in RELEASE messages and waits for a PROCEED message.

Now, suppose that writer w2 executes statement w1: PP(E1, 1, 1).
1. Writer w2 sends REQUEST messages to w2, w3, and r1.
2. The semaphore values returned by reader r1 contain the most up-to-date values of E. Writer w2 updates E and sends the new value (4, [1, 2, 0], {(r1, [3, 1, 0])}) to w2, w3, and r1.

Next, writer w2 executes statement w2: PP(E2, 2, 0; E3, 1, 1).
1. Writer w2 receives semaphore values (4, [1, 2, 0], {(r1, [3, 1, 0])}) from w2, w3, and r1.
2. Since E.count ≱ t, where t = [0, 2, 1], w2 is blocked. So, writer w2 sends the updated value E = (5, [1, 2, 0], {(r1, [3, 1, 0]), (w2, [0, 2, 1])}) to w2, w3, and r1 in RELEASE messages and waits for a PROCEED message.
Finally, writer w1 finishes writing and executes statement w3: PV(E1, 1; E3, 1). Writer w1 sends REQUEST messages to w1, w2, and r2. Writer w2 replies with values (5, [1, 2, 0], {(r1, [3, 1, 0]), (w2, [0, 2, 1])}). Writer w1 updates E.count to [2, 2, 1]. Since reader r1 cannot proceed and writer w2 can now proceed, writer w1 sends a PROCEED message to writer w2 and waits for a PROCEED message to be returned. Note that after writer w1 finishes writing, writer w2 proceeds, instead of reader r1, implementing writer preference correctly.
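The evolution of E in this example can be replayed with a small Python model of the two cases in Section 3.2. The sketch is a simplification under stated assumptions: it operates directly on the most up-to-date copy of E, omits all messages, and only reports which blocked requests could receive a PROCEED after a PV rather than forwarding PROCEED messages one at a time. The function names pp and pv and the tuple layout (age, count, blocked) are hypothetical.

```python
def pp(E, node, t, d):
    """Case 1 of Section 3.2 on copy E = (age, count, blocked).

    Returns the new copy and whether `node` was blocked.
    """
    age, count, blocked = E
    if all(c >= x for c, x in zip(count, t)):
        return (age + 1, [c - x for c, x in zip(count, d)], blocked), False
    return (age + 1, count, blocked + [(node, t)]), True


def pv(E, i):
    """Case 2: add i, then list blocked requests whose tests now hold."""
    age, count, blocked = E
    count = [c + x for c, x in zip(count, i)]
    ready = [(n, t) for n, t in blocked if all(c >= x for c, x in zip(count, t))]
    return (age + 1, count, blocked), ready


E = (0, [3, 2, 1], [])
E, _ = pp(E, "w1", [1, 0, 0], [1, 0, 0])             # w1: statement w1
E, _ = pp(E, "w1", [0, 2, 1], [0, 0, 1])             # w1: statement w2
E, r1_blocked = pp(E, "r1", [3, 1, 0], [0, 1, 0])    # r1: statement r1
E, _ = pp(E, "w2", [1, 0, 0], [1, 0, 0])             # w2: statement w1
E, w2_blocked = pp(E, "w2", [0, 2, 1], [0, 0, 1])    # w2: statement w2
after_pv, ready = pv(E, [1, 0, 1])                   # w1: statement w3
```

The ages 1 through 5 and the counts match the walkthrough, and ready contains only w2 (r1 still fails its test E.count[1] ≥ 3), which is why w1 sends its PROCEED message to w2 rather than to r1.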
5. PROPERTIES

5.1 Correctness

The correctness of the algorithm follows directly from the correctness of the underlying protocols: the replica control protocol and the mutual exclusion protocol. Complete correctness proofs of the underlying protocols can be found elsewhere [3, 10, 15]. Recall that starvation freedom is not guaranteed by the centralized implementation. This is also true of our implementation. As noted in Section 2.1, this is a necessary feature of the operations in order to solve problems that are inherently starvation prone, such as various reader/writer problems.

5.2 Performance

In the algorithm, six types of messages are exchanged: 1) REQUEST messages, 2) GRANT messages, 3) RELEASE messages, 4) INQUIRE messages, 5) YIELD messages, and 6) PROCEED messages. The number of messages of types 1)-5) required to perform the PP and PV operations, denoted num, depends on the size of the quorums used. For simplicity, we assume that all of the quorums are of the same size, say q. In the best case, only q REQUEST messages, q GRANT messages, and q RELEASE messages are exchanged for each PP and PV operation. If there is a potential for deadlock, at most q INQUIRE, q YIELD, and q new GRANT messages are additionally exchanged. Thus, $3q \le \mathit{num} \le 6q$. For instance, if quorums based on finite projective planes are used, q is approximately $\sqrt{N}$, so $3\sqrt{N} \le \mathit{num} \le 6\sqrt{N}$ [3]. On the other hand, if quorums are constructed using Agrawal and El Abbadi's [2] binary tree protocol, q is approximately $\log N$ in the best case, and $\mathit{num} = O(\log N)$.

If the condition for a PP operation is not satisfied initially at a node, the node must wait to receive one PROCEED message. Thus, in the worst case, num + 1 messages are required to complete the PP operation. In a PV operation, if the node generated a PROCEED message (note that this PROCEED message is counted in the PP operation), it must wait for one PROCEED message to be returned. Thus, in the worst case, num + 1 messages are required to perform the PV operation.

5.3 Fault Tolerance

Several modifications can be made to make the algorithm more fault tolerant. There is a tradeoff between fault tolerance and the number of messages required per operation. In this section, we describe some possible modifications that can be used to make the algorithm more fault tolerant.
5.3.1 Dynamic quorum adjustment. In distributed systems, nodes and/or communication lines may fail. When a node tries to obtain permission from all nodes in a quorum, it may not be able to communicate with some of them because of node and/or communication line failures. Since any quorum can be used to obtain permission, the node can use other quorums. However, it is possible that a node is not able to communicate with all of the nodes in any quorum. Then that node will be blocked until enough nodes and communication lines recover. If quorums are dynamically adjusted based on which nodes can communicate with each other, then subsequent operations are more likely to be successful. However, quorums must be carefully modified so that any two quorums always intersect. Several such methods have been proposed [16, 17]. These methods can be directly applied to our algorithm to improve the system availability.

5.3.2 PROCEED messages. Suppose that a node fails after receiving a PROCEED message while it is executing a PP operation. Then the node which initiated the PROCEED message will never receive a PROCEED message back to complete the PV operation. A timeout mechanism can be used to detect such an error. However, recovery from the error is not easy. Some nodes may have received PROCEED messages, completed their PP operations, and progressed further when the failure occurs. Thus, the node that initiated the PROCEED message cannot simply reissue the PROCEED message with the old value of E. In this section, we describe another approach that eliminates the need for PROCEED messages. When a node executing a PV operation finds blocked requests that can proceed, it sends TRY-AGAIN messages to the nodes that issued the requests. When a node receives a TRY-AGAIN message, it executes the PP operation again from the beginning. This approach is similar to the centralized implementation of parameterized semaphore operations; that is, all blocked nodes that can potentially proceed will compete again. Modifications needed in the algorithm (see Appendix) are as follows.

The following statement is added to the then clause of Statement [A] (if statement):
    E.blocked := E.blocked - {(I, t)};

Statement [B] (if statement) is replaced by the following if statement:
    if (BLOCKED) then
    begin
        wait for a TRY-AGAIN message;
        seq_no := max(seq_no, seq_no_J);
        PP(t, d);
    end

Statement [C] (if statement) is replaced by the following for statement:
    for each J such that (J, t) in E.blocked and (E.count[j] ≥ t[j] for all j) do
        send a TRY-AGAIN message to node J;

6. CONCLUSION

This article presented a quorum-based distributed algorithm to implement parameterized semaphore operations. Using the algorithm, complex problems such as coordination of prioritized processes and reader/writer problems with various preferences can be easily solved in a distributed environment. The number of messages required to perform each of the PP and PV operations is only one more than the number of messages required by the underlying quorum-based mutual exclusion algorithm.

ACKNOWLEDGMENTS

This work was supported in part by National Science Foundation grant CCR-8822378.

APPENDIX

There are six procedures at each node: PP, PV, REQ, REL, INQ, and YLD. Procedures REQ, REL, INQ, and YLD are executed to handle REQUEST, RELEASE, INQUIRE, and YIELD messages, respectively. Each node executes the procedures in local mutual exclusion. A node does not have to execute in local mutual exclusion while waiting for a message to arrive. The only exception is that a node must execute in local mutual exclusion while waiting for a PROCEED message to be returned in procedure PV.

const
    I = node identifier (positive integer);
    m = number of semaphores (positive integer);
    Q_I = quorum for node I (set of positive integers);
var
    num_received: integer;   (* number of GRANT messages received *)
    seq_no: integer;         (* timestamp *)
    BLOCKED: boolean;        (* the node is blocked in a PP operation *)
    RECEIVED: boolean;       (* all GRANT messages have been received *)
    E: record of
        age: integer;
        count: array[1..m] of integer;
        blocked: set of (J, t) such that J: integer and t: array[1..m] of integer;
    end;
    GRANTED: record of       (* stores request which has been granted by this node *)
        seq_no: integer;     (* sequence number of granted request *)
        node: integer;       (* node identifier of granted request *)
        INQ_sent: boolean;   (* true if INQUIRE message has been sent *)
    end;
    QUEUE: queue of (seq_no_J, J) such that seq_no_J, J: integer;

procedure PP(t, d: array[1..m] of integer);
var j: integer;
begin
    seq_no := seq_no + 1;
    RECEIVED := false;
    num_received := 0;
    send REQUEST(seq_no, I) messages to all nodes in Q_I;
    while num_received < |Q_I| do
    begin
        wait for a GRANT(seq_no_J, EJ) message;
        num_received := num_received + 1;
        seq_no := max(seq_no, seq_no_J);
        if (E.age < EJ.age) then E := EJ;
    end;
    RECEIVED := true;
    BLOCKED := false;
    for j := 1 to m do
        if (E.count[j] < t[j]) then BLOCKED := true;
[A] if (not BLOCKED) then
        for j := 1 to m do E.count[j] := E.count[j] - d[j]
    else
        E.blocked := E.blocked ∪ {(I, t)};
    E.age := E.age + 1;
    send RELEASE(seq_no, E) messages to all nodes in Q_I;
[B] if (BLOCKED) then
    begin
        wait for a PROCEED(seq_no_J, EJ, RETURN-TO) message;
        seq_no := max(seq_no, seq_no_J);
        E.age := EJ.age + 1;
        for j := 1 to m do E.count[j] := EJ.count[j] - d[j];
        E.blocked := EJ.blocked - {(I, t)};
        if (there is a (J, t) in E.blocked such that E.count[j] ≥ t[j] for all j) then
            send a PROCEED(seq_no, E, RETURN-TO) message to node J
        else
            send a PROCEED(seq_no, E, RETURN-TO) message to node RETURN-TO;
    end
end;

procedure PV(i: array[1..m] of integer);
var j: integer;
begin
    seq_no := seq_no + 1;
    RECEIVED := false;
    num_received := 0;
    send REQUEST(seq_no, I) messages to all nodes in Q_I;
    while num_received < |Q_I| do
    begin
        wait for a GRANT(seq_no_J, EJ) message;
        num_received := num_received + 1;
        seq_no := max(seq_no, seq_no_J);
        if (E.age < EJ.age) then E := EJ;
    end;
    RECEIVED := true;
    E.age := E.age + 1;
    for j := 1 to m do E.count[j] := E.count[j] + i[j];
[C] if (there is a (J, t) in E.blocked such that E.count[j] ≥ t[j] for all j) then
    begin
        send a PROCEED(seq_no, E, I) message to node J;
        wait for a PROCEED(seq_no_J, EJ, J) message;
        E := EJ;
        seq_no := max(seq_no, seq_no_J);
    end;
    send RELEASE(seq_no, E) messages to all nodes in Q_I;
end;

procedure REQ;  (* Handle REQUEST(seq_no_J, J) message *)
begin
    seq_no := max(seq_no, seq_no_J);
    if (GRANTED.seq_no = 0) then  (* GRANTED is empty *)
    begin
        send a GRANT(seq_no, E) message to node J;
        GRANTED.seq_no := seq_no_J;
        GRANTED.node := J;
        GRANTED.INQ_sent := false;
    end
    else
    begin
        put (seq_no_J, J) in QUEUE in timestamp order;
        if ((GRANTED.INQ_sent = false) and
            ((seq_no_J < GRANTED.seq_no) or
             ((seq_no_J = GRANTED.seq_no) and (J < GRANTED.node)))) then
        begin
            send an INQUIRE(seq_no) message to GRANTED.node;
            GRANTED.INQ_sent := true;
        end
    end
end;

procedure REL;  (* Handle RELEASE(seq_no_J, EJ) message *)
var K: integer;
    seq_no_K: integer;
begin
    E := EJ;
[D] seq_no := max(seq_no, seq_no_J);
    GRANTED.seq_no := 0;
    if not (QUEUE = { }) then
    begin
        (seq_no_K, K) := dequeue(QUEUE);
        send a GRANT(seq_no, E) message to node K;
        GRANTED.seq_no := seq_no_K;
        GRANTED.node := K;
        GRANTED.INQ_sent := false;
    end
end;

procedure INQ;  (* Handle INQUIRE(seq_no_J) message *)
begin
    seq_no := max(seq_no, seq_no_J);
    if (not RECEIVED) then
    begin
        num_received := num_received - 1;
        send a YIELD(seq_no) message to node J;
    end
end;
procedure YLD;  (* Handle YIELD(seq_no_J) message *)
begin
    put (GRANTED.seq_no, GRANTED.node) in QUEUE in timestamp order;
    Proceed as if a RELEASE message has been received, from step [D].
end;

begin  (* Initial values *)
    E.age := 0;
    E.count[j] := initial value of jth semaphore, 1 ≤ j ≤ m;
    E.blocked := { };
    GRANTED.seq_no := 0;
    QUEUE := { };
end.
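The arbitration performed by procedures REQ, REL, and YLD at a single quorum member can be sketched in Python as follows. This is a hypothetical model: messages are returned as (kind, destination) tuples instead of being sent, the copy of E carried by GRANT and RELEASE messages and the seq_no update of step [D] are omitted, and node identifiers are integers so that comparing (seq_no_J, J) tuples gives the timestamp order used in REQ, with node identifiers breaking ties.

```python
import heapq


class Arbiter:
    """Hypothetical model of GRANTED/QUEUE handling at one quorum member."""

    def __init__(self):
        self.seq_no = 0
        self.granted = None      # (seq_no_J, J) of the granted request, or None
        self.inq_sent = False    # GRANTED.INQ_sent
        self.queue = []          # heap of (seq_no_J, J): timestamp order

    def on_request(self, seq_no_j, j):          # procedure REQ
        self.seq_no = max(self.seq_no, seq_no_j)
        if self.granted is None:
            self.granted, self.inq_sent = (seq_no_j, j), False
            return [("GRANT", j)]
        heapq.heappush(self.queue, (seq_no_j, j))
        # Tuple order: smaller seq_no first, then smaller node identifier.
        if not self.inq_sent and (seq_no_j, j) < self.granted:
            self.inq_sent = True
            return [("INQUIRE", self.granted[1])]
        return []

    def on_release(self):                       # procedure REL (E copy omitted)
        self.granted = None
        if self.queue:
            self.granted, self.inq_sent = heapq.heappop(self.queue), False
            return [("GRANT", self.granted[1])]
        return []

    def on_yield(self):                         # procedure YLD
        heapq.heappush(self.queue, self.granted)
        return self.on_release()
```

For instance, after granting request (5, 2), a later request (3, 7) triggers an INQUIRE to node 2; if node 2 yields, the grant moves to node 7, and subsequent releases hand it to (4, 1) and then back to (5, 2). This is the deadlock-avoidance role of INQUIRE and YIELD.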
REFERENCES

1. L. Presser, Multiprogramming Coordination, ACM Comp. Surv. 7, 22-44 (1975).
2. D. Agrawal and A. El Abbadi, An Efficient and Fault Tolerant Solution for Mutual Exclusion, ACM Trans. Comp. Syst. 9, 1-20 (1991).
3. M. Maekawa, A √N Algorithm for Mutual Exclusion in Decentralized Systems, ACM Trans. Comp. Syst. 3, 145-159 (1985).
4. K. M. Chandy and J. Misra, The Drinking Philosophers Problem, ACM Trans. Progr. Lang. Syst. 6, 632-646 (1984).
5. K. Raymond, A Distributed Algorithm for Multiple Entries to a Critical Section, Inform. Proc. Lett. 30, 189-193 (1989).
6. M. Raynal, A Distributed Solution to the k-out-of-M Resources Allocation Problem, in Proceedings of the 3rd International Conference on Computing and Information, Springer-Verlag, LNCS 497, 599-609, 1991.
7. A. Maddi and M. Raynal, Implementing Semaphores on a Distributed Memory Parallel Machine, Proceedings Parallel Computing Conf., North-Holland, 400-405, 1991.
8. M. Maekawa, A. E. Oldehoeft, and R. R. Oldehoeft, Operating Systems: Advanced Concepts, The Benjamin/Cummings Publishing Co., 1987.
9. P. J. Courtois, F. Heymans, and D. L. Parnas, Concurrent Control with Readers and Writers, Commun. ACM 14, 667-668 (1971).
10. P. A. Bernstein, V. Hadzilacos, and N. Goodman, Concurrency Control and Recovery in Database Systems, Addison-Wesley Publishing Co., 1987.
11. H. Garcia-Molina and D. Barbara, How to Assign Votes in a Distributed System, J. ACM 32, 841-860 (1985).
12. D. K. Gifford, Weighted Voting for Replicated Data, in Proceedings of the 7th ACM Symposium on Operating Systems Principles, 1979, 150-162.
13. M. L. Neilsen and M. Mizuno, Coterie Join Algorithm, IEEE Transactions on Parallel and Distributed Systems 3, 582-590 (1992).
14. L. Lamport, Time, Clocks, and the Ordering of Events in a Distributed System, Commun. ACM 21, 558-565 (1978).
15. B. A. Sanders, The Information Structure of Distributed Mutual Exclusion Algorithms, ACM Trans. Comp. Syst. 5, 284-299 (1987).
16. D. Barbara, H. Garcia-Molina, and A. Spauster, Increasing Availability under Mutual Exclusion Constraints with Dynamic Vote Reassignment, ACM Trans. Comp. Syst. 7, 394-426 (1989).
17. S. Jajodia and D. Mutchler, Dynamic Voting Algorithms for Maintaining the Consistency of a Replicated Database, ACM Trans. Database Syst. 15, 230-280 (1990).