JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING 9, 15-25 (1990)

Time Advancement in Distributed Event Simulation*

H. F. LI, K. VENKATESH, AND T. RADHAKRISHNAN

Department of Computer Science, Concordia University, 1455 de Maisonneuve Blvd. West, Montreal, Quebec, Canada H3G 1M8

* The authors thank the anonymous referees for their constructive criticism.
The problem of updating the simulation clock in distributed discrete event simulation is investigated. Three objectives for efficiently updating the simulation clock in processes modeled by the strongly connected components of a process graph are identified. An optimal solution is proposed for one of the objectives. The optimization problem for another objective is shown to be NP-complete. Consequently, a computationally efficient heuristic solution is proposed. The message complexity and time complexity of the proposed heuristic approach are shown to be asymptotically better than those of an existing one. Suitability of the algorithm to utilize special features of communication subsystems, such as "out-of-band" signaling and prioritized message delivery, which could lead to non-FIFO delivery of messages, is also elaborated. © 1990 Academic Press, Inc.

1. INTRODUCTION

There is a growing interest in the development of distributed computer systems (DCS) and their applications. Discrete event simulation is one of the important applications of DCS [1, 2]. A brief review of the problem of performing discrete event simulation in a distributed system is first presented to reveal the basic underlying assumptions and issues that arise. The latter is then formulated as three optimization problems to be formally tackled.

Simulation Environment

A distributed computing system consisting of loosely coupled processors forms the backbone of the physical system on which the discrete event simulator will be executed. Loosely coupled processors which interact through message passing constitute the distributed event simulator. Consequently, we can model the simulator by a graph whose nodes represent the processes and whose edges represent the channels through which message communication can be accomplished. In general, a process graph may contain a number of strongly connected components. In discrete event simulation each process is characterized by a set of functions it carries out. The basic activity pursued by a process is the simulation of events. Events are represented by a tuple (Id. of function, invocation time), or (e_j, t_j). The tuple specifies the function of the process that should be executed in order to simulate the event and the time at which the event has to be simulated in the process. The assumptions of the model are as stated below:

(A.1) Each process maintains a local "simulation" clock. At a particular time, it invokes (i.e., simulates) those events which are on its event list and whose invocation time matches the local time. We assume that the exact time needed to simulate an event is irrelevant, and consider an invocation of an event an atomic action.

(A.2) In completing the invocation of an event (and removing it from the local event list), new events may be created. Such events may be simulated in the same process or in an adjacent process, depending on the simulation application. In the former case, the event is appended to the local event list. In the latter case, it is sent to the target process in the form of an event message.

(A.3) We assume that the system is causal: the invocation time of a created event must be later than that of the event that creates it.

(A.4) The channels among processes are point-to-point and directed. The delay for transmitting a message through a channel is unknown but finite.
2. CLOCK UPDATE PROBLEM
The simulator functions correctly if and only if each process (P_i) never advances its local time (LT_i) beyond the invocation time of any event in its event list or that of any event yet to be received on any of its input channels from its neighboring processes.
The problem being addressed in this paper is twofold:

(Q.1) How can the above requirement be met in the distributed environment?

(Q.2) In what ways can the effectiveness and efficiency of such a solution be evaluated, and, if possible, are there optimal solutions?

In the literature, distributed algorithms (Time Incrementation [1, 2] and Time Acceleration [3]) have been proposed in order to update the local simulation clocks. The Time Incrementation algorithm is very inefficient when applied to strongly connected components of a process graph, and hence Bryant proposed the Time Acceleration algorithm [3]. The focus of this paper is on the advancement of clocks in strongly connected components of process graphs, and we will be concerned only with logically synchronous clock update. In the logically synchronous approach for clock update proposed by Bryant, a global "state" detection phase is executed first [4]. The global state in this problem corresponds to determining the earliest invocation time of any event in any event list, or in transit on some channel. To ensure correctness, a process can participate in this global state detection only after it has completed the invocation of all events in its event list whose invocation time coincides with the local time. After the global state detection, a consensus can be reached so that all processes will be notified to update their local time to a common value; hence the notion of logical synchronism. Bryant has proposed an intuitive and ad hoc solution to this problem. This paper addresses the formal requirement of the problem, which is then formulated as optimization problems that can be systematically solved, using a variety of algorithmic techniques to suit different optimization objectives. We will analyze the complexities of these optimization problems and prove that one of them has a simple optimal solution while the other is NP-complete, for which an efficient heuristic is proposed. We will also demonstrate the merits of our solutions over Bryant's.

2.1. Formulation of the Clock Update Problem
A distinction between Bryant's approach and ours exists which makes the optimization problem different. Bryant assumes that to obtain the global state, every edge of the process graph must be traversed (with a test message) in order to account for the events in transit on that edge. We simplify this requirement by observing that each process is fully aware of the set of events it has created and sent on an output channel to a neighboring process since the last iteration of the clock update. So if a process P_i maintains a MINTIME_j for its output channel to process P_j which reflects the earliest invocation time of the events sent on that channel since the last update, a global decision can be pieced together by simply visiting all processes, rather than flushing all edges.

Additional Assumptions

(A.5) Each process maintains a variable MINTIME_j for each of its output channels. This variable is set to the earliest invocation time of the events sent on that channel after the previous clock update iteration.

(A.6) A process is ready to participate in the ith iteration of the global clock update only if: (i) it has completed the events on its event list whose invocation time coincides with the current time, and (ii) all the events created by neighbor processes and sent to this process in the (i - 1)th iteration have been received. The latter condition is imposed so that MINTIME_j is confined to the most recent set of events created and sent on each channel.

Problem Specification

Under the assumptions stated in (A.1)-(A.6), we can state the simulation clock update problem formally. Let

E_i = set of events contained in (or known to) process P_i after the previous clock update.
M_ij = set of events in transit on channel (i, j) leading from P_i to P_j.

Clock Advancement Problem
Invariant:

    ∀ (e_j, t_j) ∈ E_i ∪ (∪_k M_ki):  LT_i ≤ t_j.

Progress:

    (e_j, t_j) ∈ E_i ∪ (∪_k M_ki)  will eventually lead to  LT_i > t_j.
The invariant states that the simulation clock of a local process P_i is never advanced beyond the time of any event yet to be simulated by P_i, whether the latter is already known to P_i or in transit from some P_k to P_i. The progress requirement asserts that any known event will be simulated within finite time.
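As a concrete reading of this specification (not part of the original paper; the function and argument names below are illustrative), the following Python sketch clips a proposed clock advance so that the invariant is preserved:

    def safe_advance(proposed_time, local_event_times, in_transit_times):
        """Clip a proposed new value of LT_i so that LT_i <= t_j holds for
        every pending event (e_j, t_j) in E_i or in any M_ki (sketch)."""
        pending = list(local_event_times) + list(in_transit_times)
        if not pending:
            return proposed_time                 # nothing constrains the advance
        return min(proposed_time, min(pending))  # never pass an unsimulated event

    # Example: with pending events at times 12 and 17, a proposed advance to 15
    # is clipped to 12, so the local clock never passes an unsimulated event.
    assert safe_advance(15, [17], [12]) == 12

The progress requirement is then met by any scheme that keeps issuing such advances, since every pending event is eventually simulated and removed from the constraint set.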
2.2. Conceptual Solution
Imagine one of the processes, say P_1, is chosen as the coordinator for time advancement. A set of inspectors, which we call red tokens and green tokens, are sent out from P_1. The red tokens represent test messages and the green tokens represent marker messages. The traversal of a channel edge by a red token represents the transmission of a test message to invoke the receiver process to participate in the decision of the maximum clock advancement. The traversal of a channel edge by a green token serves to flush the channel and separate events generated in two successive iterations of clock advancement. Upon receipt of a red token, a process will send out red tokens on some of its output channels, and may send green tokens on some of the remaining ones. The exact strategy for sending tokens will be algorithm dependent, and will lead to different performance characteristics. Regardless of the strategy details, a feasible solution of the global state detection problem is obtained when

S(i) each node (process) has been visited by a red token,
S(ii) all red tokens have returned to P_1, and
S(iii) every edge has been traversed by either a red token or a green token.
Notice that green tokens do not have to return to P_1. The condition S(i) is required to pick up the local information in each process, and S(ii) ensures that all local information has returned to the coordinator. The condition S(iii) ensures that the channels are flushed before the next iteration of clock update proceeds [assumption (A.6)]. The above solution contrasts with Bryant's solution, which requires that every edge be traversed by a red token and that all red tokens return to the coordinator.

In adapting the global state paradigm for simulation clock advancement, the following explanation can be offered. The coordinator starts the ith clock advancement iteration after completing its events that belong to the (i - 1)th iteration (marked by the receipt of either red or green tokens on its input channels). It determines the earliest invocation time among the events in its event list, and communicates it in the form of test messages. A process will hold the test message for the ith iteration until it is ready to perform the assessment for clock update. During assessment the process determines the earliest invocation time among the events currently in its event list, the events it has sent to its neighbors in the simulation iteration just concluded, and the clock value in the received test message. This minimum replaces the candidate global clock value in the received test message, which is then forwarded to the neighboring process(es). After all the test messages of the ith iteration have returned to the coordinator, the coordinator determines the earliest invocation time t_min among the clock values contained in the test messages. This value t_min is then propagated to all processes for updating their local clocks (by sending Set messages). It can be argued that the above intuitive clock advancement paradigm implements the problem specification given earlier, taking note that message delivery is guaranteed within finite time.

The conceptual solution involves sending a test message to every process, which will "wake up" and participate in the global decision making only if it has complete knowledge of the events it is responsible for keeping track of. This local knowledge will eventually be accumulated and returned to the coordinator, which makes the final decision and informs everyone of the new clock value. The efficiency of the solution, however, depends on the procedure of message transmission and coordination, which are the subjects of this paper.
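The coordinator's side of one iteration can be summarized in a short sketch (illustrative only and not from the paper; the message tuple format and the send callback are assumptions): it collects the clock values returned in the test messages, computes t_min, and broadcasts it in Set messages.

    def coordinate_iteration(k, local_min, returned_clock_values, processes, send):
        """Coordinator's role in the kth clock-update iteration (sketch).
        local_min             -- earliest invocation time among the coordinator's own events
        returned_clock_values -- clock values carried by the returned test messages
        processes             -- processes to be notified of the new common clock value
        send                  -- callback send(process, message)"""
        t_min = min([local_min] + list(returned_clock_values))  # earliest pending event anywhere
        for p in processes:
            send(p, ("set", k, t_min))   # logically synchronous update to t_min
        return t_min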
2.3. Optimization Problems
The above formulation yields a feasible solution, but we could select better solutions by asserting certain objective functions that optimize certain performance aspects. The following are three candidate objective functions:

Ob(1). Minimize the total cost of the solution, where cost = number of edge traversals (with repetitions if necessary) by red and green tokens, singly or in groups.

Ob(2). Minimize the total cost of the solution, where cost = number of edge traversals (with repetitions) by red tokens, singly or in groups.

Ob(3). Minimize the longest walk by any red token from the coordinator to itself.

Following the interpretation of the red and green tokens, the objectives Ob(1), Ob(2), and Ob(3) can be visualized as follows. Ob(1) optimizes the total number of test messages and marker messages, and thus the total message overhead. Ob(2) optimizes the total number of test messages only, assuming marker messages are less expensive. Ob(3) optimizes the longest path delay, i.e., the longest sequence of processes that must be traversed in sequence by a test message to complete an update, and thus would lead to a solution that allows the update to be performed fast. To illustrate these different objectives and solutions, consider the example process graph depicted in Fig. 1a, where process 1 is assumed to be the coordinator. The optimal solution that minimizes the test message overhead is shown in Fig. 1b. Similarly, those for Ob(1) and Ob(3) are shown in Figs. 1c and 1d, respectively.

2.4. An Optimal Algorithm for Ob(3)
Algorithm Fast-Update
begin
  { Obtain a breadth-first-search tree rooted away from the coordinator;
    Send test messages to all nodes according to this tree; }
  { Obtain a breadth-first-search tree rooted toward the coordinator;
    Return test messages to the coordinator according to this tree; }
end.
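The two breadth-first-search trees can be computed directly from the process graph. The following Python sketch is illustrative only (the adjacency-list representation and function names are our assumptions, not the paper's); it builds the forward tree from the coordinator and the reverse tree toward it, and merging the two gives the routes on which test messages are sent out and returned.

    from collections import deque

    def bfs_tree(adj, root):
        """Breadth-first-search tree of a directed graph.
        adj: dict mapping a node to its list of successors.
        Returns a dict mapping each reachable node to its parent on a
        fewest-edge path from root."""
        parent = {root: None}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    def fast_update_trees(adj, coordinator):
        # Forward tree: shortest routes for sending test messages out.
        forward = bfs_tree(adj, coordinator)
        # Reverse tree: shortest routes for returning test messages, obtained
        # by running BFS on the graph with every edge reversed.
        reversed_adj = {}
        for u, succs in adj.items():
            for v in succs:
                reversed_adj.setdefault(v, []).append(u)
        backward = bfs_tree(reversed_adj, coordinator)
        return forward, backward

In a strongly connected component every node appears in both trees, so every process is reached by a test message and every test message can be returned to the coordinator along a shortest route.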
FIG. 1. (a) Process graph. (b) Test message only (test message path). (c) Test + marker messages. (Left) Test message path. (Right) Marker message paths. (d) Minimize longest walk (test message paths).
Explanation: For the example process graph in Fig. 1a, the two breadth-first-search trees of Fast-Update are shown in Fig. 2, which when merged yield the solution shown in Fig. 1d.

FIG. 2. Fast-Update search trees. (a) Forward breadth-first-search tree. (b) Reverse breadth-first-search tree.

THEOREM 1. Algorithm Fast-Update yields an optimal solution for Ob(3).

Proof. The first breadth-first-search tree yields the shortest path from the coordinator to each process, while the second yields the shortest path in the reverse direction. The two shortest paths together allow the coordinator to pick up the information from the processes in the fastest way possible. Thus the combined solution minimizes the path through each process, and hence achieves Ob(3). Q.E.D.

Notice that combining the two trees may lead to merging of test messages and further economy in the total number of test messages. For the example of Fig. 1, only six test messages (a, f, e, c, d, g) are needed, rather than eight. However, this latter aspect is unrelated to Ob(3) and will be left to later considerations of Ob(1) and Ob(2).

3. COMPLEXITY OF Ob(1)

The complexity of the optimization problem which minimizes the total number of edge traversals by test and marker messages (Ob(1)) can be revealed by proving the NP-completeness of the following formulation:

TC: Test Message Cover
Input: A process graph of n nodes, one of which is the coordinator, and a positive integer k.
Output: Yes, iff there exists a routing of test messages from the coordinator which covers all nodes and returns to the coordinator such that the total number of repeated traversals of edges by test messages is k.

We observe that minimizing the total number of test and marker messages is the same as minimizing the total number of repeated traversals of edges by test messages, as those edges not traversed by a test message have to be flushed by a marker message to prepare them for the next iteration, as explained previously.

We prove TC is NP-complete by reducing the set cover problem [5] to TC. The set cover problem is stated as follows:

SC: Set Cover
Input: A set C of subsets {C_1, ..., C_n} of a finite set S = {x_1, ..., x_m} and a positive integer k ≤ n.
Output: Yes, iff C contains a subset C' such that ∪_{C_i ∈ C'} C_i = S and |C'| = k.
To derive the proof, the following polynomial-time transformation procedure is developed to transform a given instance of SC into an instance of TC.

Problem Transformation

1. For each C_i ∈ C construct a rooted subtree of the form shown in Fig. 3. The root is labeled C_i and has a single successor D_i. In turn, D_i has a set of successors, one for each of the elements in C_i, which form the leaves of the subtree and are labeled accordingly. In addition, an extra successor labeled T_i is added for D_i.

2. Add a root R whose successors are precisely C = {C_1, ..., C_n}.

3. Form a process graph G by adding the following edges: (a) For each leaf node labeled with x_j, if x_j ∈ C_l, edge (x_j, C_l) is created. (b) For each T_i, edge (T_i, R) is created.

For the example, the resulting process graph is shown in Fig. 3b.

THEOREM 2. A given instance of the set cover (SC) problem of size k has an affirmative output iff the corresponding instance of TC obtained above possesses an affirmative solution with k repeated edge traversals.
Proof. (⇒) Given a solution of TC with k repeated edge traversals by test messages:
Let the number of (C_i, D_i) edges repeatedly traversed by some test messages be k' ≤ k. Without loss of generality, assume these repeated edges are precisely (C_1, D_1), (C_2, D_2), ..., (C_{k'}, D_{k'}). We claim that C' = {C_1, ..., C_{k'}} forms a set cover of S. To prove this claim, we use contradiction. Suppose the claim is not true, so that there exists t ∈ S such that none of the leaves in C' is labeled with t. Further, without loss of generality, let Y = {C_{k'+1}, ..., C_{k'+r}} and for all C_i ∈ Y let t ∈ C_i. Now the leaf nodes labeled with t in Y can only pass the return test messages to some C_j in Y. There are r such leaf nodes and r such C_i's. Either (a) by the pigeonhole principle, some C_i ∈ Y is visited twice and consequently (C_i, D_i) is traversed twice, or (b) the traversal of the t leaves forms a cycle without repeating C_i's. In case (a), this contradicts the assumption (then C_i ∈ C'). In case (b), the t leaves cannot return a test message to R, which routes through the cycle once and stops. Thus C' must form a set cover of size k' ≤ k; a set cover of size k thus exists.

(⇐) The reverse is straightforward. Given a set cover, say C' = {C_1, ..., C_k}, R sends test messages to all C_i ∈ C, which route them to all leaves except the T_i's. These leaves then return test messages to nodes in C', which group them into one message and forward them through the corresponding T_i's back to R. The number of repeated edge traversals by test messages, which can occur only at the (C_i, D_i)'s, is precisely k. Q.E.D.

FIG. 3. (a) Transformation Step 1, for the example C_1 = {x_1, x_2}; C_2 = {x_2, x_3}; C_3 = {x_1, x_3}. (b) End of transformation.

COROLLARY 3. TC is NP-complete.
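To make the reduction concrete, the problem transformation can be mechanized as below. This is an illustration only, not the authors' code; the node-naming scheme and the edge-list representation are our assumptions, and the example instance corresponds to the one suggested by the caption of Fig. 3.

    def sc_to_tc(subsets):
        """Build the TC process graph for a set cover instance (sketch).
        subsets: list of sets C_1, ..., C_n over a finite ground set S.
        Returns (edges, coordinator) as a directed edge list plus the root R."""
        edges = []
        for i, c in enumerate(subsets, start=1):
            ci, di, ti = f"C{i}", f"D{i}", f"T{i}"
            edges.append(("R", ci))     # step 2: R's successors are the C_i's
            edges.append((ci, di))      # step 1: C_i has the single successor D_i
            edges.append((di, ti))      # step 1: extra successor T_i of D_i
            edges.append((ti, "R"))     # step 3(b): edge (T_i, R)
            for x in sorted(c):
                leaf = f"{x}@{i}"       # leaf of the subtree for C_i, labeled x
                edges.append((di, leaf))
                # step 3(a): a leaf labeled x points back to every C_l with x in C_l
                for l, cl in enumerate(subsets, start=1):
                    if x in cl:
                        edges.append((leaf, f"C{l}"))
        return edges, "R"

    # Example instance: C1 = {x1, x2}, C2 = {x2, x3}, C3 = {x1, x3}.
    edges, root = sc_to_tc([{"x1", "x2"}, {"x2", "x3"}, {"x1", "x3"}])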
FIG. 4. (a) Process graph. (b) End of Phase I: edges are labeled with the leaf nodes that are reachable from the root via that edge. Legend: A = {7, 9, 10, 11}; B = {9, 10, 11}. Leaf nodes: T = {11}; N = {7, 9, 10}. (c) Step 2: BFT_r. (d) Step 3: labeling of paths from nodes in T. (e) Step 4: labeling of reverse paths. (f) End of Phase II. Legend: A = {7, 9, 10, 11}; B = {7', 9, 10, 11}; C = {9, 10, 11}, {7', 9'}; D = {9, 10, 11}, {7', 9', 10'}.
3.1. Heuristic Solution

Having identified the NP-completeness of the optimization problem, an efficient heuristic solution is justifiable for achieving the objective of minimizing the number of messages used in each update iteration. The heuristic proposed here consists of two phases. In Phase I, the algorithm creates a forward breadth-first spanning tree rooted at the coordinator. This tree is used to broadcast test messages which will reach all nodes in the graph G. In Phase II, reverse paths from the leaf nodes are derived greedily from a reverse breadth-first spanning tree rooted toward the coordinator. Greediness is maintained so that the return path of a leaf node identified in Phase I incurs the minimal marginal cost, measured in terms of the additional edges which have to be repeated. We will also prove that an edge is traversed at most twice by test messages in one iteration of clock update.
DEFINITION. For each node v in a tree, let d(v) = the distance of v from the root of the tree, measured by the number of nodes between v and the root.

Algorithm MINMES

Phase I:
Step 1: Obtain a forward breadth-first spanning tree BFT_f of G rooted at the coordinator R. Partition the leaf nodes as: T, the subset of leaf nodes which are directly adjacent to R, and N, the remaining leaf nodes. Label each edge in G which is also found in BFT_f by the set of leaf nodes reachable from the root R via that edge in BFT_f.
For the process graph of Fig. 4a, this labeling scheme is exemplified in Fig. 4b. We assume that each edge in G has two buckets to hold labels, and in this step all labels assigned to an edge are placed in the first bucket.

FIG. 4 (Continued). (e) Step 4 iterations for N = {7, 9, 10}: 7, 9, and 10 are chosen in turn; for example, the label in G for edge (2, 3) becomes {9, 10, 11} ∪ {7'} = {7', 9, 10, 11}.
Phase II:
Step 2: Obtain a reverse breadth-first spanning tree BFT_r of G rooted toward the coordinator R, as exemplified in Fig. 4c.
Step 3: For each t ∈ T (leaves of BFT_f directly adjacent to R), remove (t, R) from BFT_r and label (t, R) in G by t' (placing t' in the first bucket of (t, R)), as shown in Fig. 4d. A forest may result for BFT_r.

Step 4: Choose and remove an x ∈ N (leaves which must return test messages to R). Find among its direct successors Succ(x) the successor y whose d(y) is minimum. Label the edges on the path from x through y to the coordinator R in the original BFT_r by x'. The corresponding edges in G are also labeled by x': if either a second bucket exists, or x is found in the first bucket of the edge in G, then x' and all y' found in the first bucket are moved from the first bucket and placed in the second bucket; else x' is placed in the first bucket. Remove these edges from the current BFT_r and repeat Step 4 until N = ∅.

The iterations are exemplified in Fig. 4e, and the final labels of the chosen edges on which test messages are routed are shown in Fig. 4f. Table I tabulates the labels placed in the buckets of each of the edges in G.

THEOREM 4. Two buckets are sufficient to hold all the labels that may be assigned to an edge in MINMES, so that a label pair {x, x'} is never placed in the same bucket.
Proof. This is obvious from the construction, as x' is placed in the second bucket iff x is placed in the first bucket or a second bucket already exists. Since the second bucket is created only during the consideration of the reverse paths, x cannot exist in the second bucket, and so x' can definitely be put into the second bucket. Q.E.D.
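For concreteness, a Python sketch of Phase I follows. It is illustrative only and not the authors' implementation: the graph representation is assumed, and Phase II's bucket bookkeeping and greedy return-path selection are omitted. The sketch builds BFT_f, partitions its leaves into T and N, and labels every tree edge with the set of leaves reachable through it (the first-bucket labels of Fig. 4b).

    from collections import deque

    def minmes_phase1(adj, coordinator):
        """Phase I of MINMES (sketch).
        adj: dict mapping each node of G to its list of successors.
        Returns (parent, T, N, labels), where labels maps a tree edge (u, v)
        to the set of BFT_f leaves reachable from the root via that edge."""
        # Forward breadth-first spanning tree rooted at the coordinator.
        parent, children = {coordinator: None}, {}
        queue = deque([coordinator])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in parent:
                    parent[v] = u
                    children.setdefault(u, []).append(v)
                    queue.append(v)

        leaves = [v for v in parent if not children.get(v)]
        # T: leaves directly adjacent to the coordinator; N: the remaining leaves.
        T = {v for v in leaves if coordinator in adj.get(v, [])}
        N = set(leaves) - T

        # Label each tree edge with the leaves reachable through it (first bucket).
        labels = {}
        for leaf in leaves:
            v = leaf
            while parent[v] is not None:
                labels.setdefault((parent[v], v), set()).add(leaf)
                v = parent[v]
        return parent, T, N, labels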
TABLE I
Illustration of the Labeling Scheme on Edges

Edge      Bucket 1               Bucket 2
(1, 2)    {7, 9, 10, 11}         -
(2, 7)    {7}                    -
(7, 2)    {7'}                   -
(2, 3)    {7', 9, 10, 11}        -
(3, 4)    {7', 9, 10, 11}        -
(4, 5)    {9, 10, 11}            {7', 9'}
(5, 6)    {9, 10, 11}            {7', 9', 10'}
(6, 8)    {9, 10}                -
(8, 9)    {9}                    -
(8, 10)   {10}                   -
(9, 4)    {9'}                   -
(10, 4)   {10'}                  -
(6, 11)   {7', 9', 10', 11}      -
(11, 1)   {7', 9', 10', 11'}     -

The implication of the above theorem is that an edge may have to transmit two test messages in the worst case. Algorithm MINMES is obviously an attractive heuristic, as it minimizes the longest closed walk (and therefore delay) from the coordinator R to the leaf nodes and back to R, and at the same time it attempts to minimize repeated test message transmission. The complexity of heuristic MINMES is O(n^2), where n = number of processes, which corresponds to the iterative labeling applied in Step 4. We observe that MINMES assumes a predetermined coordinator, and the algorithm is executed once to initialize the data structures used in the actual clock advancement during simulation. The performance of the solution may differ if the coordinator is assigned to another process. An effective coordinator can be chosen if the savings in actual simulation justify the cost of this enumeration. We do not address the problem of coordinator selection in this paper.

To complete our solution, the clock advancement procedure integrated in the process code is given next. It is understood that the procedure implements the solution from MINMES during simulation. The following variables are defined:

[k, B, t]_test = test message TM for the kth iteration, tagged with an edge label B and the local clock advancement t accumulated so far.
[k]_marker = marker message for the kth iteration.
tested(k, i, j) = a Boolean flag set if and only if the channel edge (i, j), labeled with B_1 (and possibly also B_2), has received test message [k, B_1, t]_test (and [k, B_2, t]_test).

Process P_i will execute the following time advancement procedure after it has simulated all events of the previous iteration and received marker messages of the previous iteration on its unlabeled input channels.

Time Advancement Procedure (kth Iteration)

Repeat
    process i sends [k, B_T, t]_test on channel (i, j) if it has received test
    messages [k, B_1, t_1]_test, ..., [k, B_n, t_n]_test and tested(k, i, j) is
    false, where B_1 ∪ ... ∪ B_n = B_T, t = min{t_1, ..., t_n, t_e}, and
    t_e = minimum time of the events known to P_i
until ∀(i, j): tested(k, i, j);
Process i sends [k]_marker on each of its unlabeled output edges.
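A per-process sketch of this procedure is given below (illustrative only; the message tuples, the send callback, and the single-pass structure are our assumptions, with the repeat-until driver and the persistent tested(k, i, j) flags left to the caller). It forwards a test message on a labeled output channel once the labels received so far cover that channel's label, and flushes the unlabeled output channels with markers.

    def time_advance_step(k, out_labels, unlabeled_out, received, local_min, send):
        """One pass of the kth time-advancement iteration at process i (sketch).
        out_labels    -- dict: labeled output channel j -> label set B_T from MINMES
        unlabeled_out -- output channels carrying no MINMES label
        received      -- test messages received so far, as (k, label_set, t) tuples
        local_min     -- minimum invocation time of the events known to this process
        send          -- callback send(channel, message)"""
        got_labels = set().union(*(b for (_, b, _) in received)) if received else set()
        got_time = min((t for (_, _, t) in received), default=float("inf"))
        fired = set()

        for j, b_t in out_labels.items():
            # Forward [k, B_T, t]_test once the received labels cover this edge's label.
            if b_t <= got_labels:
                send(j, ("test", k, b_t, min(got_time, local_min)))
                fired.add(j)

        for j in unlabeled_out:
            send(j, ("marker", k))   # unlabeled channels are flushed for this iteration
        return fired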
For the example MINMES solution shown in Fig. 4, we can illustrate the procedure by considering the actions in processes 1, 2, and 7.

Process 1 (Coordinator): P_1 sends TM_1 = [k, {7, 9, 10, 11}, t]_test to process P_2.

Process 2: Upon receipt of TM_1, P_2 checks that the label of edge (2, 7) = {7} ⊆ {7, 9, 10, 11}. P_2 sends TM_2 = [k, {7}, t]_test to P_7.

Process 7: Upon receipt of TM_2, P_7 converts TM_2 to TM_3 = [k, {7'}, t']_test and returns it to P_2, where t' = min{t, t_e7} and t_e7 = minimum event time known to P_7.

Process 2: Upon receipt of TM_3, P_2 checks that {7'} ∪ {7, 9, 10, 11} ⊇ {7', 9, 10, 11}. P_2 sends TM_4 = [k, {7', 9, 10, 11}, t'']_test to P_3, where t'' = min{t', t_e2} and t_e2 = minimum event time known to P_2.

It should be observed that edge (9, 3) is not labeled by MINMES. A marker message [k]_marker will be sent from P_9 to P_3 after P_9 has sent a test message to P_4. This marker message marks the end of this clock advancement, so that any event sent on edge (9, 3) after the marker is an event for the (k + 1)th iteration.

An alternative strategy for distributed event simulation, based on the use of virtual time [6], can be developed. That strategy differs from ours in its need of rollback in simulation in order to maintain consistency [7]. We choose to follow a non-rollback strategy to reduce the redundant computation and memory overhead needed to support rollback. These two distinct approaches will fit differing simulation characteristics and needs. Also, a distributed event simulation system can be designed along hierarchical lines [8].

Note that the repeated edge traversals minimized by MINMES involve only test messages. The heuristic possesses an identical characteristic when applied to optimize Ob(2).

4. PERFORMANCE COMPARISON OF BRYANT'S HEURISTIC WITH MINMES

Upper Bound of Message Overhead

Consider a process graph with O(n) nodes using Bryant's heuristic for clock update. Each process has O(n) input edges and forwards a test message upon receiving one from an input edge. Each test message sent out may percolate through O(n) edges to reach the coordinator. So the total number of probe messages is O(n · n · n) = O(n^3). The flushing of edges takes an additional O(e) messages, where e = number of edges, so that an upper bound of the message overhead in Bryant's case is O(e + n^3). This upper bound is tight and is attained by the process graph shown in Fig. 5. In that case, the graph has a completely connected subgraph K_n. In Fig. 5 the process A receives O(n^2) probe messages originating from the various nodes. Returning these messages to process 1, the coordinator, requires n edge traversals, leading to the complexity O(n^3).

Our algorithm MINMES, in coordinating the forwarding of test messages, will require (2n - 2 - k) edge traversals by test messages, where k is the number of edges which are shared by a forward and a return path. Since each shared path will carry at most two test messages even in the worst case, the test message count is O(n). If the marker messages are included, the upper bound of the message overhead becomes O(e + k).
Speedup of MINMES

If we try to measure the speed of an update iteration by the longest path delay of message propagation from the coordinator to any node and back to itself, where path delay is the number of edge traversals by a test message, we can immediately deduce that Bryant's algorithm has a worst-case delay of O(n^2), corresponding to the case of a test message routing through the n^2 edges in the K_n subgraph of Fig. 5 and then through A to B and back to the coordinator. MINMES, on the other hand, has a worst-case delay of O(n), as evidenced by the traversal of BFT_f and BFT_r, since the height of either tree is at most O(n).

5. EXTENSION TO SUPPORT NON-FIFO CHANNELS
In communication systems having multiple paths between sources and destinations, the buffer wait delay is appreciably reduced. However, non-FIFO delivery could occur. Resequencing the out-of-order messages in order to maintain the FIFO property could increase the delay by a factor of three [9, 10]. In general, non-FIFO channels are found to allow simpler and more efficient implementations of the underlying communication subsystem [11]. New communication standards and architectures like the IEEE 802 LAN standards, Intel's iPSC, and ISDN architectures support prioritized delivery of messages and "out-of-band" signaling channels which have higher priorities than regular channels. They could be fruitfully employed to send test messages. However, since the test messages might then overtake some of the event messages sent earlier on the channel, non-FIFO delivery could occur. Moreover, discrete event simulation does not intrinsically require channels to be FIFO, as long as the logical synchronism is maintained at the coordinator for each iteration. As a result, we believe that control algorithms like the clock update algorithm should not impose FIFO constraints when they are not required by the application itself. They should also utilize the facilities offered by the communication subsystem, such as prioritized message delivery, to their advantage.

Relaxation of the FIFO channel constraint implies that the control and event messages sent on the same channel could overtake one another on the channel. Hence the receipt of a control message (test or marker) will not indicate the flushing of the channel with respect to the previous iteration. So, in order to ensure that a process knows it has completed the kth simulation iteration and can respond to the (k + 1)th test message for clock update, a test or marker message which may have reached a process ahead of earlier event messages should carry with it the corresponding event list sent on that channel generated in the (k - 1)th iteration. This redundancy is required to allow for non-FIFO channel behavior. For more details on this, the reader is referred to [12].

6. CONCLUSIONS

We have presented efficient schemes under different objectives for updating the simulation clocks in the processes modeled by strongly connected components of a process graph which implements discrete event simulation. In our schemes, in order to perform a clock update, the global state of the distributed processes is compiled by a chosen coordinator, and the earliest invocation time of any event which is either in the event list of any process or in transit on some channel is determined. Unlike existing solutions, we keep track of the events that each process has created and transmitted to its neighboring processes, and gather the local information simply by visiting all processes rather than flushing all channel edges.

In the explanation of our scheme, we have assumed that the simulation is frozen while the clock update is in progress. However, instead of freezing the simulation completely while the clock update is in progress, we can execute a logically asynchronous clock update algorithm (e.g., Time Incrementation [1, 2]) in the background to improve concurrency. The coordinator can choose the opportune time for starting the next clock update iteration as follows. In the kth update iteration, in addition to determining the earliest invocation time (t_min), we can also estimate the second earliest invocation time (t_s). If t_s ≫ t_min, then the test messages corresponding to the (k + 1)th iteration can be transmitted as soon as the coordinator becomes ready to perform the (k + 1)th iteration. Otherwise, the coordinator should delay the commencement of the (k + 1)th iteration so that the distributed update algorithm can advance the clock more efficiently beyond t_s. Notice that t_s is only an estimate, and the coordinator should not directly set the clocks to t_s, because an event with invocation time t < t_s may have been generated during the kth simulation iteration. All the performance results that have been derived in this paper remain applicable even in this context [12].

Actual performance results based on experimental evaluation have not been quoted in this paper. Such a study is in progress. We believe our algorithms will compare favorably with the existing ones in practice. In continuing the effort of developing such distributed algorithms, it will be interesting to examine two variations of the problem: (1) the removal of a coordinator for time advancement, and (2) the support of concurrent clock advancement and event simulation without serialization of iterations.

REFERENCES

1. Chandy, K. M., and Misra, J. A non-trivial example of concurrent processing: Distributed simulation. COMPSAC 78, IEEE, New York, pp. 822-826.
2. Chandy, K. M., and Misra, J. Asynchronous distributed simulation via a sequence of parallel computations. Comm. ACM (Apr. 1981), 198-206.
3. Bryant, R. E. Simulation on a distributed system. Proc. First Conference on Distributed Computing Systems, 1979, pp. 544-552.
4. Chandy, K. M., and Lamport, L. Distributed snapshots: Determination of global states of distributed systems. ACM Trans. Comput. Systems 3, 1 (Feb. 1985), 63-75.
5. Garey, M. R., and Johnson, D. S. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, San Francisco, 1979.
6. Jefferson, D. R. Virtual time. ACM Trans. Programming Languages Systems (July 1985), 401-425.
7. Jefferson, D., and Sowizral, H. Fast concurrent simulation using the time warp mechanism. Distributed Simulation Conference Proceedings, Jan. 24-26, 1985, pp. 63-69.
8. Prakash, A., and Ramamoorthy, C. V. Hierarchical distributed simulation. Proc. 8th International Conference on Distributed Computing Systems, 1988, pp. 341-348.
9. Chin, C. Y., and Hwang, K. Packet switching networks for multiprocessors and data flow computers. IEEE Trans. Comput. C-33, 11 (1984), 991-1003.
10. Kumar, K. B., and Kermani, P. Analysis of a resequencing problem in communication networks. RC 9767, IBM T. J. Watson Research Center, Yorktown Heights, NY, Aug. 1982.
11. Saltzer, J. H., Reed, D. P., and Clark, D. D. End-to-end arguments in system design. ACM Trans. Comput. Systems (Nov. 1984), 277-288.
12. Venkatesh, K. Global states of distributed systems: Classification and application. Ph.D. thesis, Department of Computer Science, Concordia University, Montreal, Canada, Feb. 1988.

Received July 28, 1986; revised September 30, 1988
HON F. LI received the B.S. degree with highest honors from the University of California, Berkeley, in 1972, and remained there until 1975, when he completed the Ph.D. degree. Subsequently, he joined the University of Illinois, Urbana-Champaign, as an assistant professor in the Computer Engineering Group, Department of Electrical Engineering. He returned to Hong Kong in 1977 and served as Lecturer and, later, Senior Lecturer in the Department of Electrical Engineering, University of Hong Kong. Currently, he is a professor in the Department of Computer Science, Concordia University, Montreal, Quebec, Canada. His research interests cover the general areas of parallel and distributed processing, algorithms and architectures, and delay-insensitive systems.
KRISHNA RAO VENKATESH obtained his B.E. from Bangalore University and his M.Comp.Sc. and Ph.D. degrees from Concordia University in 1977 and 1988, respectively. Since 1977 he has been working at Bharat Electronics Ltd. in their Computer Research and Development Section. During 1984-1988 he was a Commonwealth Scholar researching at Concordia University. His interests are in distributed processing, fault-tolerant system design and development, and parallel processing systems based on multiple micros.
THIRUVENGADAM RADHAKRISHNAN obtained his B.E. (Hons) from the University of Madras in 1966 and his M.Tech. and Ph.D. degrees from the Indian Institute of Technology, Kanpur, India, in 1968 and 1971, respectively. Currently he is working as an associate professor of computer science at Concordia University, Montreal, Canada. His research interests are in distributed processing, man-machine communication, cooperating expert systems, and microprocessor-based systems for the physically handicapped.