Simulation Practice and Theory 2 (1995) 257-275
Mapping hierarchical, modular discrete event models in a hypercube multicomputer *

Yeong Rak Seong, Tag Gon Kim **, Kyu Ho Park

Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, 373-1 Kusong-dong, Yusong-gu, Taejon 305-701, South Korea

* A preliminary version of this paper was presented at the International Conference on Massively Parallel Processing Applications and Development, 1994.
** Corresponding author.
Received 18 July 1994; revised 1 December 1994
Abstract

The Discrete Event Systems Specification (DEVS) formalism specifies discrete event systems in a modular, hierarchical form. Timed state transitions within the formalism are specified by an external transition function for randomly arriving external events; the internal transition function schedules internal events. This paper devises a mapping algorithm which exploits a maximum degree of parallelism in DEVS simulation. Several hierarchical simulation algorithms have already been developed for parallel DEVS simulation; our mapping algorithm, however, employs different approaches for parallelizing the two types of events. For external event parallelization, a task graph representing a hierarchical DEVS model is transformed into a binomial tree which can be easily embedded into a hypercube network. The transformation markedly reduces the synchronization overhead for parallel computation. For internal event parallelization, a hypercube is partitioned into a set of subcubes such that an internal event and the external events caused by the internal event are processed in the same subcube. Simulation experiments with a simple benchmark model show that the potential parallelism of DEVS simulation can be exploited by using the proposed algorithm.

Keywords: DEVS formalism; Parallel simulation; Discrete event systems; Object orientation
1. Introduction

Recently, discrete event systems modeling and simulation has been widely used to analyze so-called man-made systems such as computers, communication networks, flexible manufacturing systems, etc. However, simulation of a large, complex system suffers from huge time and space complexities. Parallel processing is a promising approach
for such applications, and many Parallel Discrete Event Simulation (PDES) algorithms have been proposed. The algorithms can be classified into two groups: conservative algorithms [10,14] and optimistic algorithms [5,9]. Conservative algorithms process events in a global simulation time order. Optimistic algorithms, however, execute events in a local simulation time order. Since local simulation times are not always ordered strictly, optimistic algorithms can result in erroneous computation. In such cases, the local simulation time should be reordered by rollback operations which undo the erroneous computation.

Some formal methods for developing simulation models of discrete event systems have been developed. Zeigler's Discrete Event Systems Specification (DEVS) formalism is one such formalism [17,18]. Based on set theory, an atomic DEVS is basically a timed state transition model. In such a model, state transitions of a component model are induced by external and internal events. An external event arrives at the model randomly; an internal event occurs at the scheduled time. The DEVS formalism differs from others in that it specifies discrete event systems in a modular, hierarchical manner. Moreover, the formalism provides sound semantics for developing hierarchical models in an object oriented framework. Realization of the DEVS formalism in object oriented programming environments has been one of the important research topics in the field [1,7]. Recently, an object oriented simulation environment called DEVSim++ was implemented on a Sun SPARC-II [7]. DEVSim++ realizes the DEVS formalism and the associated hierarchical scheduling algorithms in C++. As an extension of DEVSim++, we are developing P-DEVSim++, which parallelizes the DEVSim++ environment to run on a hypercube multicomputer system [13].

In contrast to general PDES studies, P-DEVSim++ is based on the DEVS formalism. Hence, parallel DEVS simulation differs somewhat from general PDES cases. For such simulation, the PAR_INT algorithm was developed. The algorithm controls the simulation time in a conservative manner, but it executes external events in an optimistic manner. The combined approach to parallel simulation results in localization of rollbacks and hence speedup. P-DEVSim++ has already been implemented on a hypercube simulator environment and is being realized on KAICUBE-860, a five-dimensional hypercube multicomputer developed at KAIST.

The purpose of this paper is to devise a mapping algorithm which exploits a maximum degree of parallelism in DEVS simulation [12]. Thus, the problem is to map modular, hierarchical DEVS models onto a hypercube topology such that internal events as well as external events are executed in parallel. We assume that a node processor in the multicomputer used in simulation has no multicasting capability. Our mapping algorithm employs different approaches for parallelizing the two types of events: transformation of task graphs is employed for external event parallelization, and clustering of hypercube nodes for internal event parallelization. For external event parallelization, a task graph called a composition tree, which represents a hierarchical DEVS model, is transformed into a binomial tree that can be easily embedded into a hypercube network. The transformation markedly reduces the synchronization overhead for parallel computation. For internal event parallelization, a hypercube is partitioned into a set of subcubes
such that an internal event and the external events caused by the internal event are processed in the same subcube.

To analyze the proposed mapping algorithm as instrumented in P-DEVSim++, a performance simulator which simulates the runtime behavior of P-DEVSim++ was developed [11]. For performance evaluation, a simple parallel computing center model is used. The proposed algorithm generates various transformations for a given task graph, with the interprocessor communication times and external event processing times parameterized. Experimental results show that the potential parallelism of PAR_INT can be exploited by using the proposed mapping algorithm.

This paper is organized as follows. Section 2 briefly reviews the DEVS formalism and the associated abstract simulator algorithms. Section 3 describes P-DEVSim++, the environment for this work. Section 4 discusses the mapping algorithm. Section 5 shows some experimental results for verifying the proposed algorithm. Section 6 draws conclusions.
2. The DEVS formalism

The DEVS formalism specifies discrete event systems in a hierarchical, modular form. To model a system using the formalism, the system is decomposed into basic models in a hierarchical manner. A basic model M, also known as an atomic model, is defined as [17,18]

M = <X, S, Y, δint, δext, λ, ta>,

where

X: input events set,
S: sequential states set,
Y: output events set,
δint: S → S, internal transition function,
δext: Q × X → S, external transition function,
λ: S → Y, output function,
ta: S → R+0,∞, time advance function,

where Q is the total state of M given by Q = {(s, e) | s ∈ S and 0 ≤ e ≤ ta(s)}.

A coupled model connects several component models together to construct a new model. Since a coupled model can be employed as a component of a larger model, a complex model can be constructed in hierarchical fashion. A coupled model DN is defined as [17,18]

DN = <D, {Mi}, {Ii}, {Zi,j}, select>,

where

D: set of component names;
for each i in D:
  Mi: DEVS for component i in D,
  Ii: set of influencees of i;
for each j in Ii:
  Zi,j: Yi → Xj, i-to-j output translation function;
select: subsets of D → D, tie-breaking function.
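To make the definition concrete, the following C++ sketch renders the atomic model above as an abstract interface, in the spirit of the C++ realization mentioned in Section 1. The class and member names are our own illustrative assumptions, not the actual DEVSim++ API.

```cpp
// Sketch of an atomic DEVS as an abstract C++ interface (illustrative names).
#include <limits>

using Time = double;

// ta may return infinity: a passive state never schedules an internal event.
const Time kInf = std::numeric_limits<Time>::infinity();

template <typename X, typename S, typename Y>
class Atomic {
public:
    virtual ~Atomic() = default;
    virtual S delta_int(const S& s) const = 0;                // δint : S -> S
    virtual S delta_ext(const S& s, Time e, const X& x) const = 0;
                                                              // δext : Q x X -> S
    virtual Y lambda(const S& s) const = 0;                   // λ : S -> Y
    virtual Time ta(const S& s) const = 0;                    // ta : S -> [0, ∞]
};
```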
The abstract simulator associated with DEVS models is a logical device that interprets the dynamics of DEVS models by means of hierarchical scheduling algorithms [18]. For hierarchical simulation, each model is associated with a corresponding abstract simulator in a one-to-one manner; hence the structure of the abstract simulators is the same as that of the models. For each type of model, an abstract simulator is employed: a simulator for an atomic model and a coordinator for a coupled model. Figs. 1 and 2 show the algorithms for a simulator and a coordinator, respectively.

The DEVS abstract simulators carry out their task by using four types of messages sent to and from their parent abstract simulator: (*, t) to indicate that the scheduled next (internal) event time has come; (x, t) to signal that an external event x has arrived at time t; (y, t) to report that an output y has been produced at time t; and (done, tN) to indicate that processing of the last message has finished and that the next event time is scheduled at tN. When a simulator receives an (x, t) message, it executes the external transition function of the associated model.
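As an illustration, the four message types could be represented as a tagged union. The concrete layout below is an assumption for exposition, since the paper specifies only the message semantics.

```cpp
// The four messages exchanged between abstract simulators (illustrative layout).
#include <variant>

using Time = double;

struct Star   { Time t;        };  // (*, t): the scheduled internal event time has come
struct Event  { int x; Time t; };  // (x, t): external event x arrived at time t
struct Output { int y; Time t; };  // (y, t): output y was produced at time t
struct Done   { Time tN;       };  // (done, tN): finished; next event scheduled at tN

using Message = std::variant<Star, Event, Output, Done>;
```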
Algorithm sim:when_rcv_(x, t)
  if tL ≤ t ≤ tN then
    e := t - tL;
    s := δext(s, e, x);
    tL := t;
    tN := tL + ta(s);
    send (done, tN) to the parent coordinator;
  else
    error;
  end if

Algorithm sim:when_rcv_(*, t)
  if t = tN then
    y := λ(s);
    send (y, t) to the parent coordinator;
    s := δint(s);
    tL := t;
    tN := tL + ta(s);
    send (done, tN) to the parent coordinator;
  else
    error;
  end if

Fig. 1. Abstract simulator for atomic DEVS models.
Algorithm coor:when_rcv_(x, t)
  if tL ≤ t ≤ tN then
    find the subordinates i affected by x via the external input coupling;
    send (xi, t) to each such subordinate i;
    wait until they are done;
    tL := t;
    tN := minimum of the component tN's;
    send (done, tN) to the parent coordinator;
  else
    error;
  end if

Algorithm coor:when_rcv_(*, t)
  if t = tN then
    find the imminent simulators (those with minimum tN);
    select one, i*, and send the input (*, t) to it;
    send the signals (xi*,j, t) to each of its influencees j;
    wait until the simulator i* and each of its influencees are done;
    tL := t;
    tN := minimum of the component tN's;
    send (done, tN) to the parent coordinator;
  else
    error;
  end if

Fig. 2. Abstract simulator for coupled DEVS models.
For this execution, the simulator calculates the time elapsed since the last state transition. When the simulator receives a (*, t) message, it executes the output and internal transition functions of the associated model. In the abstract simulator concept, only internal events can produce output messages; the generated output messages are sent to the parent coordinator. After executing the external or internal transition function, the next event time tN is recalculated using the time advance function and notified to the parent coordinator with a (done, tN) message.

The algorithm of a coordinator is slightly more complex than that of a simulator. When a coordinator receives an (x, t) message, it finds the subordinates affected by the message via the external input coupling scheme of the associated coupled model and sends the input message to those subordinates. When a coordinator receives a (*, t) message, it routes the message to the subordinate associated with the imminent component DEVS. At this point there may be several subordinates with the same next event time. In this case, since a sequential computer cannot execute the internal event processing of the subordinates in parallel, the select function of the associated coupled model is used to order the execution. By using the function, a coordinator chooses only one subordinate on receiving a (*, t) message. After sending the (*, t) message, a coordinator may receive some (y, t) messages as a result. When a coordinator receives such a message, it determines whether or not the message is used internally. If it is, the associated
coupled model transforms the (y, t) message into an (x, t) message by applying the i-to-j output translation function and sends it to the influencees. Otherwise, the coordinator routes the message to its parent coordinator. A coordinator synchronizes the execution of its subordinates: when it has received (done, tN) messages from all active subordinates, it sets its tN to the minimum of all subordinates' tN's and sends a (done, tN) message to its parent coordinator. Finally, when the message reaches the topmost coordinator, the root coordinator, the root coordinator schedules the tN of the entire system and then sends a (*, t) message to the imminent subordinate.
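The following self-contained sketch traces the Fig. 1 bookkeeping for a single hypothetical atomic model, a processor that stays busy for 5 time units per job. The model and all identifiers are illustrative, not taken from the paper.

```cpp
// Worked trace of the simulator variables tL, tN, e for one atomic model.
#include <cstdio>
#include <limits>

using Time = double;
const Time kInf = std::numeric_limits<Time>::infinity();

enum Phase { IDLE, BUSY };
Time ta(Phase s) { return s == BUSY ? 5.0 : kInf; }  // time advance function

int main() {
    Phase s = IDLE;            // sequential state
    Time tL = 0.0, tN = kInf;  // last event time, scheduled next event time

    // (x, t): an external event ("job arrives") at t = 3
    Time t = 3.0;
    Time e = t - tL;           // elapsed time since the last transition
    (void)e;                   // δext of this toy model happens to ignore e
    s = BUSY;                  // s := δext(s, e, x)
    tL = t;
    tN = tL + ta(s);           // tN = 3 + 5 = 8: report (done, 8) upward
    std::printf("after (x,3): tN = %g\n", tN);

    // (*, t): the internal event arrives at the scheduled time t = tN = 8
    t = tN;
    std::printf("lambda(s) emits (y, %g)\n", t);  // output precedes δint
    s = IDLE;                  // s := δint(s)
    tL = t;
    tN = tL + ta(s);           // ta(IDLE) = infinity: passive until next input
    std::printf("after (*,8): tN = %g\n", tN);
    return 0;
}
```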
3. Parallel simulation of DEVS models

Three synchronization schemes exist in parallel DEVS simulation: global simulation time synchronization, parallel external event synchronization, and parallel internal event synchronization. Each is related to the parallelization of DEVS simulation.

The first type of synchronization is essential. In most parallel DEVS simulation research, it is controlled by using the hierarchical scheduling algorithm described by the DEVS abstract simulators. In this case, since the algorithm manages the global simulation time in a strict manner, no parallelism is associated with this synchronization. To exploit this type of parallelism, Christensen employed the Time Warp algorithm [1].

The second form of synchronization can be exploited when external events occurring simultaneously are processed in parallel. In DEVS, an atomic model produces output just before an internal transition, and that output causes external events. Therefore, the external events can be processed in parallel without generating any output. Many studies have focused on this parallelism. Concepcion introduced a hardware architecture for such parallel DEVS simulation, the hierarchical multibus multiprocessor architecture (HM²A) [2,3]. Zeigler and Zhang proposed a mapping algorithm for the parallel DEVS abstract simulator which parallelizes only external events by use of the hierarchical simulation algorithm [19]. In that algorithm, the hierarchical structure of the simulation models is preserved.

The last synchronization scheme arises when more than one internal event scheduled at the same next event time is processed in parallel. In contrast to the case of external events, processing of internal events affects influencee models. Wang parallelized the execution of internal events on a massively parallel computer, the Connection Machine CM-2 [16]. However, the approach is limited to non-hierarchical broadcast models because of the incompatibility of mapping hierarchical abstract simulators onto the SIMD architecture of the CM-2.

P-DEVSim++ is an implementation of the DEVS formalism in a parallel environment [13]. To solve the synchronization problems described earlier, P-DEVSim++ employs the following mechanisms. First of all, the global simulation clock is strictly ordered and synchronized by the inherent hierarchical scheduling algorithms of Figs. 1 and 2. The parallel execution of external events is synchronized by the hierarchical simulation algorithm, a form of conservative simulation algorithm; that of internal events is synchronized by the PAR_INT algorithm [13], an optimistic simulation algorithm.
The PAR_INT algorithm executes in parallel the internal events which are scheduled at the same time. In sequential simulation, the select function of a coupled DEVS determines the execution order of models with the same next event time. The model selected by the function carries out its internal transition, which in turn causes external events at its influencees [18]. For example, consider two atomic models that have a common influencee and whose scheduled next event times are the same. When the scheduled time comes, the influencee processes two external events caused by the two internal events. In sequential simulation, the internal events are processed in a fixed order given by the select function, so the influencee processes the external event messages in that order. In parallel simulation, however, the internal events may be processed in parallel, and the influencee of the two models receives two external event messages whose real-time arrival instants differ, depending on the times spent generating the outputs of the two models and transmitting them from the models to the influencee. The influencee may therefore process the received messages in an order different from the sequential one.

To order the messages correctly, PAR_INT employs an optimistic simulation approach. Each atomic model obtains a priority from the select function and stamps it on each output message. By using this priority, an atomic model can order arriving external event messages. Generally, rollback and fossil collection problems arise in optimistic simulation. But in the PAR_INT algorithm, since the global simulation time is managed in a strict manner, rollbacks are limited to the models that initiate them, and no explicit fossil collection mechanism is required. The benefit of these properties is reduced simulation time and smaller memory management overheads.

Fig. 3 shows the modified abstract simulator algorithm for a simulator in P-DEVSim++. Note that an external event message and an output message are now specified as (x, t, p) and (y, t, p) in the algorithms. For optimistic simulation, the response of a simulator to an external event message is changed. The simulator rolls back when p, the priority of the newly arrived message, is larger than pL, the priority of the last processed message. The operation undoes the execution of all external event messages in the simulator whose priority is smaller than p, and then executes the newly arrived message and the rolled-back messages in the order of their priorities. If p is smaller than pL, the abstract simulator simply executes the new message. Under the PAR_INT algorithm, while it processes an internal event message, a simulator copies its priority p* to the output messages it produces.

Fig. 4 shows the modified abstract simulator algorithm for a coordinator in P-DEVSim++. The behavior of a coordinator on receiving an (x, t, p) message is nearly identical to that in Fig. 2. But when a coordinator receives a (*, t) message, it reacts differently from the original algorithm: whereas the original algorithm chooses only one subordinate by using the select function, the modified algorithm sends (*, t) messages to all imminent subordinates.
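The sketch below illustrates one way an influencee could realize this priority ordering with local rollback, assuming that a larger select priority must take effect earlier within a stage and that a state snapshot is saved before each processed message. All names are illustrative; this is a sketch of the idea, not the P-DEVSim++ implementation.

```cpp
// PAR_INT-style priority ordering of external event messages (illustrative).
#include <functional>
#include <map>
#include <utility>
#include <vector>

using Priority = int;
struct Msg   { int payload; };
struct State { int value;   };

// Stand-in for the external transition of the influencee model.
State delta_ext(const State& s, const Msg& m) { return State{s.value + m.payload}; }

class ParIntSimulator {
public:
    void receive(Priority p, const Msg& m) {
        // First processed message whose priority is smaller than p: it and all
        // later entries should have been executed after the new message.
        auto wrong = log_.upper_bound(p);
        if (wrong == log_.end()) {       // no conflict: just execute the message
            execute(p, m);
            return;
        }
        state_ = wrong->second.first;    // rollback: restore pre-execution state
        std::vector<std::pair<Priority, Msg>> replay;
        for (auto it = wrong; it != log_.end(); ++it)
            replay.emplace_back(it->first, it->second.second);
        log_.erase(wrong, log_.end());
        execute(p, m);                   // execute the new message first,
        for (auto& [pr, msg] : replay)   // then re-execute the rolled-back ones
            execute(pr, msg);            // in the order of their priorities
    }

private:
    void execute(Priority p, const Msg& m) {
        log_.emplace(p, std::make_pair(state_, m));  // snapshot before applying
        state_ = delta_ext(state_, m);
    }
    State state_{0};
    // Messages of the current stage in descending priority (= execution) order.
    std::map<Priority, std::pair<State, Msg>, std::greater<Priority>> log_;
};
```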
4. Mapping DEVS simulators onto a hypercube

A mapping establishes a relationship between a task graph, which specifies the relationship among subtasks, and a processor graph, which specifies the interconnection
Algorithm sim:when_rcv_(x, t, p)
  if tL ≤ t ≤ tN then
    if p > pL then                                  ▷ rollback
      undo the execution of the external event messages with priority smaller than p;
      for the new message and each rolled-back message, in the order of their priorities
        pL := p of the message;
        e := t - tL;
        s := δext(s, e, x);
        tL := t;
        tN := tL + ta(s);
      end for
    else
      pL := p;
      e := t - tL;
      s := δext(s, e, x);
      tL := t;
      tN := tL + ta(s);
    end if
    send (done, tN) to the parent coordinator;
  else
    error;
  end if

Algorithm sim:when_rcv_(*, t)
  if t = tN then
    y := λ(s);
    send (y, t, p*) to the parent coordinator;
    s := δint(s);
    tL := t;
    tN := tL + ta(s);
    send (done, tN) to the parent coordinator;
  else
    error;
  end if

Fig. 3. Modified abstract simulator for atomic DEVS models.
of processors in a parallel environment. The mapping is very important in parallel processing, since it considerably affects the performance and utilization of parallel systems. Generally, a mapping strategy relies on constraints which differ from application to application. Such constraints can be formalized by objective functions [8]. In general parallel simulation, the objective function may be defined as the overall time to finish the simulation. By minimizing the objective function, we can find an efficient mapping strategy for general parallel simulation.

In this work, the processor graph is a hypercube and the task graph is represented by composition trees. A composition tree specifies the hierarchical structure and I/O coupling of the components of a DEVS model: each leaf node represents an atomic model and each internal node represents a coupled model. Since an abstract simulator is associated with each model in a one-to-one manner,
Algorithm coor:when_rcv_(x, t, p)
  if tL ≤ t ≤ tN then
    find the subordinates i affected by x via the external input coupling;
    send (xi, t, p) to each such subordinate i;
    wait until they are done;
    tL := t;
    tN := minimum of the component tN's;
    send (done, tN) to the parent coordinator;
  else
    error;
  end if

Algorithm coor:when_rcv_(*, t)
  if t = tN then
    find the imminent simulators (those with minimum tN);
    send the input (*, t) to all the imminent simulators i;
    send the signals (xi,j, t, p) to each of their influencees j;
    wait until the imminent simulators and each of their influencees are done;
    tL := t;
    tN := minimum of the component tN's;
    send (done, tN) to the parent coordinator;
  else
    error;
  end if

Fig. 4. Modified abstract simulator for coupled DEVS models.
the structure of the abstract simulators is given by the same composition tree as that of the model. So the composition tree is also a task graph for hierarchical DEVS simulation.

Parallel DEVS simulation differs from general parallel simulation problems in two respects. First, when the global simulation time is strictly controlled by the root coordinator, the degree of parallelism in DEVS simulation is limited to the parallel execution of external events occurring simultaneously or internal events scheduled at the same time. So our mapping algorithm focuses on parallelizing external and internal events initiated at the same time. Second, the hierarchical structure of a DEVS model can be transformed into a different structure with the same behavior. Flattening and deepening operations can perform such transformations [18]. By using these operations, the nodes of a composition tree can be clustered and split in a formal way, which is impossible in general parallel simulation problems.

As mentioned earlier, the global simulation time in P-DEVSim++ is controlled by the hierarchical scheduling algorithm of DEVS. In the algorithm, the root coordinator, the topmost coordinator, controls the global simulation clock. It determines the minimum of the next event times of all components and generates an internal event message with that minimum next event time. The internal event may generate output messages which cause external events. We define a stage to be the interval during which an internal event and the associated external events are carried out. The total simulation can then be considered as a sequence of stages. Therefore,
the objective function for parallel DEVS simulation can be defined as the sum of the overall finishing times of the stages.

4.1. Mapping of external events

Let us assume that only one internal event occurs in each stage. In processing an internal event, an atomic model generates output messages which cause external events in its influencee models. Such external events can be processed in parallel. In this case, the objective function for such a stage can be defined as the maximum finishing time among the processors assigned to the external events. To minimize the objective function, the influencees may be mapped onto distinct processors to process the external events, so the processor graph is a two-level n-ary tree (Fig. 5(a); a set of tasks mapped onto the same processor is enclosed by a dotted curve).

Fig. 5. Depiction of the mapping of external events: (a) original composition tree; (b) after deepening; (c) after flattening.

However, the messages would be sent in a sequential manner, so
the influencees cannot begin to process the messages simultaneously. The time difference in receipt of the messages limits the degree of parallelism in handling external events. To shorten the difference, the deepening operation is employed. The operation inserts an internal node into a composition tree so that the level of a subtree of the composition tree is increased by one. Since the hardware platform executing the DEVS simulation has no multicasting capability, when a message has more than one destination, the source must send the message to the destinations one by one. In such a case, the deepening operation is applied to the source, which inserts subordinate nodes between the source and the destinations (Fig. 5(b)). The operation reduces the number of repetitions in message sending, thereby reducing communication overheads. By a series of deepenings, the task graph can be transformed into a binary tree.

Meanwhile, after sending an external event message, a coordinator is inactive while awaiting a done message. Therefore, mapping an inserted node and one of its subordinates onto the same processor reduces the number of processors used without loss of parallelism. Now the processor graph is transformed into a binomial tree [15]. A binomial tree Bn is defined inductively as shown in Fig. 6: B0 is a single node, and Bn is formed by linking the roots of two copies of Bn-1. However, some unnecessary intraprocessor communication remains because the task graph does not match the processor graph. To match the graphs, the flattening operation is employed. The operation removes an internal node of a composition tree so that the level of a subtree of the composition tree is reduced by one (Fig. 5(c)). Now the deepened composition tree can be transformed into a binomial tree. Thus the task graph matches the processor graph, which eliminates the unnecessary intraprocessor communication overheads.

Fig. 6. Definition of binomial trees.
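To see why the binomial-tree shape fits a hypercube without multicasting, the sketch below performs the classic recursive-doubling broadcast: node 0 reaches all 2^d - 1 other nodes in d rounds instead of 2^d - 1 sequential sends, and the resulting send pattern is exactly a binomial tree Bd rooted at node 0. The send() routine is a stand-in for the real communication primitive.

```cpp
// Binomial-tree broadcast on a d-dimensional hypercube (illustrative).
#include <cstdio>

void send(int from, int to, int round) {
    std::printf("round %d: node %d -> node %d\n", round, from, to);
}

// In round k, every node that already holds the message (nodes 0 .. 2^k - 1)
// forwards it to the neighbor across dimension k (address XOR 2^k).
void binomial_broadcast(int d) {
    for (int k = 0; k < d; ++k)
        for (int node = 0; node < (1 << k); ++node)
            send(node, node ^ (1 << k), k);
}

int main() {
    binomial_broadcast(3);  // 3 rounds reach all 8 nodes of a 3-D hypercube
    return 0;
}
```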
4.2. Mapping of internal events

Now let us discuss the case in which a set of internal events is associated with a stage. For simplicity, suppose that n simulators receive internal event messages at the same time, and that the ith simulator has βi influencees. When the internal events are processed in parallel, β1 + β2 + ... + βn external events are generated at the same time. So, when mapping such internal events, we should also consider the external events caused by them, in order to minimize the overall computation time required to process all the events. The objective function is the maximum finishing time among the processors assigned to the internal and external events.

Now consider how we can minimize that time. In this case, the overall computation time is not inversely proportional to the number of processors, due to synchronization overheads. Similar problems have been investigated in the parallel processing domain [4,6]. The problem is to find a preemptive schedule which minimizes the overall finishing time for the following case: n independent tasks can be executed on a partitionable parallel environment, with execution times varying with partition size (the number of processors used to execute a task). A preemptive schedule is one in which a task that is ready to execute can preempt the task currently being executed; with preemptive scheduling, therefore, no processor is blocked for synchronization among tasks. Du and Leung [4] show that finding a minimum-length preemptive schedule for this problem is strongly NP-hard for an arbitrary number of processors and tasks.

For mapping such internal and external events, a simple hypercube partition algorithm is employed. Fig. 7 shows the proposed algorithm. Since a hypercube is partitionable and a binomial tree can be easily embedded into a hypercube, an internal event and the influenced external events are easily mapped onto the same subcube. In the proposed algorithm, during the initial mapping phase, the given tasks are mapped to processors with their overall finishing times kept as even as possible. At this point, the overall finishing time of the tasks is bounded by the maximum of the processors' execution times. So, during the updating phase, a subcube of higher dimension is assigned to the tasks initially mapped onto the processor with the longest execution time. If the overall finishing time is reduced, this procedure is repeated.
Algorithm Mapping
  ▷ initial mapping
  for each internal event
    find the processor which has the minimum execution time;
    map the internal event and the influenced external events onto that processor;
  end for
  ▷ update the mapping
  while the overall finishing time is decreased
    procmax := the processor with the longest finishing time;
    unmap all external events mapped onto procmax during the initialization phase;
    allocate a subcube including procmax with a dimension larger than that of the previous subcube by one;
    remap the unmapped external events onto the newly allocated subcube;
  end while

Fig. 7. Mapping algorithm.
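A simplified, illustrative C++ rendering of the two phases of Fig. 7 is given below. It assumes each concurrent internal event owns its own subcube and that its external events parallelize ideally (time = Tint + Text / 2^dim), and it ignores the communication costs that the actual algorithm accounts for; the numeric inputs are invented for the trace.

```cpp
// Sketch of the Fig. 7 control flow under strong simplifying assumptions.
#include <algorithm>
#include <cstdio>
#include <vector>

double makespan(const std::vector<double>& t_int,
                const std::vector<double>& t_ext,
                const std::vector<int>& dim) {
    double worst = 0.0;  // finishing time of the slowest subcube
    for (size_t i = 0; i < dim.size(); ++i)
        worst = std::max(worst, t_int[i] + t_ext[i] / double(1 << dim[i]));
    return worst;
}

int main() {
    const int DIM = 2;                     // 2-D hypercube, 4 processors
    std::vector<double> t_int = {2, 2};    // internal event processing times
    std::vector<double> t_ext = {16, 12};  // total external event work per event
    std::vector<int> dim = {0, 0};         // initial mapping: one processor each

    double best = makespan(t_int, t_ext, dim);
    for (;;) {
        // pick the subcube with the longest finishing time
        size_t worst = 0;
        for (size_t i = 1; i < dim.size(); ++i)
            if (t_int[i] + t_ext[i] / (1 << dim[i]) >
                t_int[worst] + t_ext[worst] / (1 << dim[worst])) worst = i;
        // growing that subcube by one dimension must still fit the hypercube
        int used = 0;
        for (size_t i = 0; i < dim.size(); ++i) used += 1 << dim[i];
        if (used + (1 << dim[worst]) > (1 << DIM)) break;
        ++dim[worst];
        double t = makespan(t_int, t_ext, dim);
        if (t >= best) { --dim[worst]; break; }  // no improvement: stop
        best = t;
    }
    std::printf("dims = {%d, %d}, makespan = %g\n", dim[0], dim[1], best);
    return 0;
}
```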
Fig. 8 shows an example: processing two concurrent internal events on a 2-D hypercube. Fig. 8(a) shows the parameters, such as the expected internal and external event processing times, the number of influencees, and the expected interprocessor communication time. Fig. 8(b) shows the time diagram after the initialization phase. In the figure, PROC0 takes a longer time to process its task than PROC1, so we allocate a 1-D subcube which includes PROC0, and the task is processed on that subcube in parallel. The result is shown in Fig. 8(c). Now PROC1 has the longest execution time, so we allocate a 1-D subcube including PROC1 to its task (Fig. 8(d)). Finally, PROC0 again has the longest execution time, so we try to allocate a 2-D subcube which includes PROC0 to its task. But the overall finishing time is no longer reduced, and the algorithm terminates.
Fig. 8. Depiction of the mapping of internal events: (a) parameters (Tcomm = 1, Nproc = 4; model A: Tint = 2, Text,inf = 2, Ninf = 8; model B: Tint = 2, Text,inf = 3, Ninf = 4); (b) time diagram I; (c) time diagram II; (d) time diagram III.
5. Experiments
In this section, the proposed mapping algorithm is evaluated by simulation experiments with a simple model: a simple parallel computing center model (Fig. 9) [12]. There are two parallel computer systems in the center, KAI860 and KAPAC. Each computer system model is composed of a buffer (BUFF), a synchronizer (SYNC), and 32 processing elements (PEs). The simulation scenario is given as follows: (i) GENR generates parallel processing tasks randomly; (ii) a task contains information on the expected processing time on each parallel machine; (iii) by using the information, a task is transmitted to the appropriate target machine; (iv) BUFF stores the transmitted task and sends the stored tasks to the PE models in a first-come-first-serve manner; (v) the PEs process a task in the expected processing time of the task; (vi) after that time elapses, the results of the PEs are collected by SYNC; (vii) SYNC transmits a task to its next target parallel machine; (viii) if the processing of a task is entirely completed, the processing statistics of the task are collected by TRANSD.
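For illustration, the hierarchical structure of this benchmark (Fig. 10) can be captured by a small recursive data type. The struct below, and the assumption that GENR and TRANSD are direct children of the root, are ours; the paper gives the exact tree only in the figure.

```cpp
// Sketch of the benchmark's composition tree as plain data (illustrative).
#include <string>
#include <vector>

struct Node {
    std::string name;
    std::vector<Node> children;  // empty for atomic models (leaf nodes)
};

// One parallel machine: a coupled model with BUFF, SYNC, and `pes` PE models.
Node machine(const std::string& prefix, int pes) {
    Node m{prefix, {{prefix + "-BUFF"}, {prefix + "-SYNC"}}};
    for (int i = 0; i < pes; ++i)
        m.children.push_back({prefix + "-PE" + std::to_string(i)});
    return m;
}

int main() {
    Node root{"CENTER", {{"GENR"}, {"TRANSD"},
                         machine("KAI860", 32), machine("KAPAC", 32)}};
    (void)root;
    return 0;
}
```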
Fig. 9. Simple parallel computing center model.

Fig. 10. Composition tree of the simple parallel computing center model.
The composition tree of the above model is shown in Fig. 10. The tree is transformed and mapped onto a hypercube network by the proposed mapping algorithm. In simulating the example system, the most prominent bottleneck is processing the external events in the KAI860-PE and KAPAC-PE models. These external input events are caused by internal events of the KAI860-BUFF and KAPAC-BUFF models, which can be initiated at the same time. Therefore, our mapping algorithm focuses on parallelization of both the internal and the external events.

Table 1 shows the parameters used in this experiment. All the time parameters in Table 1 are chosen arbitrarily. For real applications, however, such parameters would be measured via prototypical simulation. For example, in the KAICUBE-860 hypercube multicomputer system, the interprocessor communication time for a message of several hundred bytes is about 0.5 msec, and the external event processing time for a simple atomic model is about 4 msec.

The input parameters for this experiment are the execution time of an external event in the PE models, Text,PE; the dimension of the hypercube; and the interprocessor communication time Tcomm. Indeed, Text,PE and Tcomm represent the computation load and the communication overhead of the simulation, respectively. The intraprocessor communication time is assumed to be zero. We divide the experiment into two cases. In experiment I, Text,PE is set to 1, and Tcomm and the dimension of the hypercube are varied as shown in Table 2. In experiment II, Tcomm is set to 1, and Text,PE and the dimension of the hypercube are varied as shown in Table 3. For each combination of input parameters, our mapping algorithm generates the mapping result shown in the fourth column of Tables 2 and 3 and in Fig. 11. In MAP0 and MAP1, the structure of the composition tree is not changed, since the number of processors used is smaller than the number of parallelizable internal events. In MAP2

Table 1. Experimental design: parameters for the coupled models, the PE models, and the other atomic models.
Table 2
Experiment I

Text,PE   DIM   Tcomm   MAP    Tf
1         0     1       MAP0   4268
1         1     1       MAP1   2115
1         2     1       MAP2   1971
1         3     1       MAP3   1891
1         0     4       MAP0   4268
1         1     4       MAP1   3315
1         2     4       MAP2   3132
1         3     4       MAP2   3132
1         0     8       MAP0   4268
1         1     8       MAP1   4035
1         2     8       MAP1   4035
1         3     8       MAP1   4035
Table 3
Experiment II

Text,PE   DIM   Tcomm   MAP    Tf
1         0     1       MAP0   4268
1         1     1       MAP1   2775
1         2     1       MAP2   1971
1         3     1       MAP3   1891
4         0     1       MAP0   11948
4         1     1       MAP1   7479
4         2     1       MAP2   4324
4         3     1       MAP3   2995
8         0     1       MAP0   22188
8         1     1       MAP1   13571
8         2     1       MAP2   7459
8         3     1       MAP3   4568
and MAP3, however, the hierarchical structures of the KAI860 and KAPAC subtrees are transformed into binomial trees via the flattening and deepening operations. The mapping results are given to P-DEVSim++ in the simulation initialization phase, and no models are migrated dynamically.

For performance evaluation, a performance simulator of P-DEVSim++ was developed earlier [11]. The simulator simulates the runtime behavior of P-DEVSim++. Analyzing the runtime behavior of P-DEVSim++ in a real simulation environment is very difficult because most of the parameters which affect the runtime behavior are uncontrollable in that environment, whereas the parameters of the performance simulator can be controlled easily. The performance simulator of P-DEVSim++ is modeled in the DEVS formalism and implemented with P-DEVSim++ itself. Using the performance simulator, the transformed simulation task graphs are simulated.

The simulation results of experiment I are shown in the fifth column of Table 2 and in Fig. 12(a). In experiment I, as the dimension of the hypercube increases, the overall finishing time is reduced. But in this case the communication overhead is relatively large compared with the computation load, so the speedup is not very good. In particular, when Tcomm is larger than Text,PE, the number of processors utilized by the proposed mapping algorithm does not change despite the increased dimension of the hypercube (Table 2). The reason is that the mapping algorithm decides to limit the parallelism of the DEVS simulation because of the large communication overhead. To validate the decision made by the algorithm, we forcibly used the entire hypercube network in such cases. The result is depicted by the dotted lines in Fig. 12(a): the overall finishing time increases as the dimension of the hypercube is increased in such simulations. So we can confirm that the decision is valid.

The simulation results of experiment II are shown in the fifth column of Table 3 and in Fig. 12(b). In experiment II, the communication overhead is relatively small compared with the computation load, in contrast to experiment I. So the overall finishing time decreases drastically as the dimension of the hypercube increases. As Text,PE is
Fig. 11. Mapping results: (a) MAP0; (b) MAP1; (c) MAP2; (d) MAP3.
Fig. 12. Simulation results (overall finishing time versus hypercube dimension): (a) Experiment I; (b) Experiment II.
increased, more speedup is acquired. In particular, when Text,PE/Tcomm = 8, the overall finishing time with a 3-dimensional hypercube is reduced to about 20% of the single-processor time.
6. Conclusion

This paper has proposed a mapping algorithm for processing external and internal events of DEVS models with minimal synchronization overhead. Flattening and deepening operations are employed for mapping external events.
By using the operations, a task graph of a DEVS simulation is transformed into a binomial tree, which can be easily embedded into a hypercube network. For mapping internal events, a hypercube is partitioned into a set of subcubes such that an internal event and the external events caused by the internal event are processed in the same subcube. The performance of the proposed algorithm was evaluated by simulation experiments. The results show that the algorithm produces various mappings as the communication and computation times vary, and that it exploits significant parallelism in distributed simulation of hierarchical, modular discrete event models. A more detailed analysis of the proposed algorithm remains as further work.
References

[1] E.R. Christensen, Hierarchical Optimistic Distributed Simulation: Combining DEVS and Time Warp, Ph.D. Thesis, University of Arizona, USA, 1990.
[2] A.I. Concepcion, Distributed simulation on multiprocessors: Specification, design and architecture, Ph.D. Thesis, Wayne State University, USA, 1985.
[3] A.I. Concepcion, A hierarchical computer architecture for distributed simulation, IEEE Trans. Comput. 38 (2) (1989) 311-319.
[4] J. Du and J.Y.-T. Leung, Complexity of scheduling parallel task systems, SIAM J. Discrete Math. 2 (4) (1989) 473-487.
[5] R.M. Fujimoto, Optimistic approaches to parallel discrete event simulation, Trans. Soc. Comput. Simulation 7 (2) (1990) 153-191.
[6] R. Krishnamurti and B. Narahari, Preemptive scheduling of independent jobs on partitionable parallel architectures, in: Proceedings of the International Conference on Parallel Processing (1992).
[7] T.G. Kim and S.B. Park, The DEVS formalism: Hierarchical modular systems specification in C++, in: Proceedings of the European Simulation Multiconference (1992).
[8] S.Y. Lee and J.K. Aggarwal, A mapping strategy for parallel processing, IEEE Trans. Comput. 36 (4) (1987) 433-442.
[9] Y.B. Lin and E.D. Lazowska, A study of Time Warp rollback mechanisms, ACM Trans. Modeling Comput. Simulation 1 (1) (1991) 51-72.
[10] J. Misra, Distributed discrete-event simulation, ACM Comput. Surveys 18 (3) (1986) 39-65.
[11] Y.R. Seong, T.G. Kim and K.H. Park, Performance evaluation of a parallel DEVS simulation environment P-DEVSIM++, J. Korea Soc. Simulation 2 (1) (1993) 31-45.
[12] Y.R. Seong, T.G. Kim and K.H. Park, Mapping modular, hierarchical discrete event models in a hypercube multicomputer, in: Proceedings of the International Conference on Massively Parallel Processing Applications and Development, Delft, The Netherlands (1994).
[13] Y.R. Seong, S.H. Jung, T.G. Kim and K.H. Park, Parallel simulation of hierarchical modular DEVS models: A modified Time Warp approach, Internat. J. Comput. Simulation, to appear.
[14] R.C. de Vries, Reducing null messages in Misra's distributed discrete event simulation method, IEEE Trans. Software Engrg. 16 (1) (1990) 82-91.
[15] J. Vuillemin, A data structure for manipulating priority queues, Comm. ACM 21 (4) (1978) 309-315.
[16] Y.H. Wang, Discrete-event simulation on a massively parallel computer, Ph.D. Thesis, University of Arizona, USA, 1992.
[17] B.P. Zeigler, Theory of Modelling and Simulation (Wiley, New York, 1976).
[18] B.P. Zeigler, Multifacetted Modelling and Discrete Event Simulation (Academic Press, New York, 1984).
[19] B.P. Zeigler and G. Zhang, Mapping hierarchical discrete event models to multiprocessor systems: Concepts, algorithm, and simulation, J. Parallel Distributed Comput. 10 (7) (1990) 271-281.