European Journal of Operational Research 116 (1999) 545–564

Theory and Methodology

An adaptive aggregation/disaggregation algorithm for hierarchical Markovian models

Peter Buchholz


Informatik IV, Universität Dortmund, D-44221 Dortmund, Germany

Received 1 May 1997; accepted 1 February 1998

Abstract

A new analysis technique for large continuous time Markov chains resulting from hierarchical models specified as stochastic Petri nets or queueing networks is introduced. The technique combines iterative solution techniques based on a compact representation of the generator matrix, as recently developed for different modeling paradigms, with ideas from aggregation/disaggregation and multilevel algorithms. The basic step is to accelerate the convergence of iterative techniques by integrating aggregation steps according to the structure of the transition matrix, which is defined by the model structure. Aggregation is adaptive, analyzing aggregated models only for those parts where the error is estimated to be high. In this way, the new approach allows the memory- and time-efficient analysis of very large models which cannot be analyzed with standard means. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Markov processes; Hierarchical models; Steady state analysis; Iterative techniques; Aggregation/disaggregation

1. Introduction

Performance, reliability and performability of computer and communication systems are often determined by modeling the systems using extended queueing networks, stochastic Petri nets or related modeling formalisms. The resulting models usually specify a continuous time Markov chain (CTMC), which is analyzed for its steady state distribution, from which the required results can be computed. Although the computation of the steady state distribution of a CTMC is conceptually simple for finite state spaces, in practice the resulting state spaces are often finite but extremely large, so that the analysis requires a lot of time and space. A wide variety of sophisticated analysis algorithms and data structures have been developed for an efficient analysis of CTMCs, but these approaches often reach their limits when analyzing realistic examples.

E-mail: [email protected]



One approach to deal with fairly large CTMCs, which has been developed during the last decade, is to exploit the model structure to represent the generator matrix of a CTMC in compact form. This approach was first published for networks of communicating stochastic automata in [17] and has subsequently been extended to hierarchical Markovian models with asynchronous communication [6,7]. The compact representation of the generator matrix underlying the models can be directly exploited in iterative solution techniques. We denote the related solution approaches as structured analysis approaches. The main step is to realize the vector–matrix multiplication, which is the basic operation of most iterative solution techniques for CTMCs, directly on the compact representation of the generator matrix. In this way, storage requirements are reduced drastically, so that the state space size of solvable models on given hardware is increased by around an order of magnitude (see the examples in [6,7]). The number of operations required to perform a single vector–matrix product is also modified, as shown in [17,20,7]. However, a reduction in the number of operations can only be achieved when the generator matrix is relatively dense, but generator matrices are usually very sparse. Thus, the number of operations is usually not reduced by structured solution techniques.

Several analysis algorithms have been used in the context of structured analysis. The first algorithm proposed in [17] was the Power method; later, projection techniques [20], Jacobi overrelaxation (JOR) and a modified version of successive overrelaxation (SOR), possibly combined with aggregation/disaggregation steps [6,7], have been applied for structured analysis. All these solution techniques are known from conventional CTMC analysis. In the structured approaches they are merely based on a different way to multiply the iteration vector with the iteration matrix. In summary, structured analysis allows extremely large CTMCs to be handled on standard workstations, but the time to compute the solution with an acceptable accuracy is often very long. Thus, the major goal is to speed up the solution of large CTMCs by using more advanced techniques.

Such techniques can be developed by exploiting the structure of the model not only to represent the generator matrix in a compact way. Structure can additionally be used to define several aggregated representations of the CTMC, which correspond to models where submodels have been replaced by aggregates. These aggregated CTMCs are much smaller than the original one and can therefore be solved much faster. The idea is to solve the aggregated CTMCs and use the solution to redirect the solution vector of the original CTMC. Iteration of the complete CTMC and solution of aggregated systems can be interleaved, yielding a very efficient analysis algorithm. This approach is in some sense similar to algebraic multigrid methods [5,18]. However, the concrete realization of the approach is different, since the matrix structures are different and the method introduced here is based on the tensor structure of generator matrices as it appears in structured analysis approaches. The approach is conceptually also related to aggregation/disaggregation algorithms [20,19] and the multi-level algorithm for CTMC analysis [14,15]. However, these approaches use the complete generator matrix of a CTMC and aggregate states according to a different aggregation criterion.

The paper is organized in the following way.
In Section 2, the hierarchical model class is introduced. Then the matrix structures underlying hierarchical models are presented. In Section 4, iterative analysis of hierarchical models is briefly reviewed; afterwards, the concept of aggregation/disaggregation is presented in the context of hierarchical models. Section 6 contains the complete adaptive aggregation/disaggregation algorithm. The advantages of the new algorithm are subsequently shown by some non-trivial examples. Section 8 concludes the paper. The presentation of the whole approach is accompanied by a simple running example.

2. Hierarchical Markovian models

The class of hierarchical Markovian models for which structured analysis approaches have been developed is rather general, combining queueing networks (QNs) and colored generalized stochastic Petri nets


(GSPNs). We do not consider here the description of hierarchical models. However, for many QN or GSPN models it is quite natural to represent them in a hierarchical way by defining submodels that interact by exchanging customers/tokens. Especially in QNs, the definition of submodels and their exploitation for a more efficient analysis has been well established for a long time (see e.g., [7,12,20]). More recently, hierarchical descriptions have also been introduced for GSPNs (see e.g., [6,16]). The algorithm proposed here has been implemented in the QPN-Tool [3], which supports a general class of hierarchical models. Here we introduce the approach only for a limited subclass of models, since the goal of this paper is to introduce the new algorithm rather than a comprehensive class of hierarchical models. Nevertheless, it is worth mentioning that an extension of the model class is straightforward and the algorithm has indeed been implemented for more general models.

In this paper, we consider hierarchical models which can be interpreted as a form of extended hierarchical queueing networks as proposed in [7]. We assume a two-level hierarchy, where J low level models (LLMs) are connected in a high level model (HLM). The model is closed, and the HLM contains customers from K classes numbered 1 through K. The HLM specifies only the connection of LLMs by means of routing probabilities. LLMs are restricted in their input/output behavior. We assume that customers are neither generated nor destroyed in a LLM and that a customer will eventually leave the LLM after entering it. Furthermore, only single customers leave a LLM, and customers cannot pass through a LLM in zero time. The resulting model class is similar to the generalized service networks proposed in [16]. We do not care about the internal specification of a LLM as long as it meets the conditions on the input/output behavior. Thus, LLMs may be specified as (extended) queueing networks or as colored GSPNs.

The state of the HLM can be described by a J × K integer vector n which contains in position n_j(k) the number of class k customers in LLM j. Each vector n_j defines a macro state for LLM j. We use the notation e_{i,k} for a state with only one class k customer in LLM i. E.g., n ± e_{j,k} is the state resulting from n by adding/removing one class k customer to/from LLM j. Similarly, we use the notation n_j ± e_k for the state of LLM j which results from n_j by adding/removing one class k customer. Routing in the HLM is specified by means of (state dependent) routing probabilities; r(i,k,j,l,n) (1 ≤ i, j ≤ J, 1 ≤ k, l ≤ K) is the probability that a class k customer leaving LLM i immediately enters LLM j as class l when the state of the HLM is n before the customer leaves. State n' is a successor of n if n − n' = e_{i,k} − e_{j,l} and r(i,k,j,l,n) > 0. Starting with an initial state n, the state space of the HLM can be generated by computing the transitive closure of the successor relation. We denote the resulting state space by S_0. Furthermore, we define Z_j, the macro state space of LLM j, as follows: Z_j = {n_j | n ∈ S_0}. Generation of the HLM state space and transition relation implicitly generates also the possible arrivals to LLMs. Thus, a class k customer can arrive to LLM j in macro state n_j if n_j + e_k ∈ Z_j. We denote by A_j(n_j) the set of classes which can potentially arrive to LLM j in macro state n_j, i.e., A_j(n_j) = {k | 1 ≤ k ≤ K and n_j + e_k ∈ Z_j}. The set of classes which visit LLM j is defined as K_j = {k | k ∈ {1, ..., K} and there exists n_j ∈ Z_j with n_j(k) > 0}.

A LLM can now be defined straightforwardly considering the restrictions mentioned above. We briefly summarize the description of LLMs as extended queueing networks and as (colored) GSPNs (for more details see [6,7] and the examples in Section 7), respectively.
· A queueing network LLM is described as an open or a mixed network of queues. The queueing network includes a set of K_j open customer chains (observe that the classes in the HLM become chains in the LLMs). Customers from this set are generated by state dependent sources with an exponential interarrival


time with rate 1.0. Sources, and source transitions in the GSPN case, are used to mimic the environment of the LLM. The transition rate is a pseudo-rate which is only used to generate conditional arrival probabilities. A macro state of the LLM is described by the customer population in the chains K_j. An arrival of a chain k customer in state n_j is possible if k ∈ A_j(n_j). If k ∉ A_j(n_j), then class k cannot arrive and the corresponding source is blocked. Customers from chains K_j are eventually absorbed in a sink. Additionally, the LLM may include a fixed population of internal customers belonging to chains that are not in K_j.
· A (colored) GSPN LLM is specified using the model class defined in [1] or [10]. We describe here one possibility for the specification of LLMs using uncolored GSPNs. The net for LLM j includes, for each k ∈ K_j, an environment transition/place pair. Environment transitions are timed with transition rate 1. The only input place for the environment transition for k ∈ K_j is the corresponding environment place, and the arc cardinality of the input arc from the environment place to the environment transition is 1. All input arcs of environment places also have arc cardinality one, and only timed transitions are input transitions for environment places. For each k ∈ K_j, a place invariant exists that contains the environment place of k and no other environment place. Furthermore, a transition invariant exists that includes the environment transition for k and no other environment transition. The initial marking of the GSPN describing a LLM is chosen according to the population determined in the HLM state space. The macro state is defined according to the place invariants corresponding to the customer classes. In this way, some sort of population can be defined for GSPNs, which does not exclude synchronization or fork and join inside a subnet. Environment transitions can be extended by an additional guard, if necessary, to realize the arrival process determined in the HLM.

From a queueing network or GSPN specification of a LLM, the state space and transition matrices can be generated by standard means. Let S_j be the state space for LLM j and denote by S_j[n_j] the subset of states belonging to macro state n_j ∈ Z_j, i.e., the set of states with population n_j in the LLM. Let s_j(n_j) be the number of states in S_j[n_j]. States are represented by consecutive numbers from 0 to s_j(n_j) − 1. Transitions of LLMs can be characterized by different matrices, which are introduced now (see also [6,9]). We use bold capital letters for matrices. Subscript j indicates the LLM to which a matrix belongs. Square brackets are used to select specific matrices from a set of matrices, whereas normal brackets are used to select elements of a matrix or vector, i.e., A_j[a,b](x,y) denotes element (x,y) of matrix A_j[a,b] belonging to LLM j. Internal transitions of a LLM are collected in matrices Q_j[n_j], which are s_j(n_j) × s_j(n_j) matrices including in position Q_j[n_j](x,y) (0 ≤ x, y ≤ s_j(n_j) − 1) the transition rate of an internal transition between the states x and y from S_j[n_j]. The departure of a class k ∈ K_j customer from LLM j with population n_j requires n_j(k) > 0 and describes a transition into some state in S_j[n_j − e_k]. The corresponding transition rates are collected in an s_j(n_j) × s_j(n_j − e_k) matrix S_j[n_j, k]. Both matrix types include transitions originated by LLM j.
The diagonal elements of the matrices Q_j[n_j] are defined as the negative sum of the transition rates originated in the LLM. They can be computed as

$$Q_j[n_j](x,x) = -\left( \sum_{y=0,\ y \neq x}^{s_j(n_j)-1} Q_j[n_j](x,y) \ +\ \sum_{k \in K_j,\ n_j(k) > 0}\ \sum_{y=0}^{s_j(n_j - e_k)-1} S_j[n_j,k](x,y) \right)$$

for all n_j ∈ Z_j and x ∈ S_j[n_j]. The introduced matrices include all transition rates originated in a LLM. Additionally, the behavior of a LLM upon arrival of a customer has to be described. The corresponding transitions can be interpreted as passive, since the LLM only reacts; the arrival is initiated by the environment (i.e., by some other LLM). Arrivals of class k customers to LLM j in macro state n_j are collected in the s_j(n_j) × s_j(n_j + e_k) matrix


U_j[n_j, k]. These matrices are defined for all n_j with n_j + e_k ∈ Z_j and include only non-negative elements and unit row sums. Element U_j[n_j,k](x,y) is the probability that the state of the LLM changes to y when a class k customer arrives in state x. All matrices for a LLM can be generated from the model specification as an extended queueing network or GSPN using an appropriate tool. Standard algorithms for state space and matrix generation have to be extended only slightly to generate the isolated matrices. The different matrices describe the behavior of the LLM completely; the dependency on the environment is restricted to the definition of the possible arrival sequences of customers.

Example 1. We explain the approach by means of a simple running example. The model is a two-class QN as shown in Fig. 1. The QN contains three LLMs. Customers of class 1 visit LLMs 1 and 2. Customers of class 2 visit LLMs 1 and 3. LLMs 2 and 3 include two single server queues with exponential service time distribution and service rates μ_{i,j} for queue j in LLM i. The service discipline at both queues is First Come First Served (FCFS). Customers arriving to LLM i (i = 2, 3) enter queue 1 immediately. After leaving queue 1, they enter queue 2 with probability p_i and leave the LLM with probability p̄_i = 1 − p_i. Customers leaving queue 2 enter queue 1 again. LLM 1 consists of a single server queue with Erlang-2 service time distribution and mean service time (2λ_k)^{-1} for class k. The queue in LLM 1 uses FCFS service among the customers of one class and switches between classes when the number of customers of the currently served class becomes 0. Thus, if class k is momentarily served, then all class k customers are served, including newly arriving customers of class k. If no more class k customers are waiting, service switches to class l (≠ k), and class l customers are served until no more class l customers are waiting.

Since the LLMs are relatively simple, we present some of the matrices in detail. We start with the HLM state space S_0 and assume that the model contains n_k customers of class k. S_0 contains (n_1 + 1)(n_2 + 1) states of the form ((m_1, m_2), (n_1 − m_1, 0), (0, n_2 − m_2)) with 0 ≤ m_k ≤ n_k. Subvectors n_j are separated by brackets. S_1[m_1(1), m_1(2)] contains one state for m_1(1) = m_1(2) = 0. It contains two states if exactly one of m_1(1), m_1(2) is 0; in this case the first state describes the situation that a customer is in phase 1 of the service time distribution. For m_1(1), m_1(2) > 0, S_1[m_1(1), m_1(2)] includes four states. In the first two states, a class 1 customer is served in phase 1 and 2, respectively. In the third and fourth state, a class 2 customer is served. For the LLMs j = 2, 3, S_j[m_j] contains m_j(j − 1) + 1 states. In state x_j ∈ S_j[m_j] (0 ≤ x_j ≤ m_j(j − 1)), x_j customers are in queue 1 and m_j(j − 1) − x_j in queue 2. We present some of the matrices describing the LLMs and start with LLM 1. Consider first the Q-matrices and assume m_1, m_2 > 0.

Fig. 1. Basic example model.


$$Q_1[(m_1,m_2)] = \begin{pmatrix} -\lambda_1 & \lambda_1 & \cdot & \cdot\\ \cdot & -\lambda_1 & \cdot & \cdot\\ \cdot & \cdot & -\lambda_2 & \lambda_2\\ \cdot & \cdot & \cdot & -\lambda_2 \end{pmatrix}, \qquad Q_1[(m_1,0)] = \begin{pmatrix} -\lambda_1 & \lambda_1\\ \cdot & -\lambda_1 \end{pmatrix}, \qquad Q_1[(0,0)] = (\,\cdot\,).$$

Q_1[(0,m_2)] equals Q_1[(m_1,0)] except that the transition rates are λ_2 instead of λ_1. As examples for the S-matrices we present the following cases, where we assume m_1, m_2 > 1:

$$S_1[(m_1,m_2),1] = S_1[(m_1,1),1] = \begin{pmatrix} \cdot & \cdot & \cdot & \cdot\\ \lambda_1 & \cdot & \cdot & \cdot\\ \cdot & \cdot & \cdot & \cdot\\ \cdot & \cdot & \cdot & \cdot \end{pmatrix}, \qquad S_1[(m_1,0),1] = \begin{pmatrix} \cdot & \cdot\\ \lambda_1 & \cdot \end{pmatrix}, \qquad S_1[(1,m_2),1] = \begin{pmatrix} \cdot & \cdot\\ \lambda_1 & \cdot\\ \cdot & \cdot\\ \cdot & \cdot \end{pmatrix},$$

$$S_1[(m_1,m_2),2] = S_1[(1,m_2),2] = \begin{pmatrix} \cdot & \cdot & \cdot & \cdot\\ \cdot & \cdot & \cdot & \cdot\\ \cdot & \cdot & \cdot & \cdot\\ \cdot & \cdot & \lambda_2 & \cdot \end{pmatrix}.$$

The remaining S-matrices are similar. Some examples for U-matrices, including the conditional arrival probabilities, are presented now. We assume m_1, m_2 > 0, k ∈ {1, 2} and use the notation I_n for the n × n identity matrix:

$$U_1[(m_1,m_2),k] = I_4, \qquad U_1[(m_1,0),1] = U_1[(0,m_2),2] = I_2, \qquad U_1[(0,m_2),1] = \begin{pmatrix} \cdot & \cdot & 1 & \cdot\\ \cdot & \cdot & \cdot & 1 \end{pmatrix}.$$

Since LLMs 2 and 3 are structurally identical, it is sufficient to consider only one of them. We show some matrices for LLM 2; the corresponding matrices for LLM 3 differ only in the parameter values. Matrix Q_2[(m_1, 0)] is an (m_1 + 1) × (m_1 + 1) matrix:

$$Q_2[(m_1,0)] = \begin{pmatrix} -\mu_{2,2} & \mu_{2,2} & & &\\ p_2\mu_{2,1} & -\mu_{2,1}-\mu_{2,2} & \mu_{2,2} & &\\ & \ddots & \ddots & \ddots &\\ & & p_2\mu_{2,1} & -\mu_{2,1}-\mu_{2,2} & \mu_{2,2}\\ & & & p_2\mu_{2,1} & -\mu_{2,1} \end{pmatrix}.$$

S_2[(m_1,0),1] and U_2[(m_1,0),1] are (m_1 + 1) × m_1 and (m_1 + 1) × (m_1 + 2) matrices, respectively:

$$S_2[(m_1,0),1] = \begin{pmatrix} \cdot & & &\\ \bar p_2\mu_{2,1} & \cdot & &\\ & \ddots & \ddots &\\ & & \bar p_2\mu_{2,1} & \cdot \end{pmatrix}, \qquad U_2[(m_1,0),1] = \begin{pmatrix} \cdot & 1 & &\\ & \ddots & \ddots &\\ & & \cdot & 1 \end{pmatrix}.$$


3. Matrix structures of hierarchical models

In the previous section, isolated state spaces and matrices have been generated for the LLMs and the HLM. These parts are sufficient to characterize the state space and generator matrix of the CTMC underlying the complete model. We start with the description of the state space. Each state of the complete model describes a unique state of the HLM and a unique state for each LLM. States cannot be arbitrarily mixed, since a HLM state determines the macro state of each LLM. Let S be the state space of the complete model, which can be generated as

$$S = \bigcup_{n \in S_0} S[n] = \bigcup_{n \in S_0}\ \times_{j=1}^{J} S_j[n_j].$$

The number of states in S equals $|S| = \sum_{n \in S_0} s(n) = \sum_{n \in S_0} \prod_{j=1}^{J} s_j(n_j)$ and grows very rapidly with an increasing size of the subspaces S_j[n_j] and an increasing number of LLMs. Each state in S characterizes J + 1 local states in the LLMs and the HLM. Denote by x_j ∈ S_j[n_j] the state corresponding to state x ∈ S[n]. States from S[n] and S_j[n_j] can be related by a mixed radix number representation [13], which allows the linearization of a J-dimensional state description. The following relation between x ∈ S[n] and all x_j ∈ S_j[n_j] holds:

$$x = \sum_{j=1}^{J} x_j \prod_{i=j+1}^{J} s_i(n_i). \qquad (1)$$
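To make the mixed radix representation of Eq. (1) concrete, the following small Python sketch (with hypothetical sizes s_i(n_i); the function names are ours) converts between the global index x and the local states x_j in both directions via Horner-style loops:

    # Mixed radix linearization of Eq. (1); s holds hypothetical s_i(n_i) values.
    s = [3, 4, 2]

    def linearize(xs, s):
        """Map local states (x_1, ..., x_J) to the global index x of Eq. (1)."""
        x = 0
        for xj, sj in zip(xs, s):
            x = x * sj + xj   # Horner evaluation of sum_j x_j * prod_{i>j} s_i
        return x

    def delinearize(x, s):
        """Recover the local states x_j from the global index x."""
        xs = []
        for sj in reversed(s):
            x, xj = divmod(x, sj)
            xs.append(xj)
        return tuple(reversed(xs))

    assert delinearize(linearize((2, 1, 0), s), s) == (2, 1, 0)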

From this relation, x can be computed knowing all x_j, and all x_j can be computed knowing x. Consequently, every state receives two characterizations: a J-dimensional description reflecting the model structure, and a two-dimensional description giving its location in the generator matrix, which is later used to implement iterative solution techniques. The state space of the complete CTMC is generated by combining the state spaces of the LLMs using cross products. In a similar way, the generator matrix for the complete model can be generated by combining LLM matrices via tensor (Kronecker) products and sums. First, these matrix operations will be briefly introduced; for complementary information we refer to [13,17].

Definition 1. The tensor (Kronecker) product of two matrices A ∈ R^{r_1 × c_1} and B ∈ R^{r_2 × c_2} is defined as the r_1 r_2 × c_1 c_2 matrix

$$C = A \otimes B = \begin{pmatrix} A(0,0)B & \cdots & A(0,c_1-1)B\\ \vdots & \ddots & \vdots\\ A(r_1-1,0)B & \cdots & A(r_1-1,c_1-1)B \end{pmatrix}.$$

The tensor (Kronecker) sum of two matrices E ∈ R^{r_1 × r_1} and F ∈ R^{r_2 × r_2} is defined as

$$G = E \oplus F = E \otimes I_{r_2} + I_{r_1} \otimes F,$$

where I_n is the n × n identity matrix. Tensor products and sums are associative [13]; therefore the generalization to more than two matrices is defined as

$$\bigotimes_{j=1}^{J} A_j = A_1 \otimes \left( \bigotimes_{j=2}^{J} A_j \right) \quad\text{and}\quad \bigoplus_{j=1}^{J} A_j = A_1 \oplus \left( \bigoplus_{j=2}^{J} A_j \right),$$

where $\bigotimes_{j=J}^{J} A_j = \bigoplus_{j=J}^{J} A_j = A_J$.
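As a small illustration of Definition 1, the following NumPy sketch (the helper kron_sum is ours, not from the paper) builds a tensor sum and checks a property that matters for CTMCs: the Kronecker sum of two generator matrices is again a generator matrix, i.e., its row sums stay zero:

    import numpy as np

    def kron_sum(E, F):
        """Tensor (Kronecker) sum of Definition 1: E (x) I + I (x) F."""
        return (np.kron(E, np.eye(F.shape[0]))
                + np.kron(np.eye(E.shape[0]), F))

    E = np.array([[-1.0, 1.0], [2.0, -2.0]])   # generator of a small CTMC
    F = np.array([[-3.0, 3.0], [4.0, -4.0]])
    G = kron_sum(E, F)                          # joint generator of independent chains
    print(np.allclose(G.sum(axis=1), 0.0))      # True: row sums remain zero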


Furthermore, a relation between tensor products/sums and ordinary matrix products/sums exists, as shown in the following equation:

$$\bigotimes_{j=1}^{J} A_j = \prod_{j=1}^{J} I_{\prod_{i=1}^{j-1} c_i} \otimes A_j \otimes I_{\prod_{i=j+1}^{J} r_i} \quad\text{and}\quad \bigoplus_{j=1}^{J} A_j = \sum_{j=1}^{J} I_{\prod_{i=1}^{j-1} c_i} \otimes A_j \otimes I_{\prod_{i=j+1}^{J} r_i}, \qquad (2)$$

where A_i is an r_i × c_i matrix and r_i = c_i for tensor sums.
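Eq. (2) can be checked numerically in a few lines (square factors for brevity, so r_i = c_i; a NumPy sketch):

    import numpy as np

    rng = np.random.default_rng(1)
    A = [rng.random((2, 2)), rng.random((3, 3)), rng.random((2, 2))]
    lhs = np.kron(np.kron(A[0], A[1]), A[2])          # tensor product of all factors
    dims = [a.shape[0] for a in A]
    prod = np.eye(int(np.prod(dims)))
    for j, Aj in enumerate(A):                        # product form of Eq. (2)
        l = int(np.prod(dims[:j])); u = int(np.prod(dims[j + 1:]))
        prod = prod @ np.kron(np.eye(l), np.kron(Aj, np.eye(u)))
    assert np.allclose(lhs, prod)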

This relation is the basic representation for iterative numerical methods exploiting the matrix structure.

Following [7,9], the generator matrix Q of the CTMC underlying the complete model can be decomposed into submatrices or blocks Q[n,m] (n, m ∈ S_0) according to the state of the HLM. We assume that states in S_0 are ordered lexicographically and that e_{i,k} < e_{j,l} if i < j, or i = j and k < l. Each Q[n,m] describes a submatrix of the generator matrix, i.e.,

$$Q = \begin{pmatrix} Q[n^-,n^-] & \cdots & Q[n^-,n^+]\\ \vdots & \ddots & \vdots\\ Q[n^+,n^-] & \cdots & Q[n^+,n^+] \end{pmatrix},$$

where n^- and n^+ are the lexicographically smallest and largest elements of S_0, respectively. A submatrix Q[n,m] can be computed from tensor products/sums of LLM matrices as shown in the following equation [7]:

$$Q[n,m] = \begin{cases} \bigoplus_{j=1}^{J} Q_j[n_j] + \sum_{j=1}^{J}\sum_{k=1}^{K} r(j,k,j,k,n)\, I_{l_j(n)} \otimes S_j[n_j,k]\,U_j[n_j-e_k,k] \otimes I_{u_j(n)} & \text{for } n = m,\\ r(i,k,j,l,n)\, I_{l_i(m)} \otimes S_i[n_i,k] \otimes I_{l_j(n)/l_{i+1}(n)} \otimes U_j[n_j,l] \otimes I_{u_j(n)} & \text{for } n = m + e_{i,k} - e_{j,l} \text{ and } i < j,\\ r(i,k,j,l,n)\, I_{l_j(m)} \otimes U_j[n_j,l] \otimes I_{l_i(n)/l_{j+1}(n)} \otimes S_i[n_i,k] \otimes I_{u_i(n)} & \text{for } n = m + e_{i,k} - e_{j,l} \text{ and } i > j,\\ r(i,k,j,l,n)\, I_{l_i(m)} \otimes S_i[n_i,k]\,U_i[n_i-e_k,l] \otimes I_{u_i(n)} & \text{for } n = m + e_{i,k} - e_{j,l},\ i = j \text{ and } l \neq k,\\ 0 & \text{otherwise}, \end{cases} \qquad (3)$$

where $l_j(n) = \prod_{i=1}^{j-1} s_i(n_i)$ and $u_j(n) = \prod_{i=j+1}^{J} s_i(n_i)$. This representation coincides with the ordering of states in the sets S[n] given in Eq. (1). Using a slightly more abstract view, each submatrix Q[n,m] can be represented as

$$Q[n,m] = \sum_{i=1}^{T(n,m)} r_i(n,m) \bigotimes_{j=1}^{J} A_j^i[n,m],$$

where each A_j^i[n,m] is a matrix belonging to submodel j or an appropriate identity matrix, r_i(n,m) is a real constant and T(n,m) is an integer constant. We use this representation for the introduction of the algorithms; however, it is just a shorthand notation for Eq. (3). If the indices of the submodels are permuted, the resulting matrix describes, up to the ordering of states, the original CTMC. Since rotations of submodel indices will be used extensively to realize the new a/d algorithm, we briefly consider the underlying concept here. Let σ_j be the circular left shift by j − 1 positions, i.e., submodel j becomes the first and submodel j − 1 the J-th submodel. Let x be a state from the original state space S[n] and let x^{σ_j} be the corresponding state after applying σ_j:


$$x^{\sigma_k} = \sum_{j=1}^{J} x_j \prod_{i \in \{1,\ldots,J\},\ \sigma_k(i) > \sigma_k(j)} s_i(n_i).$$

Denote by P^{σ_j}[n] the $\prod_{j=1}^{J} s_j(n_j) \times \prod_{j=1}^{J} s_j(n_j)$ permutation matrix with P^{σ_j}[n](x, x^{σ_j}) = 1. After applying σ_j, matrix Q[n,m] becomes

$$Q^{\sigma_j}[n,m] = (P^{\sigma_j}[n])^{T}\, Q[n,m]\, P^{\sigma_j}[m] = \sum_{i=1}^{T(n,m)} r_i(n,m) \bigotimes_{k=j}^{J} A_k^i[n,m] \otimes \bigotimes_{k=1}^{j-1} A_k^i[n,m]. \qquad (4)$$
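The permutation σ_j never has to be applied as an explicit matrix: if a subvector is viewed as a J-dimensional array with axis j of size s_j(n_j), Eq. (4) corresponds to a transpose of the axes followed by re-linearization. A short NumPy sketch with hypothetical sizes:

    import numpy as np

    # x over S[n] = S_1[n_1] x S_2[n_2] x S_3[n_3]; sigma_2 moves submodel 2 first.
    s = (2, 3, 4)                                   # hypothetical s_j(n_j)
    x = np.arange(np.prod(s), dtype=float)          # some subvector x[n]
    x_sigma2 = x.reshape(s).transpose(1, 2, 0).reshape(-1)  # submodel order (2, 3, 1)

This reshaping view is one way to realize the rotations without ever storing the matrices P^{σ_j}[n].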

Example 1 (continued). Subset S[n] contains s(n) = (δ(n_1(1) > 0) + 1)(δ(n_1(2) > 0) + 1)(n_2(1) + 1)(n_3(2) + 1) states, where δ(b) = 1 if condition b holds and 0 otherwise. Table 1 includes the number of states in S, the number of states in the HLM and LLM state spaces, the number of non-zeros in the generator matrix Q excluding diagonal elements, denoted as nz(Q), and the number of non-zeros in the isolated matrices, denoted as Σ nz(Q_i). It is obvious that even for this simple example the isolated state spaces together are much smaller than the state space of the complete model. Similarly, the number of non-zero elements in Q is much larger than the number of non-zero elements in the different matrices describing the HLM and LLMs. For n_1 = n_2 = 30, the isolated matrices together include only 28 406 non-zero entries, whereas the complete generator matrix has more than 6 million non-zero entries, which is too much to be stored in the main memory of contemporary workstations. In the following section, iterative analysis techniques are introduced which require only the isolated matrices and not the complete generator matrix, so that the complete generator matrix never needs to be generated and stored for the computation of the stationary solution vector.

4. Iterative analysis of hierarchical models

Iterative analysis exploiting the tensor structure of the generator matrix has been proposed in [8,17,20] for stochastic automata networks and in [6,7] for the model class presented here. We briefly review the latter techniques, since they are also used in the a/d algorithm presented afterwards. For solution techniques exploiting the matrix structure, the different vectors are decomposed, like matrix Q, into subvectors. We assume in the sequel that Q describes a CTMC with only one recurrent class of states. Let π be the, in our case unique, stationary vector of the CTMC; then π[n] is the subvector of length s(n) including the steady state probabilities of the states from S[n]. In the same way, an iteration vector x can be structured. Vector π is the solution of πQ = 0 with the additional normalization condition πe^T = 1.0. Knowing π, several performance measures like mean throughputs or populations can be computed straightforwardly (see [20]). Since Q is usually large and sparse, direct solution techniques causing "fill-in" cannot be applied to determine π. Iterative techniques preserving the sparseness of Q are more appropriate. However, even these techniques

Table 1
State space dimensions and numbers of non-zero elements

n1    n2    |S|        |S0|    |S1|    |S2/3|    nz(Q)         Σ nz(Q_i)
 2     2         81       9      25       6            288         178
10    10     14 641     120     441      66         87 120        3266
20    20    194 481     441    1681     231      1 252 440      12 736
30    30    923 521     961    3721     496      6 111 960      28 406


reach their limits for larger models, often due to memory limitations. A step towards extending the size of solvable models is to exploit the compact tensor representation of Q. In the context of tensor representations, several iterative solution algorithms have been proposed to determine π. We present here the Power method, the Jacobi method and a block Gauss–Seidel method. Projection methods have also been used successfully, although they usually require more memory, since additional vectors have to be stored. Recent results show that even the Gauss–Seidel method, which was previously excluded from the set of usable methods [20], can be applied in the context of structured analysis approaches [4]. We start with the Power method, a well-known although not very efficient solution technique. The iteration scheme of the Power method equals

$$x^{k} = x^{k-1} + x^{k-1} Q / \alpha,$$

where α > max_i |Q(i,i)| and x^0 ≥ 0 with x^0 e^T = 1.0 is the initial vector. It is known that the Power method converges under the assumptions made here, but convergence can be very slow. Exploiting the matrix structure, we obtain the following equation for a subvector x[n]:

$$x^{k}[n] = x^{k-1}[n] + \alpha^{-1} \sum_{m \in S_0} x^{k-1}[m]\, Q[m,n]. \qquad (5)$$
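As an illustration, a minimal sketch of one structured Power step over the block decomposition of Eq. (5); the dictionary layout and the multiply callables are assumptions of this sketch, not an interface from the paper:

    def power_step(x_blocks, Q_blocks, alpha):
        """One step x^k = x^{k-1} + x^{k-1} Q / alpha, block by block (Eq. (5)).

        x_blocks: dict mapping HLM state n -> subvector x[n] (NumPy arrays);
        Q_blocks: dict mapping (m, n) -> callable returning x[m] @ Q[m, n],
                  realized on the tensor representation of the block.
        """
        y = {n: v.copy() for n, v in x_blocks.items()}
        for (m, n), multiply in Q_blocks.items():
            y[n] += multiply(x_blocks[m]) / alpha
        return y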

Each submatrix Q[m,n] can be represented as $(I_{l_1} \otimes A_1 \otimes I_{u_1})(I_{l_2} \otimes A_2 \otimes I_{u_2})$ for n ≠ m, and as $\sum_{i \in I} (I_{l_i} \otimes A_i \otimes I_{u_i})$

for n = m, where A_i is some submodel matrix, I is a set of indices and l_i, u_i are appropriate integers. This representation follows from Eq. (3) and the relation between ordinary and tensor products in Eq. (2). In both cases, a procedure to compute x(I_l ⊗ A ⊗ I_u) is needed. Such a procedure can be realized without generating the matrix; only x, A, l and u are required as inputs. Here we present a procedure for sparse matrices. The algorithm below computes the product y = x(I_l ⊗ A ⊗ I_u) without computing and storing the whole matrix in brackets. The central idea of the algorithm is very simple: A ⊗ I_u is a matrix in which every non-zero element A(i,j) is substituted by a u × u matrix with A(i,j) on the main diagonal and all remaining elements equal to zero. I_l ⊗ (A ⊗ I_u) generates a matrix with l non-zero diagonal blocks equal to A ⊗ I_u and all non-diagonal blocks equal to 0. In this way, it is easy to compute the locations of the non-zero elements in the resulting matrix, as required for the vector–matrix product computation. The procedure implements the vector–matrix product computation in a way that minimizes the accesses to matrix elements of A, which is the most expensive operation for sparse matrices. The effort of the computation is in O(l·u·nz(A)), where nz(A) is the number of non-zero elements in matrix A. The computation of x(I_l ⊗ A ⊗ I_d ⊗ B ⊗ I_u), as necessary for the non-diagonal blocks of Q, requires an effort in O(l·nz(A)·(d + r_B + u) + (l + c_A + d)·nz(B)·u), where r_B is the number of rows of B and c_A is the number of columns of A. This differs from the effort required to compute the vector–matrix product by first generating the matrix resulting from the tensor products, which is in O(nz(A)·nz(B)·l·u·d) according to the number of non-zero elements in the resulting matrix. Exploitation of tensor products becomes more efficient with an increasing number of non-zeros in the matrices. Additionally, we have the advantage of drastically reduced storage requirements.


    compute_tensor_product(A, x, l, u)
    /* A is an r x c sparse matrix, l, u are integers, x is an l*r*u-dimensional vector */
    int i, j, s, t, nr, nc;
    vector y;                 /* l*c*u-dimensional solution vector, initialized with 0 */
    for i = 0 to r - 1 do
      for all non-zero elements A(i,j) in row i do
        nr = i * u;           /* compute the index of the first occurrence of row i */
        nc = j * u;           /* compute the index of the first occurrence of column j */
        for s = 1 to l do
          for t = 1 to u do
            y(nc) = y(nc) + x(nr) * A(i,j);
            nc = nc + 1; nr = nr + 1;
          od
          nr = nr + (r - 1) * u;   /* jump to row i / column j in the next diagonal block */
          nc = nc + (c - 1) * u;
        od
      od
    od

With the above procedure and a procedure to add two vectors, the Power method can be implemented straightforwardly. The Jacobi method can be realized in a similar way; the only difference is that diagonal elements have to be handled separately. Define a set of diagonal matrices D[n] such that D[n](i,i) = −Q[n,n](i,i), and let Q^0 be equal to Q with all diagonal elements set to zero. With these notations, the iteration step of the Jacobi method becomes

$$x^{k}[n] = \left( \sum_{m \in S_0} x^{k-1}[m]\, Q^{0}[m,n] \right) D^{-1}[n]. \qquad (6)$$

In a similar way we can define a block Gauss–Seidel variant of the method:

$$x^{k}[n] = \left( \sum_{m \in S_0,\ m \geq n} x^{k-1}[m]\, Q^{0}[m,n] \ +\ \sum_{m \in S_0,\ m < n} x^{k}[m]\, Q^{0}[m,n] \right) D^{-1}[n]. \qquad (7)$$
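For illustration, here is a runnable NumPy version of the procedure compute_tensor_product above (dense A for brevity, where the original works on sparse A), checked against the explicitly generated matrix:

    import numpy as np

    def tensor_product_multiply(A, x, l, u):
        """Compute y = x @ (I_l (x) A (x) I_u) without forming the matrix."""
        r, c = A.shape
        y = np.zeros(l * c * u)
        for i in range(r):
            for j in range(c):
                if A[i, j] == 0.0:
                    continue                  # only non-zero entries contribute
                for s in range(l):            # s-th diagonal block of I_l (x) (...)
                    nr = s * r * u + i * u    # first index of row i in block s
                    nc = s * c * u + j * u    # first index of column j in block s
                    y[nc:nc + u] += A[i, j] * x[nr:nr + u]
        return y

    rng = np.random.default_rng(0)
    A = rng.random((3, 4)); l, u = 2, 5
    x = rng.random(l * A.shape[0] * u)
    M = np.kron(np.eye(l), np.kron(A, np.eye(u)))   # explicit matrix, for the check only
    assert np.allclose(tensor_product_multiply(A, x, l, u), x @ M)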
Both approaches can be realized using the tensor based vector–matrix multiplication. Furthermore, overrelaxation can be introduced [20]. We denote the resulting structured variants of the algorithms as SPower, SJacobi, SGS, SJOR and SSOR, respectively. Observe that we have only modified the way vector–matrix products are computed and not the iteration methods themselves. Consequently, all results about convergence behavior and all implementation issues, like convergence checking, remain the same.

Example 1 (continued). For the iterative analysis of the simple example model we choose the following parameter values: λ_1 = λ_2 = 2.0, p_2 = 0.5, μ_{2,1} = μ_{2,2} = 2, p_3 = 0.9 and μ_{3,1} = μ_{3,2} = 10. The model is analyzed for different values of n_1 and n_2. Table 2 includes the number of iterations, the required CPU time and the elapsed time, both measured in seconds, needed to reduce the maximum norm of the residual vector to less than 10^-10. SOR denotes the standard SOR method; SSOR is the structured block SOR method using Eq. (7) plus additional relaxation. The relaxation parameters are in both cases adjusted dynamically using the heuristics outlined in [20]. The first observation is that SOR needs significantly fewer iterations than SSOR. Of course, this difference is model dependent and depends also on the ordering of states.


Table 2
Number of iterations, CPU time (in s) and real time (in s) to reach ||r||_∞ < 10^-10

                            SSOR                                    SOR
n1    n2    States      Iter.     CPU          Real         Iter.   CPU          Real
 2     2         81       310     2.67×10^-1   1.00           90    5.00×10^-2   1.00
10    10     14 641      4400     5.03×10^2    5.10×10^2     940    4.88×10^1    4.90×10^1
20    20    194 481    13 470     2.21×10^4    2.26×10^4    3760    2.93×10^3    3.07×10^3
30    30    923 521    21 910     1.71×10^5    1.77×10^5      -     -            -

However, SOR usually requires fewer iterations, because SSOR is a block iteration method, which in some sense lies between the Jacobi and the Gauss–Seidel iteration. Additionally, SSOR requires slightly more CPU time per iteration. This difference, too, depends on the model structure and in particular on the sparseness of the matrices. We have already seen that for relatively dense matrices the tensor based approach is more efficient. However, most examples yield sparse matrices, so that the time per iteration increases slightly when the tensor based vector–matrix multiplication is used. An obvious advantage of the structured approach is that much larger models can be analyzed. The system with n_1 = n_2 = 30 can be solved with the structured approach, but there is no chance to solve it on the same hardware using standard means. The structured approach can even be used for larger state spaces with around 2 million states, but in this case the solution time becomes very long; even for the largest model solved here, the solution requires more than 49 hours, which is usually too long. So there is a need for new structured solution techniques that reduce the analysis effort for large models. This is exactly the motivation for the aggregation/disaggregation approach introduced in the sequel of this paper.

5. Aggregation/disaggregation in hierarchical models

In the previous section, iteration techniques have been proposed for hierarchical models. These techniques allow the handling of larger state spaces. However, slow convergence of the iterative method often reduces the applicability of the methods. Here we introduce aggregation/disaggregation steps to enhance the convergence of iterative methods. In [6,7], standard a/d methods as known for conventional CTMCs [19,20] have been introduced in the context of hierarchical models. The idea is quite straightforward: aggregation according to the state of the high level model. It has been shown that aggregation/disaggregation can be implemented very efficiently and that the hierarchical model structure often supports aggregation/disaggregation, since submodels are loosely coupled.

Aggregation according to the HLM state is introduced first. Let x be some estimate of the solution vector which has been computed by performing some steps of an iterative solution technique. We assume that all elements of x are non-zero, which is no restriction since in an irreducible CTMC all stationary state probabilities are non-zero. Let x[n] be the subvector of x including the state probabilities of states from S[n]. An aggregation step according to the HLM state substitutes each subset of states S[n] by a single aggregated state. We denote the resulting aggregated state space by S̃_0 and the corresponding generator matrix by Q̃_0. The aggregated generator matrix is computed element-wise as

$$\tilde Q_0(n,m) = \bar x[n]\, Q[n,m]\, e^{T}, \quad\text{where } \bar x[n] = x[n]/(x[n]e^{T}). \qquad (8)$$

The vector–matrix product can be computed efficiently by exploiting the tensor structure of the submatrices [6,7]. Of course, this aggregation is completely identical to the aggregation of generator matrices with respect to some block structure [12,20].
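A small sketch of the HLM aggregation of Eq. (8); for readability the blocks Q[n,m] are taken as dense NumPy arrays here, whereas the structured approach would evaluate the products on the tensor representation:

    import numpy as np

    def aggregate_hlm(x_blocks, Q_blocks, order):
        """Aggregated HLM generator of Eq. (8): Q~_0(n,m) = xbar[n] Q[n,m] e^T."""
        N = len(order)
        Qt = np.zeros((N, N))
        for a, n in enumerate(order):
            xbar = x_blocks[n] / x_blocks[n].sum()        # normalized subvector
            for b, m in enumerate(order):
                if (n, m) in Q_blocks:
                    Qt[a, b] = xbar @ Q_blocks[(n, m)].sum(axis=1)
        return Qt

Note that Q[n,m]e^T is simply the vector of row sums of the block.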


We now propose another idea, which also uses aggregation but is in some sense orthogonal to the aggregation step described above. Of course, both aggregation steps can be and will be combined in the algorithm presented in the following section. The idea is to perform aggregation according to the state of different submodels. Conventional a/d techniques perform an additive decomposition of the matrix and the state space; diagonal blocks of the complete matrix are analyzed in an aggregation step. We use a multiplicative decomposition by exploiting the multi-dimensional description of states due to the hierarchical structure. Of course, such an approach requires a structured representation of the generator matrix which includes complete information about the model structure. We first consider the idea of aggregating the environment of a submodel in a hierarchical model. The idea of an aggregation/disaggregation step is to project the current solution on the state space of one LLM, analyze the resulting aggregated system, and redirect the complete solution using the new aggregated solution vector. Similar ideas underlie most aggregation/disaggregation or projection techniques [5,8,11,14,15,18,19], although the concrete realizations differ significantly.

In a first step, we consider the projection of a vector on the aggregated state space. Let x be a vector on the complete state space S and x[n] the subvector with respect to subset S[n]. Vector x can be an iteration vector, the stationary solution vector, or the residual vector r = xQ according to some approximation x of the solution π. We define x_j as the projection of x on state space S_j and x_j[m] (m ∈ Z_j) as the subvector of x_j including values belonging to states from S_j[m]. The vector is defined element-wise as follows:

$$x_j[m](y) = \sum_{n \in S_0,\ n_j = m}\ \sum_{x=0,\ x_j = y}^{s(n)-1} x[n](x) \quad\text{for all } 0 \leq y < s_j(m). \qquad (9)$$

Consequently, x_j[m](y) is the probability that LLM j is in state y when the probability distribution of the complete model is given by x. Alternatively and equivalently, the vector x_j[m] can be computed from the following vector–matrix product:

$$x_j[m] = \sum_{n \in S_0,\ n_j = m} x[n]\, V_j[n], \quad\text{where } V_j[n] = \left( \bigotimes_{i=1}^{j-1} id_{s_i(n_i)} \right) \otimes I_{s_j(n_j)} \otimes \left( \bigotimes_{i=j+1}^{J} id_{s_i(n_i)} \right)$$

and id_n is the column vector of length n with all elements equal to 1. For the HLM we define the vector x_0 by x_0(y) = x[n]e^T, where n is the y-th state in S_0.

Projection of vectors can be applied to the stationary solution vector π, yielding π_j, the vector of marginal probabilities according to LLM j; to an iteration vector x^(k), yielding x_j^(k), the current solution according to LLM j; or to the residual vector r^(k) = x^(k)Q, yielding r_j^(k), the residual vector according to LLM j. The vector of residuals is often used as an estimate of the error, although small residuals do not guarantee a small difference |π − x^(k)| [20]. However, if r^(k) is an estimate of the overall error, then r_j^(k) is an estimate of the error according to LLM j. Thus, the projection of the residuals on the state space of LLM j allows us to estimate the current error according to a specific LLM and, of course, also according to performance measures related to this LLM. Large errors according to a LLM indicate that some work has to be done to smooth the error, and this is exactly the approach we will apply in the aggregation/disaggregation algorithm proposed in Section 6. It is worth mentioning that residuals are only one way to estimate the difference |π − x^(k)|; other methods based on the difference or normalized difference of iteration vectors are also applied [20]. Differences of iteration vectors can also be projected on LLM or HLM state spaces and can be used as an estimate of the error with respect to a specific LLM or the HLM.

To perform aggregation of the environment of LLM j, the LLM is moved to the first position to make the following notation easier to understand. This transformation is done using σ_j and the corresponding permutation matrices P^{σ_j}[n]. The computation of the submatrices of the new generator matrix Q^{σ_j} is described in Eq. (4). In a similar way, the permutation can be applied to the vectors x[n], yielding

$$x^{\sigma_j}[n] = x[n]\, P^{\sigma_j}[n]. \qquad (10)$$
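Using the same J-dimensional array view as before, the projection of Eq. (9) is a sum over all axes except j. A hedged sketch (the dictionaries, the tuple encoding of HLM states and the 0-based axis index j are assumptions of this sketch):

    import numpy as np

    def project_on_llm(x_blocks, S0, s, j):
        """Projection x_j of Eq. (9): marginal of x on the state space of LLM j.

        s[n] is the tuple (s_1(n_1), ..., s_J(n_J)); returns macro state -> x_j[m].
        """
        xj = {}
        for n in S0:
            block = np.asarray(x_blocks[n]).reshape(s[n])    # J-dimensional view
            axes = tuple(i for i in range(len(s[n])) if i != j)
            xj[n[j]] = xj.get(n[j], 0.0) + block.sum(axis=axes)  # sum out other LLMs
        return xj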


After shifting LLM j to the first position, we can describe an aggregation step according to the first LLM. Each vector x^{σ_j}[n] can be decomposed into s_j(n_j) subvectors x^{σ_j}[x_j, n] (0 ≤ x_j < s_j(n_j)) including the values belonging to states where LLM j is locally in state x_j and the HLM state is n. Due to the ordering of states after permuting j to the first position, these vectors are easily constructed, because they describe consecutive elements of x^{σ_j}[n]:

$$x^{\sigma_j}[n] = \big( x^{\sigma_j}[0,n],\ x^{\sigma_j}[1,n],\ \ldots,\ x^{\sigma_j}[s_j(n_j)-1,\,n] \big).$$

To build an aggregated generator matrix according to LLM j, states are aggregated if they belong to the same local state of LLM j. Let Q̃_j be the generator matrix of the system aggregated according to LLM j. Q̃_j is a $\sum_{n_j \in Z_j} s_j(n_j) \times \sum_{n_j \in Z_j} s_j(n_j)$ matrix, i.e., it is usually much smaller than Q. Aggregation is performed with respect to the normalized vectors

$$\bar x^{\sigma_j}[x_j,n] = x^{\sigma_j}[x_j,n] / x_j[n_j](x_j)$$

describing the conditional distribution of the states of the LLMs i ≠ j when the state of LLM j is x_j. All submatrices Q[n,n'] with n_j = m and n'_j = m' are substituted by an s_j(m) × s_j(m') matrix Q̃_j[m,m']. We define for m, m' ∈ Z_j:

$$P(m,m',j) = \{\, n, n' \in S_0 \mid n_j = m,\ n'_j = m' \text{ and } Q[n,n'] \neq 0 \,\}.$$

P(m,m',j) contains the HLM state pairs n, n' for which LLM j is in macro state m, m' and transitions between states from S[n] and S[n'] are possible. Matrix Q̃_j[m,m'] is computed element-wise as

$$\tilde Q_j[m,m'](x_j,y_j) = \sum_{n,n' \in P(m,m',j)} X(n,m,j) \sum_{i=1}^{T(n,n')} r_i(n,n')\, A_j^i[n,n'](x_j,y_j)\ \bar x^{\sigma_j}[x_j,n] \left( \bigotimes_{k=j+1}^{J} A_k^i[n,n'] \otimes \bigotimes_{k=1}^{j-1} A_k^i[n,n'] \right) e^{T} \qquad (11)$$

for all x_j ∈ S_j[m], y_j ∈ S_j[m'], where $X(n,m,j) = x[n]e^{T} / \big( \sum_{n' \in S_0,\ n'_j = m} x[n']e^{T} \big)$ is the conditional probability of being in HLM state n when LLM j is in macro state m and the state probabilities are given by vector x. The aggregated matrices can be computed straightforwardly by computing vector–matrix products, where the matrix is represented as a tensor product of smaller matrices. It is easy to show that Q̃_j is an irreducible generator matrix if Q is an irreducible generator matrix and all elements of x are non-zero. In the sequel we assume that all generated matrices Q̃_j are irreducible generator matrices. Thus, the stationary solution vectors defined by

$$y_j \tilde Q_j = 0 \quad\text{and}\quad y_j e^{T} = 1.0 \qquad (12)$$

exist uniquely for all j = 0, ..., J. Vector y_j can be interpreted as the stationary solution of a system where all LLMs except j are aggregated according to a fixed distribution vector. Usually, y_j will be an improved solution compared with x_j, since it results from a detailed analysis of the aggregated system. Thus, y_j can be used to redirect x, the current estimate of the overall solution vector. This is done in a multiplicative way, as usual in multi-level methods [5,11,14,15,18], yielding

$$x'[n](x) = x[n](x)\ y_j[n_j](x_j) / x_j[n_j](x_j) \qquad (13)$$

for all n ∈ S_0, x ∈ S[n] and j = 1, ..., J. If aggregation has been performed with respect to the HLM, the vector is redirected as follows:

$$x'[n] = x[n]\ y_0(x) / (x[n]e^{T}), \qquad (14)$$

where n is the x-th state in S_0. With the ingredients presented in this section, it is natural to define an iterative aggregation/disaggregation algorithm, as done in Section 6.
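Before turning to the algorithm, the two computational kernels of an a/d step can be sketched compactly: solving the small aggregated system of Eq. (12) and the multiplicative redirection of Eq. (13). The array layout follows the earlier sketches and is our assumption, not the paper's interface:

    import numpy as np

    def stationary(Qt):
        """Solve y Qt = 0 with y e^T = 1 for an irreducible generator Qt (Eq. (12))."""
        n = Qt.shape[0]
        A = Qt.T.copy()
        A[-1, :] = 1.0                    # replace one balance equation by sum(y) = 1
        b = np.zeros(n); b[-1] = 1.0
        return np.linalg.solve(A, b)

    def redirect_llm(x_blocks, s, j, xj, yj, macro):
        """Eq. (13): scale every state by y_j[n_j](x_j) / x_j[n_j](x_j)."""
        for n, block in x_blocks.items():
            scale = yj[macro(n)] / xj[macro(n)]   # one factor per local state of LLM j
            shape = [1] * len(s[n]); shape[j] = s[n][j]
            x_blocks[n] = (block.reshape(s[n]) * scale.reshape(shape)).reshape(-1)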


Example 1 (continued). The aggregation, which is performed at the matrix level, can also be interpreted in terms of the model. The aggregated HLM describes a model where every LLM is substituted by a queue with exponential service time distribution and class-dependent service rates depending on the whole state of the HLM. An aggregated model for a LLM describes the LLM in combination with a source generating incoming customers with an arrival rate depending on the detailed LLM state. However, these models need not be specified; they are only used as an interpretation helping to understand the aggregation. Aggregation is performed at the matrix level as described.

6. An adaptive aggregation/disaggregation algorithm

The basic idea of the proposed a/d algorithm is to combine iteration steps with aggregation/disaggregation according to the HLM and the LLMs. In the algorithm, all vectors without subscript are of length |S|; vectors with subscript j are of length |S_j| = Σ_{m ∈ Z_j} s_j(m). Matrix Q is a |S| × |S| matrix, which is only implicitly available via the tensor representation. Q̃_j is a |S_j| × |S_j| matrix which is generated in an aggregation step and stored as a sparse matrix. The number of non-zero elements in Q̃_j cannot be larger than the number of non-zero elements in the matrices describing LLM j and is usually much smaller than the number of states in S.

In the a/d algorithm we have to choose an appropriate initial vector, as for any iterative technique. Furthermore, the number of iteration steps m between two aggregation/disaggregation steps has to be chosen. We have good experience with using m = 10 during the first 100 iterations and increasing the value afterwards with an increasing number of iterations. But m may as well be determined dynamically during the iteration by observing the convergence behavior. Furthermore, two error constants ε1 and ε2 are required. ε1 is used as the stopping criterion of the approach, as usual in iterative techniques; the whole approach stops if the norm of the residuals becomes smaller than ε1. ε2 is used to decide whether an aggregation step is performed. If the norm of the residual vector projected on state space S_j is smaller than ε2, it is assumed that the error according to this part of the model is small enough and no aggregation step is required. If the norm exceeds ε2, an aggregation step is performed to smooth the local error. The whole algorithm consists of the following steps:

1. Choose an initial vector x^(0) with x^(0)(s) > 0 for all s and x^(0)e^T = 1.0; choose m > 0 and 1 ≫ ε1, ε2 > 0, and set k = 0.
2. Perform m steps with one of the iterative solution methods Eqs. (5), (6) or (7), starting with x^(k), to compute x^(k+m), and set k = k + m.
3. Compute the residual vector r^(k) = x^(k)Q.
4. If ||r^(k)|| < ε1 then STOP, else set j = 0 (go on with a/d steps).
5. If ||r_j^(k)|| > ε2 then goto 6 (perform an a/d step for j), else goto 9 (no a/d step for j).
6. Compute Q̃_j and x_j from x^(k) via Eqs. (9) and (11) for j > 0, or via Eq. (8) for j = 0.
7. Compute y_j via Eq. (12).
8. Redirect x^(k) via Eq. (13) for j > 0, or via Eq. (14) for j = 0.
9. Set j = j + 1; if j ≤ J then goto 5, else goto 2.

The idea of the algorithm is easy to explain. Some iteration steps are performed; afterwards the residuals are computed. If the residuals are small enough (i.e., smaller than error tolerance ε1), the iteration is stopped. Otherwise, local residuals are computed for the HLM and all LLMs. If a local residual is too large (i.e., larger than error tolerance ε2), an a/d step is performed according to this LLM/HLM.
In this way, a/d steps are performed adaptively only for those parts where the error is estimated to be large. Aggregated systems are usually solved using a direct method, since they are mostly small compared with the size of the overall state space. However, if the aggregated systems are too large, they can as well be analyzed with an iterative technique. In this case, it is sufficient to perform only a few iteration steps rather than solve the aggregated system exactly. This observation has also been made for multigrid-like techniques [5,14]. A sketch of the adaptive loop is given below.
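The following compact Python sketch summarizes steps 1–9; iterate, residual, local_residual and ad_step stand for the operations of Eqs. (5)–(14) and are assumed callables, not given here (m is kept fixed for brevity, although the text suggests increasing it over time):

    import numpy as np

    def adaptive_ad(x, iterate, residual, local_residual, ad_step,
                    J, m=10, eps1=1e-10, eps2=1e-8):
        """Adaptive a/d algorithm of Section 6; j = 0 is the HLM, j = 1..J the LLMs."""
        while True:
            for _ in range(m):                        # step 2: m iteration steps
                x = iterate(x)
            r = residual(x)                           # step 3: r = x Q
            if np.linalg.norm(r, np.inf) < eps1:      # step 4: global stopping test
                return x
            for j in range(J + 1):                    # steps 5-9: adaptive sweep
                if np.linalg.norm(local_residual(r, j), np.inf) > eps2:
                    x = ad_step(x, j)                 # aggregate, solve, redirect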


Although the approach looks similar to known a/d techniques for CTMCs, there are several points which differ significantly.
· The tensor structure of the generator matrix is exploited, allowing the analysis of much larger models.
· Aggregated systems are built according to the tensor structure, yielding a completely different decomposition than usually used in a/d algorithms, where blocks of the generator matrix are aggregated. In particular, the decomposition describes a multiplicative decomposition of the state space instead of the additive decomposition used in standard approaches. This has the effect that aggregated systems are much smaller and the whole approach is more efficient. The aggregation also differs from aggregation at the model level, since aggregated transition rates depend on the local state of the HLM/LLM which is analyzed in an a/d step.
· The approach is adaptive, aggregating only those parts where errors are large.

Example 1 (continued). We now use the a/d algorithm for the solution of the simple example model. Results are presented in Table 3. The columns for SSOR include the values for the structured SOR method which have already been shown in Table 2. The columns SSOR a/d show the results of the structured SOR method enhanced with a/d steps. It is obvious that the introduction of a/d steps reduces the number of iterations and the CPU time for the solution drastically. Especially for large state spaces the benefits of introducing a/d steps are significant. For the largest configuration we considered, the number of iterations is reduced by a factor of 30 due to the introduction of a/d steps. Since the effort of a/d steps is relatively small compared to the effort of the iteration, the time requirements are reduced similarly to the number of iterations. Except for the first, very small configuration, SSOR a/d is slightly faster than standard SOR and allows us to solve models as large as SSOR does. The largest configuration, with nearly a million states, can be analyzed in about 1.5 hours with high accuracy.

7. Examples

We present two additional examples. First, a model of a multiserver random polling system, which was originally published in [2] as a GSPN model and is used as an example in several other papers (e.g., [6,16]). The second example describes a generalization of the classical machine-repairman model.

7.1. A multi-server multi-queue model

The model describes a number of queues, in our example 5 queues, which are visited by three servers in cyclic order (see Fig. 2). Queues are numbered consecutively 1–5. Each queue has a finite capacity K_i; requests arrive to queue i according to a Poisson process with rate λ_i. Requests arriving to a full queue get

Table 3
Number of iterations, CPU time (in s) and real time (in s) to reach ||r||_∞ < 10^-10

                            SSOR                                    SSOR a/d
n1    n2    States      Iter.     CPU          Real         Iter.   CPU          Real
 2     2         81       310     2.67×10^-1   1.00           50    2.00×10^-1   1.00
10    10     14 641      4400     5.03×10^2    5.10×10^2     280    4.41×10^1    4.60×10^1
20    20    194 481    13 470     2.21×10^4    2.26×10^4    1470    2.41×10^3    2.46×10^3
30    30    923 521    21 910     1.71×10^5    1.77×10^5     700    5.76×10^3    5.81×10^3


lost. If a server arrives at a queue with waiting requests, it serves one request and subsequently attends the following queue. If no request is waiting, the server travels immediately to the subsequent queue. All three servers can simultaneously serve requests at the same queue. We assume that service times are exponentially distributed with mean μ^-1 and that the time a server needs to travel from one queue to the next is also exponentially distributed, with mean ω^-1. Models of the described type are important for the analysis of computer and communication networks. Usually only systems with one server are analyzed. Multiple servers increase the throughput and reduce the waiting time. This has already been pointed out in [2], where such systems have been analyzed for the first time using numerical methods. Here we use the hierarchical approach for the analysis of the model, which allows us to analyze much larger configurations. In a hierarchical version of the model, each queue with its arrival and service mechanism and the travelling of a server to the subsequent queue forms a LLM. In the HLM all LLMs are connected cyclically. Servers form the customers of the HLM, which all belong to a single class. A server leaving LLM i immediately enters the subsequent LLM i + 1, or LLM 1 for i = 5. For the analysis we choose the following parameter values: μ = 1.0, ω = 10.0, λ_1 = 0.1, λ_i = 0.4 (i = 2, ..., 5) and K_i = 5 for all i.

The resulting CTMC contains 1 308 600 states, but only 1200 non-zero elements are necessary to represent the whole generator matrix via tensor products/sums of LLM matrices. The original generator matrix Q contains 9 947 784 non-zero elements. Obviously the state space and the number of non-zero elements in Q are too large to be handled with conventional methods. The tensor based approach allows the analysis of this model on a standard workstation with 32 MB main memory without needing additional virtual memory. Table 4 includes the number of iterations and the CPU times needed to reduce the norm of the residual vector below some predefined error tolerance ε. First, one should notice that this large CTMC can be analyzed in less than an hour using structured analysis techniques. The presented CPU times are roughly identical to the real time if the process runs exclusively on its workstation and no paging is necessary. CPU times do not include the time to generate the state space and the matrices. However, in the structured approach only small state spaces and matrices need to be generated. For the example, state space generation takes less than 1 second with the QPN-Tool [3], whereas state space generation in the conventional analysis can be very time consuming. The integration of a/d steps reduces the number of iterations significantly. The difference is relatively larger for a low accuracy, since the first a/d steps compute a fast approximation of the overall solution vector. However, even for ε = 10^-12, a/d steps reduce the number of iterations and the solution time to less than half of the values required without a/d steps. The additional effort to perform the a/d steps is very small, which can be seen by comparing the CPU time per iteration for SSOR with and without a/d steps; both times are roughly identical. It is worth mentioning that the example is not particularly well structured for a/d methods: the LLMs are not loosely coupled, and the sojourn time of a server in

Fig. 2. Multi-server multi-queue model with three servers and five queues.
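To illustrate the kind of compact representation meant here, the following small Python sketch builds the generator of two hypothetical independent finite birth-death queues as a Kronecker sum and verifies that the vector-matrix product πQ can be computed from the small component matrices alone, without ever forming Q. This is a deliberately simplified setup, not the QPN-tool implementation: the polling model above also has synchronized transitions (the travelling servers), which would contribute additional tensor-product terms omitted here.

import numpy as np

def bd_generator(lam, mu, K):
    # Generator of a finite birth-death queue with states 0..K,
    # arrival rate lam and service rate mu.
    Q = np.zeros((K + 1, K + 1))
    for n in range(K):
        Q[n, n + 1] = lam
        Q[n + 1, n] = mu
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q

Q1 = bd_generator(0.1, 1.0, 5)   # rates borrowed from the example above
Q2 = bd_generator(0.4, 1.0, 5)
n1, n2 = Q1.shape[0], Q2.shape[0]

# Explicit Kronecker sum: generator of the joint (independent) chain.
Q = np.kron(Q1, np.eye(n2)) + np.kron(np.eye(n1), Q2)

# Structured vector-matrix product: reshape pi to an n1 x n2 array and
# apply the small matrices directly; the 36 x 36 matrix Q is never needed.
rng = np.random.default_rng(0)
pi = rng.random(n1 * n2); pi /= pi.sum()
V = pi.reshape(n1, n2)
structured = (Q1.T @ V + V @ Q2).reshape(-1)

assert np.allclose(structured, pi @ Q)

Scaled up, and combined with tensor-product terms for the synchronized transitions, this effect is what allows 1200 non-zero LLM entries to stand in for the 9 947 784 non-zero entries of Q.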


Table 4
Number of iterations and CPU time to reach ‖r‖∞ < ε

ε         SSOR                           SSOR + a/d
          Iterations   CPU-time (s)      Iterations   CPU-time (s)
10^-4     130          1.03 × 10^3       20           1.98 × 10^2
10^-6     230          1.81 × 10^3       70           6.73 × 10^2
10^-8     330          2.60 × 10^3       120          1.13 × 10^3
10^-10    430          3.46 × 10^3       170          1.52 × 10^3
10^-12    540          4.26 × 10^3       230          1.98 × 10^3
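The stopping rule behind Table 4 can be made concrete with a short sketch. The fragment below is a minimal illustration using the simple power method rather than the SSOR iteration actually measured in the table: it iterates with the uniformized chain and stops once ‖πQ‖∞ drops below ε. With the structured representation, the products πQ and πP would of course be evaluated via the small LLM matrices as sketched above.

import numpy as np

def steady_state_power(Q, eps=1e-8, max_iter=100_000):
    # Uniformization: P = I + Q/alpha is stochastic for alpha >= max |q_ii|.
    n = Q.shape[0]
    alpha = 1.05 * np.max(-np.diag(Q))
    P = np.eye(n) + Q / alpha
    pi = np.full(n, 1.0 / n)          # uniform initial vector
    for k in range(max_iter):
        r = pi @ Q                    # residual vector; zero at steady state
        if np.max(np.abs(r)) < eps:   # stop when ||r||_inf < eps, as in Table 4
            return pi, k
        pi = pi @ P
    return pi, max_iter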

7.2. A generalized machine repairman model

The second example describes a generalization of the machine-repairman model. The HLM contains three LLMs and is structurally identical to the HLM of the running example shown in Fig. 1. The internal structure of the LLMs differs from the simple example: LLMs 2 and 3 each describe a production cell with M_i homogeneous machines and an additional buffer with capacity K_i. Parts arrive according to a Poisson process with rate λ_i into the buffer, and processing times are exponentially distributed with rate μ_i for the machines in LLM i. Arrival and processing of parts are completely internal to the LLMs.

Machines in the LLMs may fail and require repair. Times to failure are exponentially distributed with mean ν_i^-1. A failed machine is immediately moved to the repair facility, which is located in the first LLM and includes two repairmen. For the repair of a failed machine from LLM 2, a single repairman requires an exponentially distributed time with mean (10ν_1)^-1. For the repair of a machine from LLM 3, both repairmen have to work simultaneously and require an exponentially distributed repair time with mean (10ν_2)^-1. A repaired machine is immediately returned to its LLM and starts processing if unprocessed parts are waiting. Machines from LLM 2 are repaired with non-preemptive priority: as long as failed machines from LLM 2 are waiting, they are repaired; only when no more machines from LLM 2 require repair and both repairmen are available do the repairmen start repairing machines from LLM 3.

As an example we use a relatively small configuration of the system to compare SSOR with and without a/d steps against conventional SOR. We choose M_1 = M_2 = 3, K_1 = K_2 = 10, λ_1 = λ_2 = 0.5, μ_1 = μ_2 = 1.0 and ν_1 = ν_2 = ν, yielding a CTMC with 3025 states. Table 5 contains the number of iterations necessary to reduce the residual norm below 10^-10 for different failure/repair rates.

Table 5
Number of iterations and CPU time to reach ‖r‖∞ < 10^-10

ν        Iterations                         CPU time in seconds
         SSOR + a/d   SSOR     SOR          SSOR + a/d   SSOR          SOR
10^-6    80           a        a            2.00         a             a
10^-5    120          a        a            3.07         a             a
10^-4    160          a        a            4.00         a             a
10^-3    180          6200     1910         4.30         1.11 × 10^2   1.84 × 10^1
10^-2    130          790      260          3.35         1.39 × 10^1   3.51

a After 10 000 iterations the required accuracy has not been reached and the iteration is stopped.
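For comparison, the following sketch shows one step of a conventional aggregation/disaggregation method of the kind surveyed in [19]: under the current iterate, each block of a state-space partition is collapsed to a single macro state, the small aggregated chain is solved, and the iterate is rescaled blockwise. This is the additive variant working on an explicit Q; the method proposed in this paper instead performs multiplicative aggregation on the tensor representation, so the code (all names illustrative) is meant only to fix the idea.

import numpy as np

def ad_step(Q, pi, blocks):
    # One aggregation/disaggregation step; `blocks` is a partition of the
    # state space given as lists of state indices.
    m = len(blocks)
    w = np.array([pi[b].sum() for b in blocks])      # current block masses
    A = np.zeros((m, m))                             # aggregated generator
    for i, bi in enumerate(blocks):
        cond = pi[bi] / w[i]                         # conditional distribution in block i
        for j, bj in enumerate(blocks):
            A[i, j] = cond @ Q[np.ix_(bi, bj)].sum(axis=1)
    # Solve x A = 0 with sum(x) = 1 for the aggregated steady state.
    M = np.vstack([A.T, np.ones(m)])
    rhs = np.zeros(m + 1); rhs[-1] = 1.0
    x, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    # Disaggregate: keep the conditional distributions, replace block masses.
    new_pi = pi.copy()
    for i, b in enumerate(blocks):
        new_pi[b] = (pi[b] / w[i]) * x[i]
    return new_pi

Roughly speaking, interleaving steps of this flavour with the SSOR sweeps is what produces the reduction from 6200 to 180 iterations at ν = 10^-3 reported in Table 5.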


Conventional SOR behaves better than SSOR, which uses entries of the current iteration vector only if they belong to a block with a smaller index, rather than whenever the element itself has a smaller index. However, with additional a/d steps SSOR clearly outperforms SOR, even for large failure/repair rates, where the LLMs are not loosely coupled. For small and realistic failure/repair rates, SSOR with a/d steps is the only applicable method.

8. Conclusions

The paper introduces a new approach for the analysis of large CTMCs resulting from hierarchical queueing networks or GSPNs. The proposed technique combines ideas from structured analysis and aggregation/disaggregation algorithms. The new algorithm differs from other known aggregation/disaggregation algorithms (see [19,20]) in the following points.
· The technique exploits the tensor structure of the generator matrix and can therefore be used to analyze very large CTMCs, much larger than those analyzable with conventional means.
· Aggregation is adaptive: only those parts where the error is estimated to be high are aggregated.
· Convergence is improved even for models whose submodels are not loosely coupled, although the advantages of a/d steps increase with decreasing coupling of the submodels. Conventional a/d methods require a block structure of the generator matrix with loosely coupled blocks.
· The aggregated systems to be analyzed in an a/d step are usually smaller than in conventional a/d methods. The reason is that the decomposition is multiplicative, i.e., a detailed submodel is analyzed together with an aggregated environment. In conventional a/d methods the decomposition is additive, i.e., all submodels in a fixed macro state are analyzed in an a/d step.
The a/d technique presented in this paper has been integrated in the QPN-tool [3] and is generally usable for hierarchically combined queueing/Petri nets, a model class which is more general than the one presented in this paper; the hierarchy implicitly defines the structure of the model in terms of LLMs and HLM. This integration shows that the solution technique is a generally applicable approach for the analysis of large QN/SPN models.

References

[1] M. Ajmone-Marsan, G. Balbo, G. Conte, G. Chiola, Generalized stochastic Petri nets revisited: Random switches and priorities, in: Proceedings of the Second International Workshop on Petri Nets and Performance Models, IEEE Press, New York, 1987, pp. 44–53.
[2] M. Ajmone-Marsan, S. Donatelli, F. Neri, GSPN models of Markovian multiserver multiqueue systems, Performance Evaluation 11 (1990) 227–240.
[3] F. Bause, P. Buchholz, P. Kemper, QPN-tool for the specification and analysis of hierarchically combined queueing Petri nets, in: H. Beilner, F. Bause (Eds.), Quantitative Evaluation of Computing and Communication Systems, Lecture Notes in Computer Science, vol. 977, Springer, Berlin, 1995, pp. 224–238.
[4] P. Buchholz, G. Ciardo, S. Donatelli, P. Kemper, Computational aspects of Markov chain solution algorithms based on Kronecker algebra, NASA ICASE Report No. 97-66, December 1997, submitted.
[5] A. Brandt, Multi-level adaptive solutions to boundary value problems, Mathematics of Computation 31 (1977) 333–390.
[6] P. Buchholz, A hierarchical view of GCSPNs and its impact on qualitative and quantitative analysis, Journal of Parallel and Distributed Computing 15 (1992) 207–224.
[7] P. Buchholz, A class of hierarchical queueing networks and their analysis, Queueing Systems 15 (1994) 59–80.
[8] P. Buchholz, An aggregation disaggregation algorithm for stochastic automata networks, Probability in the Engineering and Informational Sciences 11 (1997) 229–254.
[9] P. Buchholz, Lumpability and nearly-lumpability in hierarchical queueing networks, in: Proceedings of the First IEEE International Computer Performance and Dependability Symposium, IEEE Computer Soc. Press, Silver Spring, MD, 1995, pp. 82–91.
[10] G. Chiola, G. Bruno, T. Demaria, Introducing a color formalism into generalized stochastic Petri nets, in: Proceedings of the Ninth European Workshop on Application and Theory of Petri Nets, 1988, pp. 202–215.


[11] F. Chatelin, W.L. Miranker, Acceleration by aggregation of successive approximation methods, Linear Algebra and its Applications 43 (1982) 17–47.
[12] P.J. Courtois, Decomposability: Queueing and Computer System Applications, Academic Press, New York, 1977.
[13] M. Davio, Kronecker products and shuffle algebra, IEEE Transactions on Computers 30 (1981) 116–125.
[14] G. Horton, S.T. Leutenegger, A multi-level solution algorithm for steady-state Markov chains, Performance Evaluation Review 22 (1994) 191–200.
[15] G. Horton, S. Leutenegger, On the utility of the multi-level algorithm for the solution of nearly completely decomposable Markov chains, in: W.J. Stewart (Ed.), Computations with Markov Chains, Kluwer Academic Publishers, Dordrecht, 1995, pp. 425–442.
[16] Y. Li, C.M. Woodside, Complete decomposition of stochastic Petri nets representing generalized service networks, IEEE Transactions on Computers 44 (1995) 577–592.
[17] B. Plateau, On the stochastic structure of parallelism and synchronization models for distributed algorithms, Performance Evaluation Review 13 (1985) 142–154.
[18] J. Ruge, K. Stüben, Algebraic multigrid, in: S. McCormick (Ed.), Multigrid Methods, SIAM, Philadelphia, PA, 1987.
[19] P.J. Schweitzer, A survey of aggregation/disaggregation methods in large Markov chains, in: W.J. Stewart (Ed.), Numerical Solution of Markov Chains, Marcel Dekker, New York, 1991, pp. 63–88.
[20] W.J. Stewart, Introduction to the Numerical Solution of Markov Chains, Princeton University Press, Princeton, NJ, 1994.