ALLOCATION OF PERIODIC HARD REAL-TIME TASKS

P. Altenbernd* and C. Ditze**

* Cadlab, 33094 Paderborn, Germany
** University of Paderborn, 33095 Paderborn, Germany

Copyright © IFAC Real Time Programming, Fort Lauderdale, Florida, USA, 1995

Abstract. In this paper, a static allocation (mapping) algorithm is described which handles periodic, hard real-time controller tasks. Our hybrid approach combines the advantages of constructive and non-guided-search methods to overcome the problems of purely constructive or purely non-guided-search techniques. The new method is motivated by the configuration of a parallel real-time operating system to manage very complex control applications. The goal of the configuration is to reduce OS overhead as much as possible. It is achieved by allocating the controller tasks to the processor network in advance. Following this approach, the advantages of special purpose processors (no overhead) and OS-driven computers (good utilisation) are combined.

Keywords. Parallel Real-Time Operating System, Preconfiguration, Allocation (Mapping).

1  INTRODUCTION

Today's complex mechatronic real-time systems, like cars, contain many tens of processors to manage system control, and the number is still increasing. Digital controllers often run on isolated special purpose processors which do not use an operating system (OS), to avoid overhead (particularly for fine-grained applications). These processors are often not highly utilised. However, the available slack cannot be accessed, so that many resources are wasted. In contrast to this, using an OS-driven parallel computer provides better utilisation of the available resources.

Motivated by managing very complex control applications on a parallel computer, this paper tries to combine the advantages of special purpose processors (no overhead) and OS-driven computers (good utilisation) by configuring the real-time OS in advance. For this purpose the OS must be able to adapt functionality and overhead based on the present timing constraints. Hence a parallel real-time OS has to fulfil different demands: besides traditional services it must deal with real-time constraints (ranging from no real-time to hard real-time), and with different granularities.

The goal of the configuration is to reduce OS overhead as much as possible. It is achieved by allocating (mapping) periodic controller tasks with known computation times and hard deadlines to the processor network in advance. Each processor is assigned to be either schedule driven (SD) or OS driven (OD). The controller tasks may be mapped to both SD and OD processors. SD processors are supervised by a runtime system using a batch scheduler to run fixed schedules, so that the overhead for SD processors is almost zero. OD processors are ordinary OS-driven resources which use non-occupied capacities to serve other purposes (e.g. sporadic tasks, non-real-time tasks). The OS overhead is eliminated as far as possible by filling up the SD processors.

The advantage of this approach is illustrated by the following example of three controller tasks A (SD-load = 50%, OD-load = 60% including overhead), B (50%, 60%), and C (15%, 20%). Suppose that the tasks run on two special purpose processors: the total load is 115% and the remaining 85% slack cannot be accessed. If they run on an OS-driven computer, also two processors are needed, this time having a total load of 140%, with the remaining 60% slack still available for other purposes. Declaring one of the processors to be SD, this processor could handle A and B, so that the remaining 80% of the other, OD, processor can be accessed for other purposes.
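The arithmetic of this example can be retraced with a few lines of code. A minimal sketch, assuming the load figures from above; the assignment of A and B to the SD processor is hard-coded here rather than computed by the allocation algorithm described later:

    # Loads from the example: (SD-load, OD-load) in percent.
    tasks = {"A": (50, 60), "B": (50, 60), "C": (15, 20)}

    # Two special purpose processors: the slack cannot be accessed.
    sd_total = sum(sd for sd, od in tasks.values())      # 115
    print("special purpose:", sd_total, "% load,", 200 - sd_total, "% slack wasted")

    # Two OS-driven processors: higher load, but the slack stays usable.
    od_total = sum(od for sd, od in tasks.values())      # 140
    print("OS driven:", od_total, "% load,", 200 - od_total, "% slack usable")

    # Mixed: one SD processor filled with A and B, C on the OD processor.
    sd_used = tasks["A"][0] + tasks["B"][0]              # 100
    od_used = tasks["C"][1]                              # 20
    print("mixed: SD", sd_used, "%, OD", od_used, "%,",
          100 - od_used, "% of the OD processor accessible")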

The rest of the paper deals with the automatic allocation of controller tasks to a point-to-point network of identical processors. Although our goal is minimising OS overhead following the approach described above, the new method is applicable to other allocation problems as well. The new method is presented with regard to grids, but is not limited to this architecture. We consider communicating, periodic tasks with known computation times and hard deadlines. For scheduling, the deadline monotonic approach (DMS), e.g. introduced by Audsley et al. (1991) [2], is used for both SD and OD processors. Due to the large problem sizes the allocation is done by a heuristic consisting of two steps. The first step is constructive: it tries to find any feasible solution considering only local costs. In the second step the initial solution is stepwise improved until no further improvement is possible (hill climbing). By combining the advantages of constructive (building feasible solutions) and non-guided-search (modifying the current solution in order to get a better one, e.g. simulated annealing) techniques, our new method overcomes the problems of existing approaches: constructive approaches often suffer either from very long runtimes or from poor results, while iterative approaches sometimes fail to find any solution. Both types of methods are discussed in more detail in Section 2. The first step of our heuristic is constructive, while the second step represents non-guided search. The experimental results (see Section 4) show that for large applications our hybrid heuristic outperforms purely constructive and purely non-guided-search techniques.

The rest of the paper is organised as follows. After describing the related work in Section 2 with regard to operating systems and optimising methods, we introduce our new technique in Section 3. In Section 4 some experimental results are discussed; mainly, the results of our technique and of simulated annealing, as an example of a non-guided-search approach, are compared to each other. Finally, Section 5 gives some concluding remarks.

2  RELATED WORK

Cordsen et al. (1992) [3] introduced an OS that adapts its set of services to application demands. It has been shown that the configuration of services can greatly reduce system overhead. Unfortunately, only non-real-time services are considered.

The distributed real-time OS CHAOS introduced by Gheith et al. (1993) [6] provides adaptation to real-time constraints that may change at runtime, by selecting appropriate control algorithms from a user-defined set. This is different from our understanding of configuration, and does not reduce OS overhead.

Most of the work in the field of allocating tasks in multiprocessor systems is focused on load balancing in order to keep the communication low, e.g. the recent work of Diekmann et al. (1994) [4]. However, for the problem addressed by this paper the goal is not to balance load but to minimise overhead, which can only be done by filling up the SD processors. Furthermore, load balancing algorithms do not care about real-time conditions.

Allocation is an NP-hard problem, which makes it impossible for large applications to use an optimal algorithm (e.g. as proposed by Peng et al. (1989) [9]). Purely non-guided-search techniques, like simulated annealing (e.g. used by Tindell et al. (1992) [12]), sometimes fail to find a solution for large applications. Simulated annealing especially has difficulties with the quite "strange shape" of our allocation problem, with lots of very high peaks and a number of troughs hiding behind them. It has problems settling into one of the troughs, so that it often cannot find a result. To overcome this problem, especially within large systems, we use a mixture of constructive and non-guided-search heuristics.

Purely constructive approaches, like the work of Xu (1993) [13] or of Ramamritham (1990) [10], often handle scheduling and allocation simultaneously, to determine feasible (i.e. all deadlines are met) static schedules with minimal length. These methods suffer either from very long computation times in multi-period systems, or from poor results with regard to allocation, since the schedule length is their main concern. Their run-time grows with the size of the least common multiple (LCM) of the period lengths of the controller tasks, which can be very long. Hence only the first step of our DMS-based method is constructive. In this step only tasks with the same period lengths are considered. In the second step, when multiple periods are possible, a schedulability test introduced by Altenbernd (1995) [1] is used for both SD and OD processors, which avoids exploring the LCM. Furthermore, our goal is not to minimise the schedule length, but to minimise the occupied resources with guaranteed hard real-time conditions.

3  SOLUTION

The cost function to be minimised is a function of the used processor and communication resources. For each SD processor the costs are 100%; for each OD processor the costs are the sum over the loads of the tasks hosted there. For the purpose of simplification the communication overhead is skipped in the following. However, it can easily be included into the algorithm as well.
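A minimal sketch of this cost function; the processor encoding is a hypothetical data structure, not taken from the paper, and communication costs are skipped as stated above:

    def allocation_costs(processors):
        # An SD processor always costs 100 %; an OD processor costs the
        # sum of the loads of the tasks hosted on it.
        total = 0.0
        for kind, loads in processors:      # e.g. ("SD", [50, 50])
            total += 100.0 if kind == "SD" else sum(loads)
        return total

    # One filled SD processor plus one OD processor hosting a 20 % task:
    print(allocation_costs([("SD", [50, 50]), ("OD", [20])]))    # 120.0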

The first step of the algorithm, named the construction phase, tries to find a feasible solution. To avoid exploring the LCM, each transaction (= set of tasks with the same period) is treated separately, locally optimising the costs. At the beginning a potentially unlimited number of SD processors is assumed. At the end of the construction phase all transaction-based solutions are merged into one solution serving as the input for the second step. In the second step, named the improving phase, the solution is further improved based on graph bisection.


3.1  Basic Assumptions

For the task model the following assumptions are made:

• Each task re-arrives periodically. The period length is denoted as T.

• For each arrival, a task executes a bounded amount of computation, denoted as C. The load of a task is C/T · 100%.

• Each task must meet a hard real-time condition, termed the deadline, which is less than or equal to the period length.

• Within each period a task may receive data from one or more other tasks at its beginning, and it may send data to one or more other tasks at its end.

• Communicating tasks share the same period length, and must be hosted either on the same processor or on neighboured processors in the processor network. By this assumption no message routing is necessary, so that the communication overhead is (almost) equal to the sending delay. In most cases this won't be too restrictive, since real applications seldom need "full" parallelity.

• Scheduling is done locally for each processor using the priority-driven deadline monotonic approach: the smaller the deadline, the higher the priority (a small sketch of this assignment follows the list). The communication-imposed precedence constraints are mapped to the deadlines to assign appropriate priorities (see [1] for more details).

• The tasks will be located on a point-to-point network of identical processors. In the following our approach is presented with regard to grids, but is not limited to this architecture.
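A minimal sketch of the deadline monotonic priority assignment named in the scheduling assumption; the task set is invented for illustration, and the mapping of precedence constraints onto deadlines from [1] is not modelled:

    # Tasks as (name, computation time C, period T, deadline d), d <= T.
    tasks = [("a", 2, 10, 10), ("b", 1, 10, 4), ("c", 4, 20, 12)]

    # Deadline monotonic: the smaller the deadline, the higher the priority.
    for prio, (name, c, t, d) in enumerate(sorted(tasks, key=lambda x: x[3])):
        print(f"priority {prio}: task {name} (deadline {d}, load {c / t * 100:.0f} %)")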

The tasks compose a directed acyclic graph G = (V, E), where V is the set of tasks and E is the set of edges representing communications between tasks, i.e. if task a sends to task b: a → b ∈ E. An example of a graph representing one transaction is shown in Fig. 1. The labels at the vertices represent the computation times of the tasks.

Figure 1: An example graph

3.2  The Construction Phase

The construction phase has two goals: first, the construction of any feasible solution, and second, to use very little resources. During this phase an allocation where all deadlines are guaranteed to be met is constructed, while using as few SD processors as possible. The resulting solution will have quite a good corresponding cost value, to be refined by the improving phase.

The construction phase is divided into three substeps named Graph Reduction, Transaction Mapping, and Subnet Joining. In the first substep each transaction is treated separately, and the graph complexity is reduced using the critical path method (CPM), e.g. introduced by Hitchcock (1982) [7]. The result is a reduced CP-graph (a graph of less size) serving as the input for the next substep, the transaction mapping. During this phase each reduced CP-subgraph is mapped onto a processor subnet. In the last step of the construction phase the processor subnets are joined to one solution.

In Fig. 2 the whole algorithm is described by a diagram.

Figure 2: Overview of the algorithm

Graph Reduction

In this substep CPM is used to transform the graph to a so-called CP-graph (see below), a graph of less size which can more easily be mapped to the processor network without violating the neighbourship condition. Furthermore, the CP-graph is then further reduced with the goal of making the mapping easier. Generally spoken, reduction is nothing else than moving a task to the node that contains a task with which it communicates.

Definition: A path P is a sequence of vertices (p_1, p_2, ..., p_m) with: for all 1 ≤ i < m, (p_i, p_{i+1}) is an edge of the graph, and p_1 is a source node and p_m is a sink node. V_P denotes the vertices of P, and E_P denotes the edges of P.

Definition: The static offset value of task p, O_p, is defined as:

    O_p := 0,                               if ∄q : q → p ∈ E
    O_p := MAX{ O_q + C_q | q → p ∈ E },    else

Definition: The static deadline of task p, d_p, is defined as:

    d_p := T_p,                             if ∄q : p → q ∈ E
    d_p := MIN{ d_q − C_q | p → q ∈ E },    else
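A minimal sketch of these two recursive definitions over a DAG; the graph encoding, the function name, and the example values are assumptions for illustration:

    def static_offsets_and_deadlines(tasks, edges, C, T):
        # tasks: vertex names; edges: (sender, receiver) pairs;
        # C: computation time per task; T: the period of the transaction.
        preds = {t: [q for (q, p) in edges if p == t] for t in tasks}
        succs = {t: [p for (q, p) in edges if q == t] for t in tasks}
        O, D = {}, {}

        def offset(p):        # O_p = 0 for sources, else MAX{O_q + C_q}
            if p not in O:
                O[p] = max((offset(q) + C[q] for q in preds[p]), default=0)
            return O[p]

        def deadline(p):      # d_p = T for sinks, else MIN{d_q - C_q}
            if p not in D:
                D[p] = min((deadline(q) - C[q] for q in succs[p]), default=T)
            return D[p]

        for p in tasks:
            offset(p)
            deadline(p)
        return O, D

    O, D = static_offsets_and_deadlines(["a", "b", "c"],
                                        [("a", "b"), ("b", "c")],
                                        C={"a": 2, "b": 3, "c": 1}, T=20)
    print(O)    # {'a': 0, 'b': 2, 'c': 5}
    print(D)    # {'c': 20, 'b': 19, 'a': 16}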

Note that the static deadline may be distinct from the deadline of the basic assumptions (Subsection 3.1). The static deadline represents structural information only, whereas the normal deadline also takes the concrete processor mapping into account (described in [1]) and is used during the improving phase for the final solution. Nevertheless, the static deadline will be used as well, as a point in time that must not be exceeded by a task during the construction phase, due to the lack of a concrete processor mapping.

Definition: Let p be a task. The slack of p is defined as:

    S_p := d_p − O_p − C_p

Definition: Let P = (p_1, p_2, ..., p_m) be a path. The slack of P is defined as:

    S_P := MIN{ S_{p_i} | 1 ≤ i ≤ m }

Definition: The critical path in a graph is the path with the smallest slack.

The following algorithm stepwise determines the vertices of the current critical path in the current graph. The current graph is reduced by these vertices in each iteration step.

Algorithm: Let CP_i be the critical path in G_i = (V_i, E_i). Set V_0 := V, E_0 := E, V_{i+1} := V_i − V_{CP_i}, and E_{i+1} := E_i − E_{CP_i} for 0 ≤ i ≤ n, so that V_n ≠ ∅ and V_{n+1} = ∅.
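A minimal sketch of this peeling loop, reusing static_offsets_and_deadlines() from the previous sketch. It simplifies by assuming that the minimum-slack vertices of each iteration form a single chain, which holds for CPM but is not re-checked here:

    def peel_critical_paths(tasks, edges, C, T):
        remaining, paths = list(tasks), []
        while remaining:
            sub = [(q, p) for (q, p) in edges
                   if q in remaining and p in remaining]
            O, D = static_offsets_and_deadlines(remaining, sub, C, T)
            S = {p: D[p] - O[p] - C[p] for p in remaining}   # vertex slack
            s_min = min(S.values())
            # The vertices of the critical path all carry the minimum
            # slack; ordering them by offset yields the path CP_i.
            cp = sorted((p for p in remaining if S[p] == s_min), key=O.get)
            paths.append(cp)
            remaining = [p for p in remaining if p not in cp]
        return paths

    print(peel_critical_paths(["a", "b", "c", "x"],
                              [("a", "b"), ("b", "c"), ("a", "x")],
                              C={"a": 2, "b": 3, "c": 1, "x": 1}, T=20))
    # [['a', 'b', 'c'], ['x']]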

The critical paths CP_0, ..., CP_4 of the example of Fig. 1 are shown in Fig. 3.

Figure 3: Critical paths

Definition: The CP-graph is the undirected graph G_CP = (V_CP, E_CP) with V_CP = {CP_0, ..., CP_n} and E_CP = {(u, v) | u → v ∈ E, u ∈ CP_i, v ∈ CP_j, i ≠ j}.

In other words, for the construction of the CP-graph the vertices of the current critical path in the original graph are stepwise transformed to a super-vertex in the CP-graph. The edges of the CP-graph represent the edges of the original graph with regard to the super-vertices. The resulting CP-graph of the example is shown in Fig. 4.

Figure 4: Example of a CP-graph

In the following it is tried to further reduce the size of the CP-graph by joining some of the paths (super-vertices). For this purpose the slack value of a vertex is used again.

Definition: Let P be a vertex of the CP-graph. The computation time C_P is the sum over the computation times of all tasks represented by the vertex.


The next algorithm tries to further reduce the size of the CP-graph. First, and most important, the mapping towards the processor network is of interest, i.e. the neighbourship conditions are addressed by minimising the number of edges of the CP-graph. Second, the overall cost function has to be considered, i.e. minimising the number of processors. Hence the next algorithm tries to minimise the following function (the size value): 2·|E_CP| + |V_CP|, where the factor 2 expresses the greater importance of the neighbourship conditions.

Algorithm: Two vertices (paths) of the current CP-graph, and of the same transaction, CP_i and CP_j with i < j, are joined whenever the slack of CP_i is greater than or equal to the computation time of CP_j (S_{CP_i} ≥ C_{CP_j}). The resulting vertex (path) is CP_i. For joining two paths, two virtual edges are included into the graph; they express the behaviour concerning DMS. Let u_i → v_i be an edge of CP_i. For each u_j ∈ V_{CP_j} with d_{u_i} ≤ d_{u_j} ≤ d_{v_i}, the edges u_i → u_j and u_j → v_i are included into the graph. The resulting path CP_i is (..., u_i, u_j, v_i, ...). After each union of two paths the static offset values and static deadlines are recomputed to update the available slack, taking the new edges into account. Within each iteration step of the algorithm, the combination of CP_i and CP_j with the lowest costs is selected next. The algorithm runs until no further reduction is possible. The resulting graph is the reduced CP-graph.
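A minimal sketch of this reduction loop; the virtual edges, the restriction to pairs of the same transaction, and the recomputation of offsets, deadlines and slacks after each union are omitted for brevity, so this is an approximation of the algorithm above, not a full implementation:

    def reduce_cp_graph(vertices, edges, slack, comp_time):
        def merged(i, j):                   # CP_j is merged into CP_i
            v = [x for x in vertices if x != j]
            e = set()
            for a, b in edges:
                a = i if a == j else a
                b = i if b == j else b
                if a != b:                  # drop the collapsed edge
                    e.add((min(a, b), max(a, b)))
            return v, e

        while True:
            cands = [(i, j) for i in vertices for j in vertices
                     if i < j and slack[i] >= comp_time[j]]
            if not cands:                   # no join possible any more
                return vertices, edges
            # Pick the union with the smallest size value 2*|E| + |V|.
            i, j = min(cands, key=lambda p: 2 * len(merged(*p)[1])
                                            + len(merged(*p)[0]))
            vertices, edges = merged(i, j)
            comp_time[i] += comp_time[j]    # the union represents more tasks

    V, E = reduce_cp_graph([1, 2, 3], {(1, 2), (2, 3)},
                           slack={1: 8, 2: 4, 3: 2},
                           comp_time={1: 10, 2: 14, 3: 4})
    print(V, E)    # [1, 2] {(1, 2)}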

The CP-graph of the example is reduced twice, as shown by Fig. 5(a) and (b). Starting with a size value of 9, the vertices CP_1 and CP_3 are joined first (size value 6). Secondly, the vertices CP_2 and CP_4 are joined, resulting in a size value of 5. In more detail, the union of CP_1 and CP_3 is shown in Fig. 6, as well as the union of CP_2 and CP_4 (the virtual edges are represented by dashed arrows).

Figure 5: Reducing the CP-graph

Figure 6: The joined paths

Transaction Mapping

During this substep each reduced CP-subgraph is mapped onto a separate processor subnet. Note that a CP-subgraph represents at most one transaction.

The main problem of this substep is that communicating tasks (paths) must either be located on the same processor, or on neighboured processors. Trying to overcome this problem on the basis of tasks would suffer from very long computation times: it would possibly mean considering all combinations of allocating tasks to processors, without any guarantee of a valid solution. However, using the reduced CP-graph instead reduces the problem size dramatically: graphs with about 50 tasks and about 70 edges can be expressed by a CP-graph consisting of 4 vertices and 3 edges. Hence it is possible to find a valid mapping (see Fig. 7; the background patterns represent different processors) by a simple backtracking algorithm.

Figure 7: A CP-graph embedded into a processor subnet

The trade-off, of course, is that the algorithm sometimes finds no solution although there is one, due to the fact that the graph reduction leads to a loss of information. However, the experimental results in Section 4 show that in all of the considered large applications a solution was found, and that the failing rate is quite low.
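A minimal sketch of such a backtracking search; the grid encoding and all names are assumptions, and the simple per-cell capacity cap merely stands in for the load and schedulability checks of the real algorithm:

    def embed(cp_vertices, cp_edges, grid_w, grid_h, cap=1):
        # Place every super-vertex on a grid cell so that connected
        # super-vertices lie on the same or on neighboured cells.
        def ok(pos, v, cell):
            if sum(1 for c in pos.values() if c == cell) >= cap:
                return False
            for a, b in cp_edges:
                other = b if a == v else a if b == v else None
                if other in pos:
                    dist = (abs(cell[0] - pos[other][0])
                            + abs(cell[1] - pos[other][1]))
                    if dist > 1:            # neither same cell nor neighbour
                        return False
            return True

        def search(pos, rest):
            if not rest:
                return pos
            v = rest[0]
            for x in range(grid_w):
                for y in range(grid_h):
                    if ok(pos, v, (x, y)):
                        found = search({**pos, v: (x, y)}, rest[1:])
                        if found is not None:
                            return found
            return None                     # backtrack one level

        return search({}, list(cp_vertices))

    # A reduced CP-graph with 4 vertices and 3 edges, as in the text:
    print(embed([0, 1, 2, 3], [(0, 1), (1, 2), (1, 3)], grid_w=3, grid_h=2))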

Subnet Joining

The preceding substep ends with a number of separate, quadratic subnets, each of factor-of-two size (the example of Fig. 7 is of size 16, although it only occupies 6 processors). During this substep all subnets are joined to one subnet with the goal of covering as few processors as possible. This is the only part of our algorithm which is tailored for grids. However, the concept of this substep can be used for other architectures as well.

Algorithm: The subnets are sorted by their size with the largest one first: cl_0, ..., cl_{k−1}; see Fig. 8(a) for an example. The algorithm first tries to fill the vacancies of the large subnets with small subnets (of quadratic size). That means for a considered subnet cl_i that it is stepwise tried to join it with the smaller subnets cl_{i+1}, ..., cl_{k−1}. This is done until no further suitable vacancy is found, e.g. Fig. 8(b). Finally, the algorithm joins all remaining subnets to one by simply pushing them against each other, e.g. Fig. 8(c).
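The sorting and first-fit idea of this algorithm can be sketched as follows; the subnets are reduced to side lengths and occupied processor counts, and a vacancy is approximated by free area within the bounding square, which is a deliberate simplification of the real geometry:

    def join_subnets(subnets):
        # subnets: list of (side length, occupied processors) pairs.
        nets = sorted(subnets, reverse=True)            # largest first
        merged = []
        for side, occ in nets:
            for k, (s, u) in enumerate(merged):
                if s * s - u >= side * side:            # vacancy big enough
                    merged[k] = (s, u + occ)            # absorb the subnet
                    break
            else:
                merged.append((side, occ))              # no vacancy found
        return merged

    print(join_subnets([(4, 6), (2, 3), (2, 2), (1, 1)]))   # [(4, 12)]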

With the end of this substep a feasible solution is found.

Figure 8: Subnet joining

3.3  The Improving Phase

The improving phase represents non-guided search (hill climbing). During this phase the solution of the construction phase is further optimised based on graph bisection. Graph bisection is commonly used in the field of non-real-time load balancing, and was first introduced by Kernighan et al. in 1970 [8]. It means that a graph (representing the tasks) is divided into two parts, each of them representing one processor. The number of edges between the two subgraphs is called the cut, and is usually used as the objective function to minimise while keeping the load balanced. Relaxing the restriction of only two processors actually means that a task can be moved to a neighbour processor, under the assumption that the neighbourship conditions of communicating tasks are kept intact. This is also used for our technique.

Algorithm: Within each iteration step, different alternatives for changing the current solution are examined. An alternative must improve the current solution, and must be valid, i.e. all deadlines must be met, which is checked by Altenbernd's schedulability test (1995) [1]. An alternative is either a move of a task to a neighbour processor (graph bisection), or a change of a processor attribute from SD to OD, or vice versa. The alternative with the lowest costs is selected in each computation step. The algorithm runs until no further improvement is possible.
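A minimal sketch of this hill-climbing loop; alternatives, costs, and feasible are callables to be supplied by the surrounding allocator, and the schedulability test of [1] is abstracted into the feasible predicate:

    def improve(solution, alternatives, costs, feasible):
        # alternatives(s): task moves to neighbour processors and SD/OD
        # attribute changes; feasible(s): all deadlines met.
        while True:
            valid = [s for s in alternatives(solution) if feasible(s)]
            better = [s for s in valid if costs(s) < costs(solution)]
            if not better:
                return solution            # local optimum reached
            solution = min(better, key=costs)

With the example of Fig. 9 below, alternatives() would yield the two attribute changes and the two task moves, and the move of A to P1 (costs 133) would be selected as the cheapest valid improvement.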

In Fig. 9 an example is shown. The current costs are 183. There are three possibilities for changing a processor attribute: P0 to SD (new costs: 233), P1 to OD (deadline violation due to the additional OS overhead), and P3 to SD (250). Hence changing a processor attribute will not improve the current solution. There are two valid possibilities of moving a task to a neighbour processor: A to P1 (133), and C to P1 (150). Hence the alternative of moving A to P1 is taken next.

Figure 9: Improving phase

4  EXPERIMENTAL EVALUATION

In this section some experimental results are shown, to present the runtime behaviour of our method, and to compare our results with those of other techniques, including both purely constructive and purely non-guided-search approaches.

Tab. 10 shows some data about four large application examples. In all four examples the graph complexity is dramatically reduced during the construction phase.

    id     |V|    |E|    tr    |V_CP|    |E_CP|
    tyc     43     36     4         6         0
    byc     48     66     1         4         3
    myc    100     88     4        12         0
    hyc    148    154     5        16         3

    (tr = number of transactions)

Figure 10: Application examples

The corresponding experimental results of our algorithm are listed in Tab. 11. As a result of the graph reduction, the construction phase needs only little CPU time, represented by column t-CoP. Most of the CPU time is consumed by the improving phase (column t-ImP). In column CoP the results of the construction phase are listed, representing a purely constructive approach. These values are improved in most cases during the second phase, as shown in column ImP (column +/− gives the percentage of reduction). The improving phase is more helpful in quite loosely connected example graphs.

    id     CoP     ImP     +/−     t-CoP    t-ImP
    tyc     600     460    −23%      0.2     15.6
    byc     301     301      0%      0.3      3.7
    myc    1200     899    −25%      1.8    260.0
    hyc    1501    1310    −13%      4.1    412.0

Figure 11: Experimental results of the algorithm

In Tab. 12 our method (column ImP) is compared to a number of other techniques. Compared to a purely OS-driven balanced solution (the traditional way), represented by column b-costs (delivered by simulated annealing, as far as possible), our new method enhances effectiveness in all examples, especially in the fine-grained example byc. This means our understanding of configuration allows running the same tasks with fewer occupied resources.

    id     ImP     b-costs        SiA            HiC            2.SiA
    tyc     460     605  +32%      674  +47%      521  +13%      600  +30%
    byc     301     978 +225%        -            301    0%      301    0%
    myc     899       -           2100 +134%     2400 +167%     1131  +26%
    hyc    1310    1412   +8%        -           1772  +35%     1293   -1%

Figure 12: Comparing different methods

The remaining columns represent configuration in our sense, i.e. it is tried to minimise the same cost function with different algorithms. In columns four and five two purely non-guided-search methods are shown: simulated annealing (SiA) and hill climbing (HiC), which works similarly to the improving phase. Both methods start from an initial solution where each subgraph is assigned to a separate processor. The runtime of simulated annealing was limited by the runtime of our method. Within examples with large subgraphs, simulated annealing fails to find any solution, due to its difficulties with the problem structure as explained in Section 2. In the two other cases the results are much worse than our results, which sometimes can be improved by further computing time. The hill climbing heuristic found a solution in all cases, sometimes with great success (byc), and sometimes not (myc).

Comparing simulated annealing and hill climbing to each other, it seems as if simulated annealing finds better results, if it finds any, while hill climbing tends to be more reliable.

Finally, as presented in the last column (2.SiA), the hill-climbing-driven improving phase was replaced by simulated annealing (time-limited as above). The results show that this sometimes produces equal values, but also sometimes produces worse values. Hence, hill climbing also outperforms simulated annealing when it is used for the improving phase. The experiments with the different algorithms show that our hybrid approach gives better results than the non-guided-search techniques simulated annealing and hill climbing. The results of the first phase (a purely constructive method) are improved in most cases during the second phase. For the second phase hill climbing seems to be more appropriate than simulated annealing.

5  CONCLUSION

This paper is based on the advantages of configuring a parallel real-time OS a priori. This enables a parallel real-time OS to manage real-time constraints of any granularity, ranging from no real-time to hard real-time. The goal of the configuration is to reduce OS overhead as much as possible. It is achieved by allocating the controller tasks to the processor network in advance. Following this approach, the advantages of special purpose processors (no overhead) and OS-driven computers (good utilisation) are combined.

For this purpose this paper has introduced a new two-step allocation algorithm. The algorithm combines the advantages of constructive and non-guided-search approaches to overcome the problems of purely constructive or purely non-guided-search techniques. The construction phase uses CPM to reduce the problem size, whereas the hill-climbing-driven improving phase is based on graph bisection. As emphasised by the experimental results, the heuristic delivers good values within short response times, which allows practical use within large applications.

In future work the algorithm may be refined with regard to several aspects. First, when joining two paths during the graph reduction substep, the rather coarse slack-based condition can possibly be made more accurate. Second, when reducing the size of the CP-graph, the corresponding cost function may be improved with more regard to the following mapping step. Furthermore, the quite primitive hill-climbing technique of the second phase could be replaced by a more sophisticated approach like tabu search. However, using a hybrid method seems to be the best solution in any case. And finally, aspects like replication, or tasks attached to a particular processor, must be taken into account.

The work presented in this paper will be used in the mechatronic project METRO, granted by the German Ministry of Education and Research (BMBF). Within this project our new technique will be used in systems described by Petri nets (Rammig (1993) [11]) and differential equations, to configure the parallel real-time OS DReaMS by Ditze (1995) [5].


Acknowledgement. The authors would like to thank Prof. Alan Burns and Mark Nicholson (University of York) for some helpful comments on an earlier version of this paper.


6  REFERENCES

[1] P. Altenbernd. Deadline-Monotonic Software Scheduling for the Co-Synthesis of Parallel Hard Real-Time Systems. In Proceedings of the European Design and Test Conference, pages 190-195, 1995.

[2] N. C. Audsley, A. Burns, M. F. Richardson, and A. J. Wellings. Hard Real-Time Scheduling: The Deadline Monotonic Approach. In IEEE Workshop on Real-Time Operating Systems and Software, 1991.

[3] J. Cordsen and W. Schroeder-Preikschat. Towards a Scalable Kernel Architecture. In Proceedings of the OpenForum Technical Conference, pages 15-33, 1992.

[4] R. Diekmann, R. Lüling, B. Monien, and C. Spräner. A Parallel Local-Search Algorithm for the k-Partitioning Problem. In Proceedings of the 28th Hawaii International Conference on System Science, 1994.

[5] C. Ditze. DReaMS - Concepts for a Distributed Real-Time Management System. In Proceedings of the IFAC/IFIP Workshop on Real-Time Programming, 1995.

[6] A. Gheith and K. Schwan. CHAOS-ARC: Kernel Support for Multiweight Objects, Invocations, and Atomicity in Real-Time Multiprocessor Applications. IEEE Transactions on Computers, 11(1):33-72, February 1993.

[7] R. B. Hitchcock. Timing Verification and the Timing Analysis Program. In Proceedings of the 19th Design Automation Conference, pages 594-603, 1982.

[8] B. W. Kernighan and S. Lin. An Effective Heuristic Procedure for Partitioning Graphs. The Bell Systems Technical Journal, (1):291-308, 1970.

[9] D.-T. Peng and K. G. Shin. Static Allocation of Periodic Tasks with Precedence Constraints in Distributed Real-Time Systems. In Proceedings of the 9th International Conference on Distributed Computing Systems, pages 190-198, 1989.

[10] K. Ramamritham. Allocation and Scheduling of Complex Periodic Tasks. In Proceedings of the 10th International Conference on Distributed Computing Systems, pages 108-115, 1990.

[11] F. J. Rammig. System Level Design. NATO Advanced Study Institute, 1993.

[12] K. Tindell, A. Burns, and A. Wellings. Allocating Real-Time Tasks (An NP-Hard Problem made Easy). Journal of Real-Time Systems, 4(2):145-165, 1992.

[13] J. Xu. Multiprocessor Scheduling of Processes with Release Times, Deadlines, Precedence, and Exclusion Relations. IEEE Transactions on Software Engineering, 19(2):139-154, February 1993.
