Copyright © IFAC Real Time Programming, Fort Lauderdale, Florida, USA, 1995
MODELS OF REPLICATION FOR SAFETY CRITICAL HARD REAL TIME SYSTEMS
P.A. BARRETT*, A. BURNS† AND A.J. WELLINGS†
* University of Newcastle upon Tyne, Centre for Software Reliability, Bedson Building, Newcastle NE1 7RU, U.K.
† University of York, Department of Computer Science, Heslington, York, YO1 5DD, U.K.
Abstract. Few non-static replication techniques used for tolerating processor failure are directly applicable to hard real-time systems where timing constraints are paramount and must be guaranteed by a priori analysis. In this paper we extend a common paradigm for implementing hard real-time systems, so that various models of replication of periodic processes can be accommodated and analysed. Keywords. Hard Real-Time, Fault Tolerance, Replication, Schedulability Analysis
1. INTRODUCTION
accommodate event-driven and time-triggered computation, and is implemented via run-time scheduling. With this scheme, different parts of the application can be supported by different levels of replication. Indeed replicated and non replicated components can co-exist.
There is an extensive literature on techniques for tolerating hardware failures within computing systems. Unfortunately few of the more flexible techniques described are directly applicable to hard real-time systems where timing constraints are paramount and must be guaranteed by a priori analysis. In this paper we extend a common paradigm for implementing hard real-time systems, so that various models of replication can be accommodated and analysed.
The rest of the paper is organised as follows. Section 2 introduces the computational model and analysis that defines our scheduling scheme. Section 3 describes our replication model and the assumptions made concerning it. Replication of periodic processes is considered in section 4. The properties required of the communication media are discussed in section 5. Finally, our conclusions are presented in section 6.
The exploitation of replication is a common means of obtaining fault tolerance. If the failure hypothesis is such that fail-silent¹ behaviour can be assumed, then replication alone may be used to increase availability (i.e. n replicas will survive n - 1 processor failures). In more extreme situations, where fail-uncontrolled behaviour must be tolerated, replicated execution followed by voting is used. With both of these schemes there is a requirement to maintain replica determinism (i.e. in the absence of faults, replicas should behave in an identical manner).
2. COMPUTATIONAL MODEL
In order to predict the worst-case response time of a real-time application it is necessary to enforce a computational model for the system's software that will facilitate effective analysis. The following outlines such an approach. The model described underpins a number of structured methods for the design and implementation of (non-replicated) real-time systems; for example MASCOT (Bate 1986) and HRT-HOOD (Burns 1995). It is also at the heart of the formal design method TAM (Scholefield 1990). The model is particularly effective when it is combined with fixed
Where the hard real-time system is time-triggered (Kopetz 1991) and is implemented as a static schedule, replication based upon lock-stepped behaviour of identically loaded processors is possible. The MARS system (Damm 1989) illustrates this approach. In this paper we consider a more flexible scheme that can
1 A fail-silent processing node (Powell 1988) is one which contains self-checking mechanisms which ensure, with high probability, that it either performs correctly, or becomes silent thus never allowing erroneous outputs to be propagated. A fail-uncontrolled node does not contain such extensive self-checking, and can exhibit arbitrary behaviour on failure .
priority scheduling (see section 2.1); it is, however, independent of the scheduling approach.
pending the process, and would require the underlying execution environment to support a remote procedure call mechanism. Although such mechanisms exist they are not usually amenable to analysis in terms of their effects upon the timing behaviour of both the communications network and the remote processor. By restricting the model, all that is required from the execution environment is an asynchronous message facility that can place data in a remote resource and release a remote sporadic process for execution.
As real-time systems are by nature concurrent, repetitive and long-lived, it is assumed that the system is populated by single-threaded processes whose behaviour is defined to consist of a (potentially infinite) series of invocations. At the end of an invocation the process is suspended. It becomes runnable again when its invocation event occurs. We distinguish between two types of process: periodic and sporadic. Periodic processes are released by a timer event originating from the target platform. Sporadic processes are released by an event originating from either another process or the environment of the system (typically as an interrupt). Note that events are considered to be 1-level persistent, e.g. a periodic process that overruns its next release time will not be suspended but will move directly into its next invocation.
The disadvantage of this restriction is that systems may prove difficult to distribute. For example, in an extreme case, if all processes were to read from one resource they would all need to be co-located with it. Systems are rarely so centralised, however, and where they are, simple transformations to the structure of the software can facilitate allocation. The key aspects of this model can be summarised as follows.
Processes have a number of associated attributes. In particular the time value, T, gives the period of a periodic process and the minimum inter-arrival time of a sporadic process. Sporadic processes may also have more global properties. For example, a sporadic may have a value of T of 1ms but may additionally have the restriction that no more than four invocation events can occur in any 10ms interval. Another important characteristic of any process is its worst-case response time. Given a scheduling scheme, response time represents the predicted latest possible process completion time (measured relative to its invocation event) .
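As an illustration only, the attributes described above can be captured in a small record, together with a check that a trace of release times respects both T and the additional burst restriction. The names `ProcessAttributes`, `burst_limit`, `burst_window` and `releases_respect_attributes` are hypothetical, and treating "any 10ms interval" as a half-open window is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ProcessAttributes:
    name: str
    periodic: bool
    T: float                              # period, or minimum inter-arrival time (ms)
    burst_limit: Optional[int] = None     # max releases per window (sporadic only)
    burst_window: Optional[float] = None  # window length (ms)


def releases_respect_attributes(attrs: ProcessAttributes,
                                release_times: Sequence[float]) -> bool:
    """Check a sorted trace of release times against T and the burst bound."""
    times = list(release_times)
    # Consecutive releases must be separated by at least T.
    for earlier, later in zip(times, times[1:]):
        if later - earlier < attrs.T:
            return False
    if attrs.burst_limit is not None and attrs.burst_window is not None:
        # Assumption: windows are half-open, [start, start + burst_window).
        for i, start in enumerate(times):
            in_window = sum(1 for t in times[i:] if t - start < attrs.burst_window)
            if in_window > attrs.burst_limit:
                return False
    return True


# The sporadic process from the text: T = 1ms, at most four releases in 10ms.
sporadic = ProcessAttributes("operator_input", periodic=False, T=1.0,
                             burst_limit=4, burst_window=10.0)
```

A trace such as `[0, 1, 2, 3]` satisfies both constraints, whereas `[0, 1, 2, 3, 4]` respects T but exceeds the burst bound.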
1. The software system consists of Processes and Resources which form units of distribution.
2. Processes and resources are assigned attributes that designate key temporal aspects of their behaviour.
3. A process's behaviour consists of a potentially unbounded series of invocations; each invocation of the process is released by an event. A process must not voluntarily suspend itself during an invocation.
4. Periodic processes are released by local timer events. Sporadic processes are released by events originating either in other (possibly remote) processes, or from the environment of the system.
An assumption is made that the application consists of a bounded number of processes, N . In a distributed system processes form a unit of distribution, in that a process resides on exactly one node.
5. Resources provide mutually exclusive access to the data they protect. This data is shared between the processes. Processes may write to any resources, but can only read from local resources.
Inter-process communication is a key aspect of any computational model. The model assumes an asynchronous scheme as it forms a natural design abstraction and leads to implementations that can be effectively analysed. The means of exchanging data between processes is via a resource. A resource provides access to shared data but ensures that mutual exclusion is provided.
2.1 Scheduling Analysis. A scheduling scheme has two necessary aspects: the run-time behaviour it prescribes, and the analysis it provides for predicting temporal behaviour.
A software system conforming to the computational model described in this paper consists of processes and resources. Like processes, resources form units of distribution. In order to prevent a process from suspending itself while accessing resources, the computational model imposes restrictions on remote actions (i.e. actions from one processing node to another):
Although there are obvious benefits in integrating the computational model and the scheduling scheme, it is important to distinguish between them. The computational model defines what form the architecture of the application must take. The scheduling scheme defines how the application is mapped onto the hardware platform and then allows its temporal behaviour to be predicted. An application conforming to one computational model may be scheduled in a number of different ways.
1. a process may read from or write to a local resource,
2. a process may write to a remote resource but may not read from one.
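The two restrictions on remote actions amount to a simple predicate. The following is a minimal sketch; `Op` and `access_allowed` are hypothetical names, and nodes are identified by plain strings for illustration.

```python
from enum import Enum


class Op(Enum):
    READ = "read"
    WRITE = "write"


def access_allowed(process_node: str, resource_node: str, op: Op) -> bool:
    """Local reads and writes are allowed; remote access is write-only."""
    if process_node == resource_node:
        return True
    return op is Op.WRITE
```

For example, `access_allowed("node1", "node2", Op.READ)` is rejected, which is the case that would otherwise require the reading process to suspend.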
Although many existing real-time systems are scheduled via a static layout of all the computation
To read from a remote resource would involve sus-
Fig 1. Replicated Writer, Reader and Resource

and communication events (see for example MARS (Damm 1989)), much recent research has focussed upon preemptive priority-based scheduling. This approach assigns to each process a fixed priority that is derived from its temporal characteristics. At run-time, each node executes the highest priority local runnable process. If, during the execution of a process, a higher priority one becomes runnable (e.g. it is released by a timing event), then the currently running process is preempted.
value based upon the maximum value derived for each replica.
3. REPLICATION MODEL AND ASSUMPTIONS
When an application is required to be able to tolerate hardware component failures, the processes and resources which make up that application must be replicated, with each process replica executing on a different node within the system. Due to the constraint that resources must be local to the processes which read from them, it is clear that resources must be replicated to the same degree as the process which reads from them, with resource replicas being located on the same nodes as the appropriate process replicas. (Where a number of different readers read from a resource, that resource must be replicated once on each node which hosts replicas of any of the reader processes.) Thus, in the general case of replicated writers and replicated readers, we achieve the situation depicted in Figure 1. In this figure, $P_w^i$ is the $i$th replicated writer process, $P_r^j$ is a reader process, and $R^j$ the $j$th replicated resource.
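The placement rule for resource replicas can be sketched as a set union over the nodes hosting reader replicas. The function and the process, resource and node names below are hypothetical and serve only to illustrate the rule stated above.

```python
from typing import Dict, Set


def resource_replica_nodes(reader_placement: Dict[str, Set[str]],
                           readers_of: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each resource to the set of nodes that must host a replica of it."""
    placement: Dict[str, Set[str]] = {}
    for resource, readers in readers_of.items():
        nodes: Set[str] = set()
        for reader in readers:
            # One resource replica on every node hosting a replica of any reader.
            nodes |= reader_placement[reader]
        placement[resource] = nodes
    return placement


# Two reader processes, each replicated on two nodes, share resource "R".
reader_placement = {"Pr": {"node1", "node2"}, "Pq": {"node2", "node3"}}
readers_of = {"R": {"Pr", "Pq"}}
placement = resource_replica_nodes(reader_placement, readers_of)
```

Here the resource ends up replicated three times, once per node hosting a reader replica, rather than once per reader replica.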
Resources are implemented via a ceiling priority protocol. Each resource is assigned a priority that is the maximum of the priorities of the processes that use that resource. At run-time, when a process accesses a resource its priority is raised to that of the resource's ceiling value. This priority scheme provides mutual exclusion and minimises the interference between different priority levels (Sha 1990). Preemptive priority-based scheduling has the advantage that it can deal with more dynamic nondeterministic applications. Systems can, however, be analysed to predict their worst-case behaviour, and hence the obligation to meet timing requirements can be satisfied prior to execution. The analysis involves solving a set of equations to derive the worst-case response time, R, for all processes. These can then be compared with system deadlines. As the focus of this paper is replication, the details of the analysis will not be given. The reader can obtain details from the following list of references (Audsley 1993a, Tindell 1994).
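Although the analysis is not reproduced in this paper, the flavour of it can be conveyed by the standard fixed-priority response-time recurrence from the cited scheduling literature, R = C + B + Σ ⌈R/T_j⌉ C_j over higher-priority processes j. The sketch below is an illustration under simplifying assumptions (deadlines no greater than periods, integer time units, larger numbers meaning higher priority); the `Task` record and function names are hypothetical and are not the exact formulation used in the references.

```python
import math
from typing import Dict, List, NamedTuple, Optional


class Task(NamedTuple):
    name: str
    C: int         # worst-case execution time
    T: int         # period or minimum inter-arrival time
    D: int         # deadline, assumed <= T
    priority: int  # larger number = higher priority (an assumption)
    B: int = 0     # worst-case blocking by lower-priority processes


def resource_ceilings(users: Dict[str, List[Task]]) -> Dict[str, int]:
    """Ceiling of a resource = maximum priority of the processes that use it."""
    return {res: max(t.priority for t in tasks) for res, tasks in users.items()}


def response_time(task: Task, tasks: List[Task]) -> Optional[int]:
    """Fixed-point iteration; returns None if the deadline is exceeded."""
    higher = [t for t in tasks if t.priority > task.priority]
    r = task.C + task.B
    while True:
        nxt = task.C + task.B + sum(math.ceil(r / t.T) * t.C for t in higher)
        if nxt > task.D:
            return None
        if nxt == r:
            return r
        r = nxt


a = Task("a", C=1, T=4, D=4, priority=3)
b = Task("b", C=2, T=6, D=6, priority=2)
c = Task("c", C=3, T=12, D=12, priority=1)
tasks = [a, b, c]
ceilings = resource_ceilings({"R": [a, c]})
```

The value returned for each process plays the role of R in the text and is what would be compared against the deadline.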
In systems making use of replicated processing, it will normally be required that all process replicas behave in an identical manner. This has two implications:
1. The code of each replica must be deterministic; that is, when presented with the same set of inputs all process replicas must process those inputs in the same way such as to produce the same set of results.
2. Each process replica must be presented with an identical, and identically ordered, set of inputs.
2.2 Implications for Replication
The objective of this paper is to describe models of replication that are compatible with the computational model and scheduling scheme defined above. These models must ensure that, in the absence of errors, replicated processes behave in an identical manner. It is, however, not necessary to insist that all replicas execute at exactly the same time. Replicas may be assigned to nodes with different computational loads; they may even be assigned different priorities. What is crucial is that all replicas have the same notional release time, and that all can be analysed to give their worst-case response times. A set of replicas will therefore have a worst worst-case
Achieving replica determinism requires that each process is constrained only to execute statements which have deterministic outcomes (Schneider 1990, Tully 1990, Poledna 1994). This paper addresses the second requirement, examining communications protocols which will ensure that replicated processes receive identical input sets. In the case of systems of the type described, resource reads are constrained to be non-blocking. Thus, what must be ensured is that on any given read operation all replicas of the reading process receive the same
item of data from the resource being read. Since the timing of updates to resources and attempts to read those resources cannot be predicted due to the dynamic nature of the scheduling, this implies that, at any time, at least two versions of the data held in a resource must be available, and that all replicas of a reader must make the same decision as to which version to consume on each read. The communications protocols used must ensure that this is the case. In order for the correct decision to be taken at each replica, two conditions must be fulfilled:
the purpose of the following discussion it is assumed that the timing of writes is such that only two values need to be available at any time, and that the reader's decision is, therefore, whether to take the latest available value, or the single earlier value. The proposed protocol is based around the timestamping of messages, initially based on process release times, as follows: When a (replica of a) writer writes to (the replicas of) a resource, it will timestamp its message with its own release time, plus its worst-case response time. In effect, this is stating that the data written to the resource becomes valid at the end of the writer's cycle. The protocol executed by the reader is:
1. All replicas must have a common frame of reference upon which to base decisions regarding which data item to consume. 2. A data item which is selected by any replica must be available at all other replicas at the point at which they make their own decisions in order that they can select that item.
    if $TS_i + \Delta_i + \epsilon < \text{Reader release time}$ then
        read latest data value
    else
        read previous value
    fi

The problem which must be solved is that of identifying an appropriate common frame of reference between process replicas. In order to do this, and as the basis of the proposed protocols, we make the following assumptions:
1. A single, global system time is not necessarily available, however, the maximum desynchronisation between the clocks of the individual processors within the system is bounded and known.
where $TS_i$ is the timestamp of the latest data value for the message in question (message $i$), $\Delta_i$ is the worst-case message delivery time for message $i$ to reach all replicas of the destination resource across the network using the appropriate delivery service, and $\epsilon$ is the worst-case processor clock desynchronisation value; i.e., a data item becomes stable at a replica of a reader process only when the writer's worst-case cycle time has expired, and any delays in message transmission and clock desynchronisation have been taken into account. Since all replicas of a reader process will have the same release time, and since message selection at the reader replicas is based on that release time, all reader replicas will make the same decision regarding which version of the data to consume.
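The reader's selection rule can be sketched as follows. This is an illustrative rendering of the pseudocode in the text rather than a definitive implementation; the `Version` record, the function name and the numeric values are hypothetical, and all times are assumed to be expressed in the same units.

```python
from typing import Any, NamedTuple


class Version(NamedTuple):
    value: Any
    timestamp: float  # TS_i: writer release time + worst-case response time


def select_version(latest: Version, previous: Version, reader_release: float,
                   delta: float, epsilon: float) -> Version:
    """Take the latest value only if it is stable at every reader replica."""
    # Strict inequality, as in the protocol: TS_i + delta_i + epsilon < release.
    if latest.timestamp + delta + epsilon < reader_release:
        return latest
    return previous


# Hypothetical numbers: writer released at 0 with worst-case response time 20.
old = Version(value="old", timestamp=0.0)
new = Version(value="new", timestamp=0.0 + 20.0)
chosen_late_reader = select_version(new, old, reader_release=30.0, delta=3.0, epsilon=1.0)
chosen_early_reader = select_version(new, old, reader_release=22.0, delta=3.0, epsilon=1.0)
```

Because the decision depends only on the timestamp, the two bounds and the reader's release time, every replica of the reader evaluates the same condition and therefore selects the same version.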
2. The worst-case delivery time for each message sent between processors is bounded and known (Tindell 1995).
3. Each periodic process knows its own release time (release times are identical for all replicas) and timing properties (worst-case response time, etc.).
4. Each sporadic process knows its own timing properties, but not necessarily its release time.

Given these assumptions, we can establish an appropriate common frame of reference based upon process release times and response times. In the following sections the basic protocol is defined, then a series of optimisations which increase its efficiency are identified and the communications support required to implement those optimisations is discussed.
Where the writer process, $P_w$, is replicated, the protocol will be required to consolidate the results produced by the replicated writers into a single data value and timestamp to be placed into the resource. Given that all replicas of a process have the same release times, that there is a single worst-case response time for all replicas (equal to the worst case over all replicas should each replica have a different actual worst-case response time), and that all replicas behave deterministically with respect to one another, all replicas will produce identical messages in the absence of failure. Where fail-silent processors are used within a system it may, therefore, be assumed that any message received is correct, and the consolidation mechanism consists of selecting the first replica of any message received to place into the resource, and discarding any further replicas which may be received. Where system processors are not fail-silent, the possibility exists that messages received may be
4. THE BASIC PROTOCOL FOR PERIODIC PROCESSES
Since read operations must be non-blocking under our model, and since updates to replicated resources will not be atomic across replicas, we assume that resources can hold multiple versions (at least two) of the data contained within them, and that the protocol requirement is, therefore, that all replicas of a reader process should select the same version of the data for processing on any read access. Further, for
Fig 4a Stale Data using the Basic Protocol (timeline showing the Writer's release and worst-case response time, the Reader's release and READ, and the resulting "Staleness")

incorrect due to a failure having occurred within the system, and the consolidation mechanism must employ techniques such as majority voting to produce a single, validated result to be placed into the resource.
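The two consolidation mechanisms described for replicated writers can be sketched as below. This is a simplified illustration: messages are modelled as hashable `(value, timestamp)` tuples, the function names are hypothetical, and requiring a strict majority of the full replication degree is one reasonable reading of "majority voting", not a rule stated in the paper.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def consolidate_fail_silent(messages: Sequence[Hashable]) -> Optional[Hashable]:
    """Fail-silent hosts: any received message is correct, so take the first."""
    return messages[0] if messages else None


def consolidate_by_majority(messages: Sequence[Hashable],
                            replication_degree: int) -> Optional[Hashable]:
    """Fail-uncontrolled hosts: accept a message only if a strict majority
    of all writer replicas sent it; otherwise no validated result exists."""
    if not messages:
        return None
    message, count = Counter(messages).most_common(1)[0]
    return message if count > replication_degree // 2 else None
```

With three writer replicas, two agreeing copies of `(42, 20)` outvote a single faulty copy, whereas one correct and one faulty copy (the third having been lost) yield no validated result.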
reality, been available for some time. Thus, the data actually read is stale by at least the amount shown in the diagram. Modifications to the protocol can reduce substantially the potential staleness of the data read from a resource, and a number of such options are now discussed:
Another type of failure which must be considered is one which results in messages being lost; in particular, one which results in a message to a replicated destination resource reaching only a subset of the replicas of that resource. Were this to happen, it is possible that different replicas of the process reading from the resource may reach different conclusions regarding the version of the data to consume, and the requirement that all replicas of a process must behave identically with respect to one another would be violated. In the general case, problems due to message loss may be avoided only through the use of an atomic multicast protocol (Cristian 1985) for message delivery. (Atomic multicast protocols are discussed further in Section 5.) In practice, however, if a writer is replicated to a degree appropriate for the number of failures to be tolerated (where the number of failures, n, includes those affecting the transmission of messages, and an appropriate replication degree is at least n+1 for systems based on fail-silent hosts and 2n+1 for systems based on fail-uncontrolled hosts), message losses may be tolerated, and the need to employ atomic multicast communications protocols may be avoided.
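The replication degrees quoted above can be written as a one-line rule. The function name is hypothetical; the formulae are those given in the text.

```python
def required_writer_replicas(n_failures: int, fail_silent: bool) -> int:
    """Minimum writer replication degree to tolerate n_failures failures,
    counting failures that affect message transmission."""
    if n_failures < 0:
        raise ValueError("n_failures must be non-negative")
    # n + 1 for fail-silent hosts, 2n + 1 for fail-uncontrolled hosts.
    return n_failures + 1 if fail_silent else 2 * n_failures + 1
```

Tolerating a single failure thus needs two writer replicas on fail-silent hosts but three on fail-uncontrolled hosts, where a majority must remain after the failure.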
[1] Timestamp writes with the actual time of write, rather than release time + response. If the writer timestamps its messages with the actual time of writing, rather than its release time + worst-case response time, data staleness may be reduced by an amount equivalent to the writer's worst-case response time minus its execution time between release and the point of writing; a potentially large value. With the reader using the same selection criteria as before, maximum potential staleness will be much reduced, as illustrated in Figure 4b. As before, $\Delta_i$ is the time taken to transmit the data to all replicas of the receiving resource, and $\epsilon$ is the worst-case processor clock desynchronisation value. The protocol is complicated in this case by the fact that, where the writer process is replicated, different timestamps will be generated by each replica since each will execute the write at a slightly different time, and the rationalisation process applied to the replicated data values must be extended to the generation of a single timestamp which may be placed into each of the replicated resources. Since the timestamp generated must be consistent at all resources, it is further implied that all resource replicas must have the same set of messages upon which to base their rationalisation process, and thus that messages must be transmitted using an atomic multicast protocol.
4.1 Performance Issues
The protocol as it stands is effective in that all reader replicas will select the same version of the data on each read access to the resource, thus preserving replica determinism. It is, however, the case that readers may be forced to use old data long after the new data has, in reality, become stable. Consider, for example, the scenario shown in Figure 4a.
[2] Timestamp based on a more detailed knowledge of process timing properties. Given that we already need either to calculate or to measure process worst-case response times, it is unlikely to be substantially more difficult or time consuming also to measure response times at each read or write point within a process. What is required, for each process, is a knowledge of the worst-case response time at each write point and the best-case response time at each read point. (It is assumed that all replicas of a process will have the same timing properties. If this is not the case, we require the over-
In the above example, the writer (W) executes the write fairly early in a relatively long worst-case response time. The reader (R) begins execution around the time the writer completes, and reads the data some time later. Under these circumstances, however,

$$W_{release} + W_{response} + \Delta_i + \epsilon > R_{release}$$
thus the reader will be forced to read the old value of the data despite the fact that the new value has, in
Fig 4b Reduced Staleness Using Actual Time of Write and Best Case Time of Read (timeline showing the Writer's release and worst-case response time, the Reader's release and READ, and the reduced "Staleness")

all worst-case/best-case times for all the replicas.) Given this information, the protocol is modified as follows. A writer process, on writing to a resource, will timestamp the data written with its release time + worst-case response time at the point at which the write occurs. When the reader comes to read the data, it will take the new value of the data if:
$$\text{timestamp} + \Delta_i + \epsilon < \text{ItsOwnReleaseTime} + \text{BestCaseResponseTime}$$

at the point of reading, otherwise it will take the old value of the data. The effect on potential data staleness is shown in Figure 4c.

more detailed knowledge of timing properties within the reader. This eliminates (worst-case time to write - actual time of write) from our calculation of wasted time, leaving it as (actual time to read - best-case time to read), and represents the optimum protocol where an atomic multicast protocol is available (see Figure 4d).

5. COMMUNICATIONS SERVICES AND ATOMIC MULTICASTING
Various of the enhancements to our basic protocol which were discussed in Section 4.1 above require, or are substantially enhanced by, the use of an atomic multicast communications protocol. This is a protocol in which, when a message is sent to a number of destinations (the replicas of a replicated process), it will either be received within bounded time by all of those destinations, or by none of them. Atomic multicast protocols are usually based upon some form of two-phase-commit mechanism.
Given that there is little we can do (within the protocol) about $\Delta_i$ and $\epsilon$, the time "wasted" within the protocol is reduced simply to (WorstCaseTimeToWrite - ActualTimeOfWrite) + (ActualTimeToRead - BestCaseTimeToRead). This may be substantially better than our starting point. There is, however, a question mark over how much we gain in practice by using this method. Given that processes can be pre-empted during processing we cannot, in the general case, state anything more than that the worst-case response time at any point within a process = worst-case response time for the process as a whole - worst-case execution time from the point in question to the end of the process's cycle. (If this condition does not hold the process cannot be guaranteed to meet its deadline.) Similarly, best-case response time would be equivalent to process release time + best-case execution time to the point in question, assuming the process is not descheduled in the meantime. In either case, the difference between best-/worst-case response times and actual times of reading/writing could be substantial when scheduling effects are taken into consideration, and thus the gains achieved may not be as great as they may first have appeared.
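To make the effect of optimisation [2] concrete, the sketch below evaluates the reader's test under the basic protocol and under [2] for one hypothetical scenario, and computes the "wasted" time expression. The function names and all numeric values are assumptions chosen for illustration.

```python
def stable_for_reader(timestamp: float, delta: float, epsilon: float,
                      reader_release: float,
                      best_case_read_offset: float = 0.0) -> bool:
    """Reader-side test. An offset of 0 gives the basic protocol; optimisation
    [2] adds the best-case response time at the read point."""
    return timestamp + delta + epsilon < reader_release + best_case_read_offset


def wasted_time(worst_case_write: float, actual_write: float,
                actual_read: float, best_case_read: float) -> float:
    """(WorstCaseTimeToWrite - ActualTimeOfWrite) +
    (ActualTimeToRead - BestCaseTimeToRead), all relative to release."""
    return (worst_case_write - actual_write) + (actual_read - best_case_read)


# Hypothetical scenario: writer released at 0, whole-process worst-case
# response 20, worst-case response at the write point 8; reader released at 18
# with best-case response 5 at its read point; delta = 3, epsilon = 1.
basic_takes_new = stable_for_reader(0 + 20, 3, 1, reader_release=18)
opt2_takes_new = stable_for_reader(0 + 8, 3, 1, reader_release=18,
                                   best_case_read_offset=5)
wasted = wasted_time(worst_case_write=8, actual_write=6,
                     actual_read=9, best_case_read=5)
```

In this scenario the basic protocol forces the reader onto the old value, while the finer-grained timestamps of [2] let it take the new one; the remaining waste comes from the gap between the bounds and the actual write and read times.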
Ideally, an atomic multicast protocol would be provided by the underlying communications system. If, however, the underlying communications system does not provide such a service, it is possible to emulate it. To illustrate the possibility for emulation, consider the following protocol based on that used within the Voltan system (Shrivastava 1992):
1. Each replica of a writer process $P_w$ generates an appropriate output message and sends it to each replica of resource R, using whatever communications facilities are provided by the underlying communications system. It is assumed that these services are such that individual messages may not be sent if a replica of the writer fails. We do require, however, that in the absence of failure messages will reach their destinations within a known and bounded time.
2. Each replica of R will, therefore, receive a set of input messages from the replicas of $P_w$. These sets may not, however, be identical if failures have occurred. Thus, each replica of R will forward the messages which it has received to the other replicas of R.
Note that this version of the protocol can be implemented without the need for atomic multicasting; it may be of use where such a mechanism is not available. [3] A combination of [1] and [2]; timestamp messages with the actual time of writing, and make use of a
Fig 4c Stale Data Using Worst-Case Write and Best-Case Read (timeline showing the Writer's release and worst-case time to write, the Reader, and the resulting "Staleness")

Fig 4d Staleness using Actual Write Time and Best-Case Read Time (timeline showing the Writer, the Reader's READ, and the resulting "Staleness")

3. Each replica will receive, in the absence of failure, one copy of each message from the original sender, and one copy from each of its own replicas. In the presence of the majority of single failures this is sufficient to ensure that each replica of R will see each message at least once. If, however, it is required that all possible single failures should be tolerated, a second round of forwarding is required and each replica should forward copies of the messages to any replicas from which they did not receive a copy during the first round.
this paper we have shown how replication techniques can be applied to real-time systems that have some level of dynamic behaviour (i.e. are not just time-triggered cyclic systems). An abstract computational model is used to describe a general architecture for the components of a distributed real-time system. This model recognises periodic processes and shared resources; sporadic processes can also be accommodated but are not discussed in the paper. For many application structures, replication is straightforward. Where non-overlapping executions are used to implement precedence relations (i.e. periodic processes being released after the worst-case completion times of their predecessor) then replication does not cause difficulties. Where, however, there is asynchronous data transfer between processes then replica determinism must be enforced. We have shown in this paper how such behaviour can be ensured.
4. We should now have a situation in which no
single failure should be able to prevent any message which has been received by any surviving replica of R from being seen at least once by every surviving replica of R. Thus, each replica of R now has an identical set of inputs, and the replicas can reach a common decision regarding the contents of the message. The above protocol confers the properties of an atomic multicast without the need for the underlying communications mechanisms to support atomic multicasting explicitly. Note that, for this protocol, $\Delta_i$ is the maximum time for the whole protocol to complete, including the rounds of forwarding and any preemption which may affect the replicas of $P_w$ and the access mechanisms of the replicas of R whilst they are doing the forwarding itself.
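The forwarding rounds in steps 2 to 4 can be simulated with sets of message identifiers. This is a deliberately simplified sketch, not the Voltan protocol itself: message contents, timing and voting are omitted, a failure is modelled only as a lost (sender, receiver) link, and the second round forwards to every replica rather than only to those from which no copy was received. All names are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

Link = Tuple[str, str]


def forward_round(received: Dict[str, Set[str]],
                  lost_links: FrozenSet[Link] = frozenset()) -> Dict[str, Set[str]]:
    """One round: each replica of R forwards the messages it held at the
    start of the round to every other replica; copies sent over a link in
    lost_links are dropped."""
    updated = {replica: set(msgs) for replica, msgs in received.items()}
    for sender, msgs in received.items():
        for receiver in received:
            if receiver != sender and (sender, receiver) not in lost_links:
                updated[receiver] |= msgs
    return updated


# Message m2 initially reached only R1, and the link R1 -> R2 is faulty.
initial = {"R1": {"m1", "m2"}, "R2": {"m1"}, "R3": {"m1"}}
faulty = frozenset({("R1", "R2")})
after_round_1 = forward_round(initial, faulty)
after_round_2 = forward_round(after_round_1, faulty)
```

After the first round R2 still lacks m2, because its only source was the faulty link; the second round delivers it via R3, after which all replicas hold the same set.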
Where communication is asynchronous it is possible for one replica to see a new value whereas others access the associated resource before the value has changed. In these circumstances the protocols ensure that all replicas read the old value. Although this may seem inefficient it is no worse than the unreplicated system where, for example, a periodic process reads a control value just before some sporadic process, released by the action of an operator, changes the control setting. The replicated and unreplicated systems must both wait until the periodic action is repeated before the change can have an effect.
6. CONCLUSIONS
Replication is a standard way of increasing the availability and reliability of safety-critical systems. In
Finally, some of the protocols we have identified require us to provide or emulate an atomic multicast communications protocol; the following table shows the communications requirements of each of our protocols in terms of whether such a protocol is required:
Protocol            Unrepl. Writer   Unrepl. Writer   Replic. Writer   Replic. Writer
                    Unrepl. Reader   Replic. Reader   Unrepl. Reader   Replic. Reader
Basic Protocol      No               Yes              No [2]           No [2]
Optimisation [1]    No               Yes              No [3]           Yes [4]
Optimisation [2]    No               Yes              No [2]           No [2]
Optimisation [3]    No               Yes              No [3]           Yes
Protocol Summary
[1]. In the event of the failure of the (unreplicated) writer we require that either all reader replicas see the message, or that none do. An atomic multicast protocol is required to ensure this.
[2]. Since all writer replicas should generate identical messages and timestamps, lost messages can be tolerated provided that the degree of replication of the writer is sufficient.
[3]. Although writer replicas may generate different timestamps, there is no problem with consistency at the reader, since the reader is unreplicated.
[4]. Since different writer replicas will produce different timestamps, and since reader replicas must all calculate the same timestamp value to put in the resource based on those different timestamps, all reader replicas must have the same set of messages upon which to base their calculations.

REFERENCES
Audsley, N.C., Burns, A., Richardson, M., Tindell, K. and Wellings, A.J. (1993), Applying New Scheduling Theory to Static Priority Pre-emptive Scheduling, Software Engineering Journal, Volume 8, Number 5, Pages 284-292.

Bate, G. (1986), Mascot3: An Informal Introductory Tutorial, Software Engineering Journal, Volume 1, Number 3, Pages 95-102.

Burns, A. and Wellings, A.J. (1995), HRT-HOOD: A Design Method for Hard Real-time Ada, Elsevier.

Cristian, F., Aghili, H., Strong, H.R. and Dolev, D. (1985), Atomic Broadcast: From Simple Message Diffusion to Byzantine Agreement, Digest of Papers, FTCS-15, Ann Arbor, pp. 200-206, June 1985.

Damm, A., Reisinger, J., Schwabl, W. and Kopetz, H. (1989), The Real-Time Operating System of MARS, ACM Operating Systems Review, Volume 23, Number 3 (Special Issue), Pages 141-157.

Kopetz, H. (1991), Time-Triggered versus Event Triggered Real-Time Systems, Proc. Operating Systems of the 90ties and Beyond, Lecture Notes on Computer Systems, Volume 563, Springer Verlag.

Poledna, S. (1994), Replica Determinism in Distributed Real-Time Systems: A Brief Survey, Real-Time Systems, Volume 6, pp. 289-316.

Powell, D., Bond, P., Seaton, D.T., Verissimo, P. and Waeselynk, F. (1988), The Delta-4 Approach to Dependability in Open Distributed Computing Systems, Digest of Papers, FTCS-18, Tokyo, pp. 246-251, June 1988.

Schneider, F.B. (1990), Implementing Fault Tolerant Services using the State Machine Approach: A Tutorial, ACM Computing Surveys, Volume 22, Number 4, pp. 229-319.

Scholefield, D.J. and Zedan, H.S.M. (1990), TAM: Temporal Agent Model for Real-Time Distributed Systems, Proc. EUROMICRO '90 - Sixteenth Symposium on Microprocessing and Microprogramming, Editor D. Fay, Elsevier Science Publishers B.V.

Sha, L., Rajkumar, R. and Lehoczky, J.P. (1990), Priority Inheritance Protocols: An Approach to Real-Time Synchronisation, IEEE Transactions on Computers, Volume 39, Number 9, Pages 1175-1185.

Shrivastava, S.K., Ezhilchelvan, P.D., Speirs, N.A., Tao, S. and Tully, A. (1992), Principal Features of the Voltan Family of Reliable Node Architectures for Distributed Systems, IEEE Transactions on Computers, Volume 41, Number 5, Pages 542-549.

Tindell, K., Burns, A. and Wellings, A.J. (1994), An Extendible Approach for Analysing Fixed Priority Hard Real-Time Tasks, Real-Time Systems, Volume 6, Number 2, Pages 133-151.

Tindell, K., Burns, A. and Wellings, A.J. (1995), Analysis of Hard Real-Time Communication, Real-Time Systems, Volume 7, Number 9, Pages 147-171.

Tully, A. and Shrivastava, S.K. (1990), Preventing State Divergence in Replicated Distributed Programs, Proc. 9th IEEE Symp. on Reliable Distributed Systems, Huntsville, pp. 104-113.