A Message Based DCCS*


Copyright © IFAC Distributed Computer Control Systems 1983, Sabi-Sabi, South Africa, 1983

SESSION 3

REAL TIME ISSUES

H. Kopetz, F. Lohnert, W. Merker and G. Pauthner

Institut für Technische Informatik, TU Berlin, Federal Republic of Germany

Abstract: MARS (Maintainable Real Time System) is a project on distributed real time computer control which has as its goal the development of such a system from the point of view of maintainability and reliability in hardware and software. This paper presents the architecture of MARS and introduces by a number of examples the programming primitives of MARS. The facilities for reliability and functional enhancement of a MARS system, as well as the possibility for the dynamic reintegration of repaired components are also explained.

Keywords: Maintenance, Reliability, Real Time Systems, Embedded Systems, Interprocess Communication, Distributed Systems, Redundancy

* This work has been supported by the German Ministry of Research and Technology (BMFT) under Research Contract IT 1018.

1. INTRODUCTION

The high cost of maintaining real time systems is a subject of widespread concern. According to /DeRo 78/ the cost of keeping a successful real time system in a state which is relevant to its users considerably surpasses its initial development cost. These maintenance costs are caused by

- Repair: Repair is necessary because the hardware fails and the design (software) contains design faults which have not been detected during the careful preinstallation tests.
- Functional enhancement: A successful system changes the environment, which again changes the requirements for this system. In order to keep a system in a state which is relevant to its users it is necessary to repeatedly modify and enhance the system functions.

We feel that these maintenance activities should not be dealt with only after the system is successfully installed but should be a determining factor during the system design.

The MARS (MAintainable Real Time System) project has set as its objective the design of an architecture and the prototype implementation of a Distributed Real Time System for process control applications from the point of view of maintenance and fault tolerance. In contrast to other projects on fault tolerant architectures /Wens 78, Hopk 78/ MARS is based on the availability of self-checking components.

System Architecture

Any MARS System /Kope 82/ can be decomposed into a cluster and the environment (of the cluster). The environment can consist of the physical equipment which is to be controlled (plant) and/or other clusters. Thus from the point of view of any cluster the rest of the system is considered to form the environment. The interconnection between a cluster and its environment is realized by interface components. The interface components translate the standard information representation in MARS to the form required by the environment.

A cluster consists of a set of components interconnected by an intercluster communication system. A component is a self-contained computer. All components have access to the global physical time. The intercluster communication system can transport messages between the components. A component consists of the MARS machine - that is the computer hardware and executive software - and the application software, called a module. A module consists of a set of parallel tasks including a priority task which has the authority to reset any of the other tasks.

2. INTERPROCESS COMMUNICATION

2.1 Event and State Messages

In the analysis of real time systems it is helpful to distinguish clearly between event and state information. Event information deals with the occurrence of events (an event is a happening at a point in time). State information deals with attribute values of objects which are valid for a given time interval. State and event information are closely related - the change of an attribute value of an object is an event. In order to facilitate the exchange of event and state information between tasks, the concepts of event and state messages are introduced. Depending on the point of view taken by a receiver, a given message can be handled by one receiver as a state message, but by another receiver as an event message.

The difference between event and state messages concerns the handling of these two message classes at the receiver's end. The handling of event messages conforms with the "classical" message semantics. An event message is queued at the receiver when it arrives and dequeued when it is read.

There are two real time values included in every message, the send time and the validity time. We assume here that a global real time base exists in MARS /Lamp 78/. In an event message the send time denotes the time of event occurrence while the validity time determines up to which point in time this event message is relevant. After this time the event message is discarded by the communication system.
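The receiver-side handling of event messages described above can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not MARS code: the names `EventMessage`, `EventQueue`, `deliver` and `read` are hypothetical, times are plain floating-point seconds on an assumed global time base, and expired messages are dropped lazily when the queue is read rather than by a separate communication system.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class EventMessage:
    """Hypothetical event message carrying the two real time values."""
    name: str
    sent: float          # send time: the time of event occurrence
    valid: float         # validity time: relevant up to this point in time
    record: Any = None   # optional user defined record


class EventQueue:
    """Receiver-side queue: enqueue on arrival, dequeue (consume) on read."""

    def __init__(self) -> None:
        self._queue: deque = deque()

    def deliver(self, msg: EventMessage) -> None:
        # An event message is queued at the receiver when it arrives.
        self._queue.append(msg)

    def read(self, now: float) -> Optional[EventMessage]:
        # Assumption: messages past their validity time are discarded here.
        self._queue = deque(m for m in self._queue if m.valid >= now)
        # Reading consumes the oldest message that is still relevant.
        return self._queue.popleft() if self._queue else None
```

In this sketch a message delivered with `valid=1.0` is never returned by `read(now=2.0)`, while a message that is still valid is returned exactly once.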

State messages refer to attribute values of an object which are valid in the interval (send time, validity time). State messages are not queued at the receiver. On reading, state messages are not consumed, i.e. the valid version of a state message can be read several times.

In MARS every state message has to be further classified as either consistent or immediate. This distinction is based on the point in time when a new version of a state message will replace the present (valid) version. If a state message is declared "consistent", a new version will replace a valid version only after this valid version has lost its validity. If a state message is declared "immediate" a new version will immediately update the present version of the state message. In a distributed real time system it is sometimes not possible to achieve consistency and speed simultaneously. It is up to the programmer to make a decision in a given application environment.

[Figure 2.1: Difference between consistent and immediate state messages. Versions n and n+1 of a state message are shown on the real time axis, each with t1 = send time, t2 = arrival time, t3 = corrected validity time (-1 tick) and t4 = validity time. The update points of an immediate state message and of a consistent state message are marked.]

The synchronization between sender and receiver is different for event and state message exchanges. Not considering exceptional conditions, the receiver has to read every event message sent by the sender and vice versa, i.e. sender and receiver must operate at the same rate (tight synchronization). In a state message exchange this requirement is relaxed, i.e. sender and receiver can operate at different rates (loose synchronization). It is our opinion that this loose synchronization is sufficient for many information exchanges in process control systems, resulting in simpler interfaces and increasing the autonomy of tasks.
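The difference between consistent and immediate state messages can be sketched in a few lines of Python. This is a simplified model and not the MARS implementation: `StateVersion` and `StateSlot` are hypothetical names, a version is treated as valid until its validity time has passed, and the "corrected validity time (-1 tick)" of Figure 2.1 is ignored.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StateVersion:
    value: Any
    sent: float    # send time
    valid: float   # validity time


class StateSlot:
    """Holds the present version of one state message at a receiver."""

    def __init__(self, immediate: bool) -> None:
        self.immediate = immediate
        self._current: Optional[StateVersion] = None
        self._pending: Optional[StateVersion] = None

    def deliver(self, version: StateVersion, now: float) -> None:
        expired = self._current is None or self._current.valid < now
        if self.immediate or expired:
            # Immediate: the new version updates the present one at once.
            self._current, self._pending = version, None
        else:
            # Consistent: wait until the valid version has lost its validity.
            self._pending = version

    def read(self, now: float) -> Optional[Any]:
        if self._current is not None and self._current.valid < now:
            # The old version has expired, so a waiting version takes over.
            self._current, self._pending = self._pending, None
        if self._current is None or self._current.valid < now:
            return None  # no valid version: the value is undefined
        # Reading does not consume: the same version can be read repeatedly.
        return self._current.value
```

With two overlapping versions, a slot created with `immediate=False` keeps returning the old value until its validity time has passed, whereas a slot created with `immediate=True` returns the new value as soon as it has been delivered. This is the consistency versus speed trade-off that the text leaves to the programmer.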

2.2 Data Field

In Real Time Control Systems many tasks are executed cyclically. For example, a control algorithm periodically reads the measured values and outputs the required valve settings or an alarm task periodically scans the measured variables and tests them for alarm conditions. We consider it as an unnecessary restriction to require that the cycles of all these periodic tasks have to be synchronized tightly. This is the reason why we introduced the concept of a cluster "data field", i.e. a set of consistent state messages. The data field is a distributed real time representation of the state of the cluster environment.

[Figure 2.2: Consistency of the Data Field. Three state messages SM 1, SM 2 and SM 3 are shown on the real time axis, each with t1 = send time, t2 = arrival time, t3 = corrected validity time (-1 tick) and t4 = validity time. The figure marks the interval during which the data field consisting of SM 1, SM 2 and SM 3 is consistent.]

At any point in time a variable of the data field either contains a value which refers to a valid and consistent (in relation to real time and version) attribute of an entity of the environment or it is undefined (Fig 2.2). The data field is constructed with the aid of periodic state messages. The problem of establishing a valid and consistent data field is non trivial. At present some further investigations are being carried out in order to develop protocols which will establish such a valid consistent data field. The results of this analysis will be used to determine the time parameters for the state messages, e.g. at what time should a state message become invalid. They will be used in the MARS software development system to support the design of real time programs.

Considering the semantics of the consistent state messages, any component can read the data from the data field with loose synchronization, e.g. if a plant operator wants to look at a certain plant variable he has to access the corresponding element in the data field and can update his display according to his requirements.

The size of the data field is delimited by the channel capacity of the transmission medium and the frequency of the updates of the data field. Fig. 2.3 gives an estimate of the size of the data field.

[Figure 2.3: Estimation of the size of the Data Field. Size of the data field in bytes (logarithmic axis, 10 to 100000) against the period of update in seconds, for a 10 Mbit channel at 10% utilization.]
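The relation between channel capacity, update period and data field size can be made concrete with a rough upper bound. The sketch below is an assumption on our part and does not reproduce the curve of Fig. 2.3: it treats the channel figure as 10 Mbit per second, assumes that every element of the data field is retransmitted once per update period, and ignores message headers and protocol overhead. The function name `data_field_size_bytes` is hypothetical.

```python
def data_field_size_bytes(channel_bits_per_s: float,
                          utilization: float,
                          update_period_s: float) -> float:
    """Upper bound on the data field size that can be refreshed per period.

    Assumes the whole data field is sent once per update period and that
    only the given fraction of the channel capacity is available for it.
    """
    usable_bits = channel_bits_per_s * utilization * update_period_s
    return usable_bits / 8  # 8 bits per byte


# The parameters named in Fig. 2.3, evaluated for a few update periods.
for period in (1.0, 0.1, 0.01):
    size = data_field_size_bytes(10_000_000, 0.10, period)
    print(f"period {period} s: about {size:.0f} bytes")
```

Under these assumptions a 10 Mbit/s channel at 10% utilization carries 100 000 bits in 0.1 s, i.e. roughly 12 500 bytes of data field per update. Shorter update periods shrink the bound proportionally.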

The detection of errors of the data field (e.g. missing messages) is performed by an autonomous maintenance component.

A component which can read and write from/to the data field is called an active component (in relation to this cluster). A component which is only allowed to read data from the data field but does not have the authority to write into the data field is called a passive component. Because of the naming conventions a passive component can be added to a cluster without any interference with the function of the cluster. This characteristic of the MARS Architecture is very useful for testing purposes. Any operational cluster can serve as a testbed for a new component.

3. PROGRAMMING

The MARS-Project is not only concerned with the development of an architecture for a MAintainable-Real-time-System, but also with the design of appropriate language constructs and a prototype implementation. For this purpose we decided to extend an available programming language. We have chosen PASCAL as a base language for programming the MARS tasks.

3.1 Declarations

Module declarations:

As mentioned above, the application software of a component is called a module. Since the function of a component is determined by the application software, the module identification consists of a function name and a version name. The MARS system supports a one-to-many communication pattern. Modules can be members of one or more groups.

    MODULE function.version MEMBER OF group;
    END function.version.

Each message addressed to a particular group will be delivered to all modules in this group. The sender of a message does not know the number of receivers.

External message declaration:

Every MARS message consists of a predefined message header and an optional user defined record. The message header contains the following information:

- name; the message name
- version; the sending module/task version name
- sender; the sending module/task function name
- receiver; the name/groupname of the receiving module/task
- sent; the send time
- valid; the validity time

At the beginning of a module - the header - all messages which can be received and generated by this module from/to other modules

have to be declared. We call the messages, which are declared in the module header, external messages.

    MODULE function_name;
    IMPORT inmsg = RECORD ... END;
    EXPORT outmsg;
    END function_name.

The module 'function_name' receives all messages with the name 'inmsg' with a user defined record and it generates only messages named 'outmsg'. This 'outmsg' contains only the message header, i.e. it is a pure signal message.

Task declaration:

As mentioned before, every module consists of a set of parallel tasks. In the header of each task, all messages which can be received and generated by this task have to be declared. If the messages remain within the component, we call them internal messages. Internal messages are only declared in the task header, but not in the module header.

Internal message declaration:

Each internal message declaration declares a task local variable of the same type as the message. In addition to the message attribute EXPORT, input messages have to be classified as either STATE or EVENT messages. Furthermore an attribute INTERVAL must be specified for EVENT messages. This interval declaration is used for the detection of redundant event messages. This interval is opened with the arrival of a message. The interval is closed after the specified duration. All but one of the messages which arrived within this interval are considered redundant and only a single message is delivered to the application software. A message which arrives after the closing of an interval will open a new one. The state and event semantics are explained further in section 2.1.

A successful input operation assigns the actual message, including the header, to a local message variable with the same name as the message.

    MODULE function_name;
    IMPORT push_button = RECORD ... END;
    IMPORT position = RECORD ... END;
    EXPORT action = RECORD ... END;
    TASK task_name;
    STATE position = RECORD ... END IMMEDIATE;
    EVENT push_button = RECORD ... END INTERVAL duration;
    EXPORT action = RECORD ... END;
    END task_name;
    END function_name.

This module receives one or more

messages 'push_button' and 'position'. The task 'task_name' declares the 'position' as an immediate state message and with an input operation the message 'position' is assigned to a local variable 'position'. The message 'push_button' is treated as an event and one or more messages 'push_button' which occur during the interval 'duration' are considered as a single event. In addition the task sends a message 'action' to another task.

4. INPUT and OUTPUT Statement

Because the semantics of the interprocess communication statements in MARS differ from the semantics normally associated with send and receive statements, two new keywords, INPUT and OUTPUT, are used in MARS.

4.1 The OUTPUT Statement

The OUTPUT statement is used to output a message to the communication system. The issuing task then proceeds (i.e. the issuing task performs a rendezvous with the local part of the communication system). The OUTPUT statement requires the specification of a validity time for the message.

    OUTPUT msg TO receiver VALID time;

'msg' is a declared message that will be transmitted to the 'receiver', that is a component name, a component group name, a task name, or a task group name. 'time' is a time expression (relative or absolute). After this time the message will be discarded by the communication system. At execution of the OUTPUT statement the message header is constructed from this information. The send time and the sender name are inserted into the header by the MARS machine.

4.2 The INPUT-Statement

With the INPUT statement a message can be read from the communication system into a task local message variable. It is a selective receive: from all queued messages the denoted one(s) will be selected.

The general form is:

    INPUT
         msg11 FILTER f11 AND msg12 FILTER f12 AND ... => stmt1
      OR msg21 ... => stmt2
      AFTER time => stmta
    END

- 'msgij' is a declared message that should be received

- 'fij' denotes a predicate on the full contents of 'msgij' (i.e. the message header and the user-defined record), and task local variables. 'msgij FILTER fij' means that only if the evaluation of 'fij' delivers 'true' this message will be read.
- 'msgij ... AND msgik ...' determines that receiving is only possible if all listed messages are available.
- the keyword 'OR' denotes message reception alternatives.
- if no message(s) can be read until a specified point in time, the timeout alternative ('AFTER time') is executed.
- if receiving is possible the selected message(s) will be
  - for an event message consumed and assigned
  - for a state message only assigned
  to the local message variable specified in the declaration. The following statement 'stmti' will then be executed.

5. EXAMPLES

Consider a conveyer which is used to transport workpieces. The workpieces have to be counted and sorted according to size (Fig 5.1). The state of this 'plant' can be described by the following information:

- total number of processed workpieces
- number of large workpieces in collection bin
- lever position
- workpiece in process (i.e. a workpiece has interrupted the lightbeam and is not in a collection bin)

[Figure 5.1]

Let us assume a constant speed of the conveyer. The following times are relevant.

- succession; the minimal time between two workpieces
- switch; the time needed to switch the lever
- transport; the time needed to transport a workpiece from the lightbeam to the lever

It is required that the conditions

    succession > transport and switch << transport

are fulfilled, i.e. at most one workpiece may be between the lightbeam and the conveyer end.

The controlling software has the

following general structure (all tasks are implemented as 'loop forever'):

    TASK lightbeam;
    EXPORT workpiece = RECORD size : (large,small) END;
    wait_for_workpiece;
    determine_size_of_workpiece;
    OUTPUT workpiece TO piece_processor VALID transport;
    END lightbeam;

The task 'lightbeam' produces a message 'workpiece' if the lightbeam is interrupted. The contents of this message describe the size of the workpiece.

    TASK count MEMBER OF piece_processor;
    EVENT workpiece INTERVAL jitter;
    EXPORT total_number = RECORD ... END;
    INPUT workpiece => management_info
    AFTER succession SEC;
    OUTPUT total_number TO ...
    END count;

The 'count' task processes the information 'workpiece'. Furthermore, it sends its internal state, i.e. the content of the message total_number, at least every 'succession' seconds.

The predicate

    jitter << succession

is always true. This must be guaranteed by the plant process.

    TASK size_lever_position MEMBER OF piece_processor;
    EVENT workpiece INTERVAL jitter;
    INPUT workpiece => if new_pos then switch_lever
    AFTER succession SEC;
    OUTPUT size_lever TO ...
    END size_lever_position;

The task 'size_lever_position' changes the lever position if necessary.

Functional enhancement:

Very often after the installation of a system new requirements have to be implemented. As a third selection criterion the weight of the workpieces has to be considered. A weighbridge and a second sorting lever must be installed.

[Figure 5.2]

This functional enhancement requires the following software additions:

    TASK weighbridge;
    INPUT workpiece => ...
    AFTER succession SEC;
    OUTPUT weight TO ...
    END weighbridge;

    TASK weight_processing;
    INPUT weight => ...
    AFTER succession SEC;
    OUTPUT weight_lever TO ...
    END weight_processing;

Both tasks can be added to the system without interfering with the existing system.

Reliability improvement:

Let us assume that it has been shown in practical operations that the lightbeam is a reliability bottleneck. A second, i.e. a redundant lightbeam has to be installed. An additional task 'lightbeam redundant' (only a copy of 'lightbeam') has to be included.

[Figure 5.3: redundant lightbeam]

The addition of this redundant task is simply a 'plug in'. The interval semantics on the receiver's side of the message(s) 'workpiece' supports the insertion of multiple 'workpiece' senders.

Under the assumption that the lightbeam is self-checking a failing lightbeam task can be detected by a maintenance task which accepts all messages 'workpiece' by a simple event declaration and checks for the completeness of the messages 'workpiece'. This support of multiple viewpoints (function, error detection) by the MARS architecture reduces the complexity of the solution.

The addition of a task with only functional behavior (i.e. the produced output at time t is independent from the work done before) is easily achieved. Tasks with an internal state can also be added with the following scheme:

    TASK one_of_multiple_copies MEMBER OF copies;
    STATE internal = RECORD my_state_structure END;
    { initialisation }
    INPUT internal => { recovery start }
    AFTER (cycle + trans) SEC => { initialisation start }
    { loop forever }
    INPUT request => perform_service
    AFTER cycle SEC;
    OUTPUT internal TO copies VALID cycle;

In this scheme the task 'one_of_multiple_copies' produces after each request a message 'internal'. This message holds the internal state information of this task. Furthermore, the task outputs its internal state each 'cycle' seconds, independent of the rate of requests.

In the initialisation phase a new task waits on a message 'internal'. After receiving such a message the internal state of the other (active) task is known. If no message 'internal' can be received within the duration cycle + trans then it must be assumed that this task is the only one in the cluster producing a message 'internal'. However, this assumption is valid only if the following condition holds

    restart > 2 (cycle + trans)

- restart; the minimum duration between the start of two redundant tasks.
- cycle; the maximum duration between the sending of two messages 'internal'.
- trans; the maximum transport time of a message between the redundant tasks.

Until repair of a failed component of the system a redundant component is still available.

Mapping the task to computers:

Because in MARS there is no distinction between internal (task-task) and external (module-module) communication all tasks can be freely mapped to MARS machines (i.e. hardware). This mapping has to be done in such a way that the overall reliability becomes a maximum. The restart of redundant tasks is therefore ...

6. LITERATURE

/DeRo 78/ De Roze, B.C., Nyman, T.H., The software life cycle - a management and technological challenge within the department of defense, IEEE Trans. SE-4, July 1978, p. 309-318

/Hopk 78/ Hopkins, A.L., Smith, T.B., Lala, J.H., FTMP - A highly reliable Fault Tolerant Multiprocessor For Aircraft, Proc. IEEE, Vol. 66, No. 10, Oct. 78, p. 1146-1154

/Kope 82/ Kopetz, H., Lohnert, F., Merker, W., Pauthner, G., The Architecture of MARS, Report MA 82/2, TU Berlin, April 1982

/Kram 83/ Kramer, J., Magee, J., Sloman, M., Lister, A., CONIC: an integrated approach to distributed computer control systems, IEE Proc., Vol. 130, Pt. E, No. 1, January 83, p. 1-10

/Lamp 78/ Lamport, L., Time, clocks and the ordering of events in a distributed system, CACM, Vol. 21, No. 7, July 1978, p. 558-565

/LeLa 79/ Le Lann, G., An Analysis of Different Approaches to Distributed Computing, Proc. 1st International Conference on Distributed Computing, Huntsville, October 1979, p. 222-232

/Svob 79/ Svoboda, L., Reliability issues in distributed information processing systems, Proc. of the 9th International Symposium on Fault Tolerant Computing, June 1979, p. 9-16

/Wens 78/ Wensley, J.H., Lamport, L., Goldberg, J., Green, M.W., Levitt, K.N., Melliar-Smith, P.M., Shostak, R.E., Weinstock, C.B., SIFT: The Design and Analysis of a Fault-Tolerant Computer for Aircraft Control, Proceedings of the IEEE, Vol. 66, No. 10, October 1978, p. 1240-1254