SOFTWARE FOR DISTRIBUTED CONTROL SYSTEMS III
Copyright © IFAC Software for Computer Control, Madrid, Spain 1982

SYNCHRONIZATION OF CONCURRENT ACTIVITIES IN A LOCAL AREA COMPUTER NETWORK

H. Rzehak
Department of Computer Science, University of the Federal Armed Forces, USA
Abstract. In local area computer networks it may be necessary to synchronize activities which are allocated to different computer systems of the network. These networks can still operate with restricted performance in the case of a breakdown of a single computer. In this case it may happen that some activities in the still working computer systems remain in a blocked state (waiting for synchronization) and could never leave this state, because the signal for the release must be produced by an activity allocated to the computer system which fails. The paper first gives a brief summary of synchronization concepts for centralized computer systems. Then it is shown how these concepts could be modified for an implementation on local area computer networks. It is described how to handle the problem of blocked activities if a computer system of the network fails.

Keywords. Decentralized systems; concurrency control; fault tolerance; synchronization; error recovery.
INTRODUCTION

Today local area computer networks play an important role in the control of industrial processes. As no commonly accepted definition of the term "local area computer network" exists, we consider two properties inherent in this term:

- The distance between the nodes is sufficiently small to maintain a fast and cheap data transmission system via a common bus or ring, and the standing charges are independent of the load of the communication network.
- Each computer of the network is an autonomous (sub)system and can operate for some tasks in a stand-alone mode. This allows us to distinguish clearly between a multiprocessor and a local area computer network.

We assume that only one organization is responsible for the whole network, and therefore agreements can easily be made on how to coordinate the different parts of software in the subsystems of the network. It is not necessary to prefer those concepts for the coordination which work with a minimum of information transmitted over the network, because this transmission does not incur any costs as long as the communication system is not overloaded (Clark, Pogran, Reed, 1978).

The term "synchronization" is used to refer to the solution of either of two problems:

(S1) specification and control of the joint activity of cooperating sequential processes, or
(S2) serialization of concurrent access to shared objects by multiple processes.

Problem (S1) covers a characteristic situation in computer control of industrial processes, where the progress of an activity depends on the progress of other activities. (In this paper the term activity is used synonymously with the term process.) Since cooperating sequential processes can be synchronized by means of mutually exclusive operations on a common data object, the literature frequently deals with problem (S2), which is a key problem in data base systems (Kohler, 1981). That is why the overview in the next section mainly gives solutions of problem (S2).
SYNCHRONIZATION IN SINGLE COMPUTER SYSTEMS

The implementation of basic constructs to solve problem (S2), so-called synchronization primitives, is possible if the following three conditions hold:

(C1) There exists an indivisible operation (atomic action) which is executed one at a time and cannot be interrupted by other activities or external events.
(C2) It is possible to remove the processor from an activity waiting for some event, e.g. for synchronization, to avoid any form of "busy waiting". Usually the corresponding functions are performed by an operating system.
(C3) If an activity has entered a region controlled by a synchronization primitive, it must leave this region through the regular exit; in particular it does not terminate within this region.

With condition (C1) we can distinguish whether an atomic action is performed before another one or not. This means that we can define a sequence of atomic actions. It depends on the hardware and software architecture of the computer system which action can be considered as an atomic action. Fig. 1 gives an example of the implementation of a binary semaphore in a multiprocessor environment. The atomic action is an operation of the common memory, taking into account that a sequence of operations in one of the processors is not an atomic action if the same sequence can be performed concurrently by another processor. In the case of a single processor system condition (C2) is generally necessary for concurrent activities. In a multiprocessor environment this condition additionally maintains an economic implementation (Hoare, 1978).
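The binary semaphore built on an indivisible READ AND CLEAR operation can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation of Fig. 1: a `threading.Condition` stands in for the arbitration of the common memory (C1) and for the operating system function that removes the processor from a waiting activity (C2). The names `BinarySemaphore`, `request`, `release` and `run_demo` are chosen here for illustration.

```python
import threading
import time


class BinarySemaphore:
    """Binary semaphore built on one indivisible READ AND CLEAR operation."""

    def __init__(self):
        self._sema = True                    # TRUE: the controlled region is free
        self._cell = threading.Condition()   # assumption: models the memory arbiter

    def _read_and_clear(self):
        # Only called while self._cell is held, so reading and clearing
        # cannot be interleaved with another task (condition C1).
        old_value = self._sema
        self._sema = False
        return old_value

    def request(self):
        """Entrance of the controlled region (P-operation)."""
        with self._cell:
            while not self._read_and_clear():
                self._cell.wait()            # give up the processor (condition C2)

    def release(self):
        """Exit of the controlled region (V-operation)."""
        with self._cell:
            self._sema = True
            self._cell.notify()              # renew one pending request


def run_demo(n_tasks=4, rounds=200):
    """Let several tasks update a shared counter inside the controlled region."""
    sema = BinarySemaphore()
    shared = {"count": 0}

    def task():
        for _ in range(rounds):
            sema.request()
            try:
                value = shared["count"]
                time.sleep(0)                # invite a task switch inside the region
                shared["count"] = value + 1
            finally:
                sema.release()               # always leave by the regular exit (C3)

    tasks = [threading.Thread(target=task) for _ in range(n_tasks)]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()
    return shared["count"]
```

The `try ... finally` in `task` is the user's way of respecting condition (C3) in this sketch; the semaphore itself cannot enforce it.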
[Fig. 1 diagram: Processor I (Task A) and Processor II (Task B) share the variable SEMA in the common memory, initially SEMA := TRUE. Each task executes READ AND CLEAR SEMA; IF SEMA CONTINUE TASK ELSE WAIT TASK FOR SEMA; at the end of the region (FIN) it sets SEMA := TRUE, which renews the pending SEMA requests. While SEMA = FALSE the other task is waiting for synchronization. READ AND CLEAR SEMA consists of READ SEMA followed by CLEAR SEMA, performed as one atomic action.]

Fig. 1 Implementation of a binary semaphore in a multiprocessor environment

The example shows that an implementation of synchronization primitives cannot guarantee that condition (C3) holds. It is the responsibility of the user to check that it holds. Therefore more sophisticated mechanisms have been established to check condition (C3) at compile time. Fig. 2 shows the various techniques for synchronization (Levi, 1981). Some remarks may briefly describe the relations between these techniques:

The semaphore mechanism has been shown in the example above (Fig. 1). It should be noticed that the programmer cannot check the value of a semaphore. There are only two operations, for the entrance and the exit of the controlled region (P- and V-operation by Dijkstra, 1968). The semaphore must be preset to a definite value, e.g. by the operating system or the compiler.

The critical region concept is equivalent to the region controlled by a semaphore, but the programmer has to declare the region where an object is used exclusively by an activity:

    with object do statements done;

The corresponding operations on semaphores can be produced automatically by the compiler.

Semaphores and critical regions solve the problem of mutual exclusion. In contrast, the conditional critical region solves the problem of producer and consumer activities: an activity produces some objects whereas another activity consumes these objects concurrently. It is clear that this problem can be solved by semaphores or critical regions, but the notation in a program seems unnatural in some way.
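A conditional critical region for the producer and consumer problem might look as follows in Python. This is a sketch under assumptions: the `with` statement on a `threading.Condition` plays the role of "with object do ... done", and `wait_for` supplies the condition under which the region may be entered. The names `Region`, `producer`, `consumer` and `run_demo`, as well as the buffer capacity, are illustrative and not taken from the paper.

```python
import threading
from collections import deque


class Region:
    """Shared object that may only be used inside a critical region."""

    def __init__(self, capacity):
        self.items = deque()
        self.capacity = capacity
        self.cond = threading.Condition()


def producer(region, values):
    for value in values:
        with region.cond:                    # enter the region exclusively
            # Conditional part: proceed only when the buffer is not full.
            region.cond.wait_for(lambda: len(region.items) < region.capacity)
            region.items.append(value)
            region.cond.notify_all()         # the consumer's condition may now hold


def consumer(region, count, out):
    for _ in range(count):
        with region.cond:
            # Conditional part: proceed only when there is something to consume.
            region.cond.wait_for(lambda: len(region.items) > 0)
            out.append(region.items.popleft())
            region.cond.notify_all()         # the producer's condition may now hold


def run_demo(n=20, capacity=3):
    """Pass n objects from one producer to one consumer through the region."""
    region = Region(capacity)
    consumed = []
    p = threading.Thread(target=producer, args=(region, range(n)))
    c = threading.Thread(target=consumer, args=(region, n, consumed))
    p.start()
    c.start()
    p.join()
    c.join()
    return consumed
```

Compared with a solution written directly with semaphores, the waiting condition appears next to the statements it guards, which is the notational point made above.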
[Fig. 2 diagram: synchronization techniques are divided into explicit control of synchronization (primitives for synchronization, for the problem of mutual exclusion and for cooperating sequential processes) and higher techniques for concurrent activities.]

Fig. 2 Survey of synchronization techniques

In the monitor concept an object with exclusive access, together with all operations on this object represented by procedures, forms an abstract data type: the monitor. The synchronization is performed within the monitor in such a way that only one procedure of the monitor can be executed at a time. This means that a monitor call can be considered as an atomic action. The programmer is forced to write down the object declarations together with the procedures of the monitor.
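The monitor idea, an object declared together with its procedures so that only one procedure runs at a time, can be sketched in Python. This is an illustration only: Python has no monitor construct, so a hypothetical decorator `procedure` wraps each operation in the monitor's lock. `StockMonitor` and `run_demo` are names invented for this example.

```python
import functools
import threading


def procedure(method):
    """Mark a method as a monitor procedure: it runs under the monitor lock."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        with self._monitor_lock:
            return method(self, *args, **kwargs)
    return wrapper


class StockMonitor:
    """The object (a stock level) is declared together with its procedures."""

    def __init__(self, stock):
        self._monitor_lock = threading.Lock()
        self._stock = stock

    @procedure
    def withdraw(self, amount):
        # Check and update form one atomic action from the caller's view.
        if amount > self._stock:
            return False
        self._stock -= amount
        return True

    @procedure
    def level(self):
        return self._stock


def run_demo(stock=100, n_tasks=8, attempts=20):
    """Several tasks compete for the stock; no unit is handed out twice."""
    monitor = StockMonitor(stock)
    successes = []

    def task():
        got = sum(1 for _ in range(attempts) if monitor.withdraw(1))
        successes.append(got)

    tasks = [threading.Thread(target=task) for _ in range(n_tasks)]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()
    return sum(successes), monitor.level()
```

Because the callers never touch `_stock` directly, the regular exit of the controlled region is guaranteed by the structure of the program rather than by the discipline of the user.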
There exist a lot of modifications, e.g. nested monitors or dynamically created and deleted monitors, which include some possibility of deadlocks. It is possible to implement monitors by semaphores.

A basic form of a message passing system consists of a message buffer (mail box) which belongs to an activity called the receiver. Some other activities, called transmitters, can send messages to the receiver by putting them into the mail box. A message contains the identification of the transmitter as well as of the receiver. After sending a message the transmitter can proceed, except when the mail box is full. The receiver takes a message out of the mail box if there is any and proceeds by interpretation of it. If the mail box is empty, the receiver stops, waiting for a message. There is no direct connection between transmitter and receiver. To avoid deadlocks, message passing systems must be structured so that they are free of deadlock conditions at any time. Such a system can be implemented by a monitor which contains the object "mail box". Generally, message passing systems are not easy to handle in an environment where activities exist only dynamically over a certain period of time. E.g. the channel in MASCOT can be considered as a mail box. Different implementations of message passing systems are widely published in the literature.

The synchronization tool in Ada is a so-called "rendezvous". It has some properties of both the monitor concept and message passing systems. Fig. 3 gives an example.

    -- generiere_bot produces codes, dekodiere decodes them,
    -- drucke_bot collects the decoded characters into lines and prints them.

    task body generiere_bot is
       nachster_code: CHARACTER;
    begin
       loop
          -- statements to produce nachster_code
          dekodiere.sende_zeichen (nachster_code);     -- "rendezvous"
       end loop;
    end generiere_bot;

    task body dekodiere is
       code, zeichen: CHARACTER;
    begin
       loop
          accept sende_zeichen (C: in CHARACTER) do
             code := C;
          end;
          -- statements to decode the value of code and
          -- to store the result in zeichen
          accept empfange_zeichen (C: out CHARACTER) do
             C := zeichen;
          end;
       end loop;
    end dekodiere;

    task body drucke_bot is
       nachstes_zeichen: CHARACTER;
       index: INTEGER;
       zeile: STRING (1 .. 132);
    begin
       index := 1;
       loop
          dekodiere.empfange_zeichen (nachstes_zeichen);   -- "rendezvous"
          zeile (index) := nachstes_zeichen;
          if index < 132 then
             index := index + 1;
          else
             drucke (zeile);
             index := 1;
          end if;
       end loop;
    end drucke_bot;

Fig. 3 The rendezvous in Ada

Like a monitor, the entry point (this is the accept-statement) of an activity (task) can be entered exclusively, and parameters can be passed. Then the statements within the
do ... end of the accept-statement are executed by the called activity, that is the task with the accept-statement, if it has reached this statement. If the called activity has reached the accept-statement first, it must wait like an activity waiting for a message in a message passing system. It should be noticed that only the calling activity must know the identifier of the called activity. In extension of the basic rendezvous a choice of some accept-statements can be given in a select-statement. The rendezvous happens if the conditions for one of the accept-statements hold.

SYNCHRONIZATION IN LOCAL AREA COMPUTER NETWORKS

In a local area computer network some activities to be synchronized may be allocated to different computers of the network. In the light of the last section we look at how to use or modify the synchronization concepts for this case.

First we have to check whether the conditions (C1), (C2) and (C3) hold in a local area computer network. As we have seen, the implementation of synchronization primitives cannot guarantee that condition (C3) holds. This is the responsibility of the user. We further assume that an operating system controls the network, preferably in a decentralized form.

To satisfy condition (C1) we have no common memory as in the example of Fig. 1. An atomic action can only be performed either by a single computer of the network or by the communication system if it is a unique bus or ring system. The latter is quite unusual because it leads to special forms of bus protocols. Usually each synchronization task is assigned to a certain computer of the network which performs the associated atomic action.

In principle each of the activities participating in a synchronization task has to send a request to that subsystem which performs the associated atomic action and gets back an answer whether it can continue or not. This is very similar to the operation of message passing systems, and it seems quite natural to use such systems for the synchronization of distributed activities. As stated in the last section, message passing systems are not easy to handle in an environment where activities exist only dynamically, and they must be structured carefully to prevent deadlocks. We can implement simpler concepts by using a substitute activity which represents the calling activity in the subsystem performing the atomic action. Messages are passed only between the calling activity and its substitute. The message oriented technique is hidden from the user. Fig. 4 gives an example of the implementation of a monitor. Semaphores can be implemented in the same way. The effect of synchronization is delayed by the transmission time. In principle an anomaly of the time ordering of events is possible (see: Lamport, 1978a), but with respect to the short transmission time in a local network it can be neglected in most of the applications.
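The substitute activity described above can be sketched in Python, with two `queue.Queue` objects standing in for the network link between the calling activity and its substitute. This is a simulation under assumptions, not the author's implementation: both "subsystems" are threads of one process, and the names `CounterMonitor`, `Substitute`, `RemoteMonitor` and `run_demo` are invented for illustration.

```python
import queue
import threading


class CounterMonitor:
    """Monitor located in the subsystem that performs the atomic action."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def add(self, n):
        with self._lock:                     # one monitor procedure at a time
            self._value += n
            return self._value


class Substitute(threading.Thread):
    """Represents one remote calling activity inside the monitor's subsystem."""

    def __init__(self, monitor):
        super().__init__(daemon=True)
        self.monitor = monitor
        self.requests = queue.Queue()        # assumption: models the network link
        self.replies = queue.Queue()

    def run(self):
        while True:
            message = self.requests.get()
            if message is None:              # the calling activity has finished
                break
            name, args = message
            # The substitute performs the monitor call locally and sends the answer.
            self.replies.put(getattr(self.monitor, name)(*args))


class RemoteMonitor:
    """Caller-side view: the message exchange is hidden behind call()."""

    def __init__(self, monitor):
        self._substitute = Substitute(monitor)
        self._substitute.start()

    def call(self, name, *args):
        self._substitute.requests.put((name, args))
        return self._substitute.replies.get()   # wait for the answer

    def close(self):
        self._substitute.requests.put(None)
        self._substitute.join()


def run_demo(n_activities=3, calls=50):
    monitor = CounterMonitor()

    def activity():
        remote = RemoteMonitor(monitor)      # one substitute per calling activity
        for _ in range(calls):
            remote.call("add", 1)
        remote.close()

    activities = [threading.Thread(target=activity) for _ in range(n_activities)]
    for a in activities:
        a.start()
    for a in activities:
        a.join()
    return monitor.add(0)
```

Messages only travel between a calling activity and its own substitute, so the user program sees an ordinary procedure call and the delay of the transmission is the only visible difference.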
[Fig. 4 diagram: an activity in one subsystem calls a MONITOR located in subsystem j through its substitute Q in that subsystem.]

Fig. 4 Synchronization by a distributed monitor

SYNCHRONIZATION AND ERRONEOUS STATES OF SUBSYSTEMS

It is quite natural to look at how the rest of a network can operate if a single subsystem fails, even if fault tolerance is not the reason for the implementation of a local area computer network. Suppose that an activity terminates irregularly in a region controlled e.g. by a semaphore, because the processor fails; then all other activities which have performed a request on the same semaphore will wait forever. To prevent this, we can install a book-keeping for all activities being in a region controlled by a semaphore. By this it is possible to perform the release operation if an activity terminates irregularly.

For more generality there are two possible ways to handle the problem:

- The system detects each erroneous state which may lead to a "wait forever" state and reacts by an appropriate error handling in such a way that all activities in the sound subsystems can run.
- The system detects each erroneous state which leads to a "wait forever" state and activates, by a branch facility, an exception handling part of the user program. Enough information must be passed to select an appropriate action.

The first alternative seems much more comfortable for the user, but in many cases it is not possible to continue activities unchanged in the running part of the network when some activities have terminated in the subsystem with a breakdown. The reconfiguration of the system should be done by the user because he has more knowledge of which functions of the whole system are essential. This means that the second alternative should be preferred.

Synchronization techniques don't pay attention to the occurrence of failures. The only exception is Ada, where a time-out condition can be given for an accept-statement by a delay-statement. This means that the called activity (which contains the accept-statement) will wait only a definite time and then continue performing the statements associated with the delay-statement. The disadvantage is that it remains unknown which activities are missing, and it is not clear what happens if the called activity terminates irregularly.

It is easier to prevent the "wait forever" situation using a distributed monitor (see Fig. 4). This situation can only occur if the subsystem which contains the monitor kernel fails (supposing that the data transmission between calling activity and monitor kernel is possible in any case). [...]

In the monitor and the rendezvous concept it is not possible to have access to global objects within the controlled region. All external objects must be passed as arguments of the call. This may cause some overhead in many applications of local area computer networks for process control.
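The time-out condition on an accept-statement discussed in this section can be illustrated with a small Python sketch. It is a simplification under assumptions: a `queue.Queue` stands in for the entry of the called activity, so, unlike a true rendezvous, the caller does not wait until its call is accepted. The function names `accept_with_delay` and `run_demo` are invented here.

```python
import queue
import threading


def accept_with_delay(entry, delay_seconds):
    """Wait for a call on `entry`, but only for a definite time.

    Returns ("accepted", message) if a caller arrived in time, otherwise
    ("timed_out", None) so that the called activity can continue with the
    statements it associates with the delay.
    """
    try:
        message = entry.get(timeout=delay_seconds)
    except queue.Empty:
        return ("timed_out", None)
    return ("accepted", message)


def run_demo():
    # Case 1: the calling activity is missing, e.g. its subsystem has failed.
    silent_entry = queue.Queue()
    first = accept_with_delay(silent_entry, 0.05)

    # Case 2: a calling activity delivers its message within the delay.
    live_entry = queue.Queue()
    caller = threading.Thread(target=live_entry.put, args=("code",))
    caller.start()
    second = accept_with_delay(live_entry, 1.0)
    caller.join()
    return first, second
```

The sketch also shows the limitation noted above: after a time-out the called activity only knows that nobody called, not which activity is missing or why.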
Therefore we propose two different solutions:

- Synchronization is done by a message passing system. In any case the transmitting activity will be continued, and an error code will be delivered containing information about the status of the receiver. Checks for deadlocks and the reconfiguration are the task of the user. For better convenience the user can simulate distributed semaphores or monitors. This leads to a straightforward implementation.

- The other solution is directed by the consideration that the very primitive atomic action is not the semaphore itself but the READ AND CLEAR operation. It is quite unusual to have a READ AND CLEAR operation on language level, and the user would have to maintain the appropriate waiting queues. We propose to enlarge the semaphore concept by

  - an access function which allows to read the value of the semaphore without changing it; example:

        value_sema := sema

    assigns the value of the semaphore to the variable value_sema;

  - an access function to the queue associated with a sema; example:

        name_task_i := sema.queue(i)

    assigns the name of the task on position i of the queue to the variable name_task_i; for i=0 this is the name of the active task.

  With these access functions a monitoring task can detect a "waiting forever" state and perform the appropriate actions. The monitoring task can be activated with an appropriate time schedule before performing a request-statement on a semaphore. Compared with a message passing solution this solution establishes a higher dependency on the availability of the subsystem containing the semaphore.

CONCLUSION

Considering different synchronization techniques in a local area computer network, two solutions should be preferred if we have to avoid any "waiting forever" states in the case of a breakdown of a subsystem. A solution with message passing is more symmetric because it doesn't assign a special role to any of the subsystems. The solution with an extended semaphore concept is unsymmetric because the subsystem containing the semaphore plays a special role. To gain practical experience the author is going to implement different solutions on a hierarchical local area computer network. For such a system best results are expected with an extended semaphore concept because the hierarchy reflects the unsymmetry of this solution.

REFERENCES

Ada reference manual. United States Department of Defense, July 1980, DoD MIL-STD-1815

Clark, D.D., K.T. Pogran and D.P. Reed (1978). "An introduction to local area networks", Proc. of the IEEE 66, pp. 1497-1517

Dijkstra, E.W. (1968). "Cooperating sequential processes", in: Programming Languages (Editor: F. Genuys), pp. 43-112, Academic Press, New York 1968
Ethernet. "The Ethernet, a local area network", common specification of DEC, Intel and Xerox, version 1.0, Sept. 1980

Hoare, C.A.R. (1978). "Communicating sequential processes", Commun. ACM 21, pp. 666-677

Kohler, W.H. (1981). "A survey of techniques for synchronization and recovery in a decentralized computer system", in Computing Surveys 13, pp. 149-183; (with an annotated bibliography)

Lamport, L. (1978a). "Time, clocks and the ordering of events in a distributed system", in Commun. ACM 21, pp. 558-565

Lamport, L. (1978b). "The implementation of reliable distributed multiprocess systems", Computer Networks 2, pp. 95-114

Levi, P. (1981). "Betriebssysteme für Realzeitanwendungen", Datakontext-Verlag, Köln 1981

MASCOT Suppliers Association. "The official handbook of MASCOT", Computing Standards Section, Royal Signals and Radar Establishment, Malvern, Worcestershire

Steusloff, H. (1977). "Zur Programmierung von räumlich verteilten, dezentralen Prozessrechensystemen", Dissertation, Universität Karlsruhe 1977