Microprocessing and Microprogramming 35 (1992) 463-468 North-Holland
IMPROVING THE PROBABILISTIC CLOCK SYNCHRONIZATION ALGORITHM
Gianluigi Alari, CSR4, Via Nazario Sauro 10, 09123 Cagliari, ITALY, E.Mail: [email protected]
Augusto Ciuffoletti, Dipartimento di Informatica dell'Università degli Studi di Pisa, Corso Italia n. 40, 56100 Pisa, ITALY, E.Mail: [email protected]
We introduce a clock synchronization algorithm based on the one presented by Cristian in [1]. Our method significantly improves the performance of the original algorithm by using a more accurate remote clock reading rule. We obtain an algorithm which keeps the clocks synchronized (within a given precision) with fewer messages per time unit. The protocol is validated and its effectiveness is evaluated by simulation.
Keywords: Clock synchronization, Distributed system, Probabilistic algorithm.
1. INTRODUCTION
The notion of time is strictly related to the order in which events happen. While this concept is easily applicable to centralized systems, where events are totally ordered by the scheduling policy, it is much less immediate in a distributed system. A distributed system consists of a collection of processes located in several independent nodes; the cooperation between the nodes is implemented by the messages that flow from node to node.
In such systems it may be difficult to establish the timing of events. Nevertheless, the knowledge of this relation may be required by a specific application: e.g., the synchronization of joint actions or the measurement of the time elapsing between two actions happening at different nodes. If each node owns a local clock, which is kept synchronized with all the others, it is possible to implement such operations.
Thus, we may say that clock synchronization is useful to achieve two kinds of results: (i) define an ordering of events; (ii) synchronize actions on different nodes. In accordance with [1] and [2], we distinguish internal and external synchronization. Internal synchronization keeps the difference between any two clocks in the system within a certain limit; external synchronization keeps the difference between any clock and the real time (represented by an external time reference) within a certain limit. Internal synchronization is adequate if the computation is independent of real time, but we cannot state that 10 seconds elapsed between two given events $e_1$ and $e_2$ if they happened on different nodes. To obtain such information we need external synchronization. This paper presents an algorithm to achieve external synchronization based on [1].
2. DEFINITION OF THE PROBLEM
A logical clock is a data structure which keeps, at each node, an approximation of the real time given by an external timing source. We will denote the value of the logical clock of node q at real time t with $C_q(t)$. Logical clocks tend to drift apart from the external time source: each time unit, the value of the logical clock is incremented by a quantity slightly higher or lower than one. The difference between this quantity and one is called drift. Therefore the logical clock must occasionally be corrected, to make up for its drift. To perform this correction, a node must read the external time reference.

Obtaining that value is a serious problem, since the external time source is usually a peripheral connected to a few nodes of the distributed system. We call the nodes which have direct access to the time reference masters, and the others servers. The reading of the external time source will be affected by the communication delays. This amounts to a negligible quantity for the masters, but not for the servers. We introduce a solution to this problem based on the probabilistic clock synchronization scheme presented by F. Cristian in [1]. This protocol is said to be probabilistic because, instead of reaching synchronization through a deterministic protocol [3], it tries to reach it with high probability using a non-deterministic protocol. For the sake of conciseness, we have limited our discussion to the clock reading algorithm: we refer to [1] for the use of the clock value to update the local logical clock and for other related issues.

Let us formalize some basic concepts. The logical clock deviates from the real time at a variable speed, limited by the upper bound $\rho$:

$$(t_1 - t_2)(1 - \rho) \le C_q(t_1) - C_q(t_2) \le (t_1 - t_2)(1 + \rho)$$

where the two times $t_1$ and $t_2$ lie between two successive updates of the logical clock at node q. The upper bound $\rho$ is the drifting rate: its typical value is about $10^{-6}$ seconds per second. We assume that the clock update has a negligible duration with respect to the time usually elapsing between two successive updates.

The goal of the synchronization protocol is to preserve the following relation at each node q:

$$|C_q(t) - t| \le \delta$$

where $\delta$ is a suitable constant representing the maximum allowed error of the logical clock.

2.1 Overview of the protocol

Each server q periodically sends a message to one of the masters asking for the current value of the external time source. The answer contains the value of the logical clock of the master when the message was received. Since the master has direct access to the time reference, we assume that $C_{master}(t_r) = t_r$.
When the answer is received, the server q has three pieces of information:
- the local approximation of the time $t_0$ when the request was sent ($C_q(t_0)$);
- the local approximation of the time $t_1$ when the answer is received ($C_q(t_1)$);
- the real time $t_r$ when its request has been received by the master.

The server may also exploit some further system hypotheses concerning the time elapsing from the delivery of the request to the receipt of the answer. This time has a lower bound $\mathit{min}$, deduced from the communication protocol used to route and process the messages, and exhibits some statistical regularity. Let us call $P(r \ge t)$ the probability that a time greater than t elapses from the delivery of the request to the receipt of the answer. Using the above data it is possible to derive two fundamental pieces of information: (i) an approximation of the value of $t_1$, and (ii) an upper bound of the approximation error. The formula that enables the derivation of this information is called the clock reading rule. Based on the data obtained through the application of the clock reading rule, the server can decide if the approximation of the clock value is sufficiently precise. In this case we say that the server reaches contact with the master. Using the available information it is possible to quantify the probability of reaching contact: let us call $P_{rc}$ the probability that a server reaches contact with the master with a single message exchange.
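As an illustration of the drift bound $\rho$ introduced in Section 2, the following sketch computes how the worst-case error of a logical clock grows between two updates. It is not the DNA computation of [1]: the function names and the sample values (an error of 0.2 msec right after an update, $\delta$ = 1 msec, $\rho = 6 \cdot 10^{-6}$) are assumptions chosen only for illustration.

```python
def worst_case_error(initial_error: float, elapsed: float, rho: float) -> float:
    """Upper bound on |C_q(t) - t| after `elapsed` time units without updates.

    From (t1 - t2)(1 - rho) <= C_q(t1) - C_q(t2) <= (t1 - t2)(1 + rho), the
    logical clock gains or loses at most rho * elapsed against real time.
    """
    return initial_error + rho * elapsed


def max_time_without_update(initial_error: float, delta: float, rho: float) -> float:
    """Longest interval for which |C_q(t) - t| <= delta is still guaranteed."""
    if initial_error > delta:
        raise ValueError("the clock is already outside the allowed error")
    return (delta - initial_error) / rho


# Hypothetical values, expressed in seconds.
RHO = 6e-6
DELTA = 1e-3
ERROR_AFTER_UPDATE = 0.2e-3

deadline = max_time_without_update(ERROR_AFTER_UPDATE, DELTA, RHO)
print(f"guaranteed within delta for about {deadline:.1f} s")
```

With these values the guarantee lasts for roughly 133 seconds. A server has to start its attempts earlier than that, since several attempts may fail before contact is reached.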
When the server reaches contact with the master, it knows an upper bound of the distance between its logical clock and the real time. Using that knowledge and the upper bound of the drift $\rho$, it can plan when to make the next attempt to reach contact with the master, taking into account that several attempts may fail before reaching contact again. We call DNA (delay to next attempt) the timeout that the server will set when reaching contact with a master, to be woken up when it is time to attempt a new clock reading.

When the DNA has elapsed the server will start trying to reach contact with the master. At most K attempts are allowed, each one separated from the next by a time W. After K unsuccessful attempts, the server is considered to be out of order, since its clock may have drifted more than $\delta$ time units from real time. Therefore, the probability that the overall protocol fails to maintain the desired degree of synchronization is

$$P_{fail} = (1 - P_{rc})^K$$

2.2 Our proposal

We propose a slight modification of the clock reading rule presented in [1], and we leave unchanged the rest of the protocol. Due to this improvement we gain a valuable reduction in the cost of the algorithm. The rule we propose is summarized below.

Theorem: In the above notation, let us call D half the length of the time interval $[t_0, t_1]$, as measured by the server:

$$D = \frac{C_{server}(t_1) - C_{server}(t_0)}{2}$$

and $C_{server}(t_0) = T_0$. If $|T_0 - t_0| \le \delta_0 \le \delta$ then

$$t_r + (\mathit{min} + \beta_{min}) \le t_1 \le t_r + (\mathit{min} + \beta_{max}) \qquad (1)$$

where

$$\beta_{min} = \mathrm{MAX}[2D(1 - \rho) - 2\,\mathit{min} - \alpha_{max},\ 0]$$
$$\beta_{max} = 2D(1 + \rho) - 2\,\mathit{min} - \alpha_{min}$$

and

$$\alpha_{min} = \mathrm{MAX}[(t_r - (T_0 + \delta_0)) - \mathit{min},\ 0]$$
$$\alpha_{max} = (t_r - (T_0 - \delta_0)) - \mathit{min}$$

The corresponding evaluation reported in [1] is:

$$t_r + \mathit{min} \le t_1 \le t_r + 2D(1 + \rho) - \mathit{min} \qquad (2)$$

which is a less accurate estimate. We give an immediate view of the results by comparing the simulation of the two rules. In the following figure, we report the reading errors, i.e. half of the intervals in (1) and (2), in a sample of 500 clock reading operations. Each point represents a clock reading: the y coordinate is the error obtained using (1), the x coordinate is the error obtained using (2). It is straightforward to observe that the error of (1) is always lower than the error of (2).

[Figure: scatter plot of the reading errors; both axes in msec, with ticks at 0.5, 1.0 and 1.5.]
Errors in determining $t_1$ in 500 clock reading simulations. Each dot represents a clock reading: the y coordinate is the error obtained using (1); the x coordinate is the error obtained using (2).
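The two clock reading rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and class names are invented here, rule (1) is coded as stated in the theorem of Section 2.2, and the sample exchange (times in msec, a server clock 0.1 msec ahead, $\delta_0$ = 0.3 msec, $\mathit{min}$ = 1 msec) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    low: float
    high: float

    @property
    def half_width(self) -> float:
        """Reading error: half the length of the interval."""
        return (self.high - self.low) / 2


def rule_2(T0, T1, tr, min_delay, rho) -> Interval:
    """Rule (2), the evaluation reported in [1]."""
    D = (T1 - T0) / 2
    return Interval(tr + min_delay, tr + 2 * D * (1 + rho) - min_delay)


def rule_1(T0, T1, tr, min_delay, rho, delta0) -> Interval:
    """Rule (1), which also uses the bound |T0 - t0| <= delta0."""
    D = (T1 - T0) / 2
    alpha_min = max(tr - (T0 + delta0) - min_delay, 0.0)
    alpha_max = tr - (T0 - delta0) - min_delay
    beta_min = max(2 * D * (1 - rho) - 2 * min_delay - alpha_max, 0.0)
    beta_max = 2 * D * (1 + rho) - 2 * min_delay - alpha_min
    return Interval(tr + min_delay + beta_min, tr + min_delay + beta_max)


# Hypothetical exchange, times in msec.
t0, true_t1 = 1000.0, 1002.8           # real send and receive times
tr = 1001.6                            # real time at the master
T0 = t0 + 0.1                          # server clock is 0.1 msec ahead
T1 = T0 + (true_t1 - t0) * (1 + 5e-6)  # server clock drifts at 5e-6
params = dict(T0=T0, T1=T1, tr=tr, min_delay=1.0, rho=6e-6)

old = rule_2(**params)
new = rule_1(**params, delta0=0.3)
print(f"rule (2): [{old.low:.4f}, {old.high:.4f}], error {old.half_width:.4f}")
print(f"rule (1): [{new.low:.4f}, {new.high:.4f}], error {new.half_width:.4f}")
```

In this example both intervals contain the true value of $t_1$, and the interval of rule (1) lies inside that of rule (2): the reading error drops from about 0.40 msec to about 0.30 msec, because the knowledge of $\delta_0$ bounds the delay of the request.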
The simulation is related to the same situation reported as typical in [1]. The actual drift rate is assumed to be $5 \cdot 10^{-6}$, with $\rho = 6 \cdot 10^{-6}$, and the distribution of the delays between the request and the receipt of the answer is derived from a $\chi^2(3)$ distribution, modified in order to be close to the sample used in [1]. We used a sample of 1000 values to simulate 500 clock readings.

We obtained an average reading error of 0.20 msec, while in the simulation of rule (2) we obtain 0.39 msec.

Following the protocol in [1] we draw some concluding results. Since the clock reading is more precise, and therefore the estimate of the error is lower, we will reach contact more frequently: we obtain $P_{rc} = 0.656$, while in the simulation of rule (2) we obtain 0.434.

This means that, each time a server decides to try to synchronize with the master, it will make fewer unsuccessful attempts before reaching contact. Therefore fewer messages will be exchanged to reach the same result. The simulation shows that, to keep $\delta$ = 1 msec with a probability of losing synchronization $P_{fail} < 10^{-9}$, we need on average 2.49 messages per minute, while in Cristian's case we need 3.66 messages per minute.

3. CONCLUSIONS

We have presented an improvement in the probabilistic clock synchronization algorithm proposed by F. Cristian in [1], which typically obtains a reduction of about 30% of the cost of the algorithm, expressed as messages per minute.

We propose a different expression to determine the approximation of the real time which is used to synchronize the servers' clocks; this results in a lower approximation error and consequently reduces the number of messages needed to maintain a certain degree of synchronization. The protocol has been tested in cases other than the typical one, showing a cost reduction varying from 21% to 45% with respect to Cristian's.

REFERENCES

[1] F. Cristian, "Probabilistic Clock Synchronization." Distributed Computing, No. 3, 1989, pp. 146-158.
[2] H. Kopetz, W. Ochsenreiter, "Clock Synchronization in Distributed Real Time Systems." IEEE Transactions on Computers, Vol. 36, No. 8, (August 1987), pp. 933-940.
[3] F. B. Schneider, "A Paradigm for Reliable Clock Synchronization." Technical Report TR 86-735, Dept. of Comp. Science, Cornell University, Ithaca, N.Y.

APPENDIX

Proof of the theorem. Let $C_{server}(t_0) = T_0$, $\mathit{min} + \alpha = t_r - t_0$, $\mathit{min} + \beta = t_1 - t_r$ ($\alpha, \beta \ge 0$) and $2d = t_1 - t_0$. We can infer that

$$2d = 2\,\mathit{min} + \alpha + \beta \qquad (1)$$

and

$$t_1 = t_0 + 2d \qquad (2)$$

Since

$$|C_{server}(t_0) - t_0| \le \delta_0 \qquad (3)$$

we can infer an approximation of $\mathit{min} + \alpha$, from the definition of $\alpha$ and $\rho$:

$$t_r - (T_0 + \delta_0) \le \mathit{min} + \alpha \le t_r - (T_0 - \delta_0) \qquad (4)$$

and therefore

$$\alpha_{min} = \mathrm{MAX}[(t_r - (T_0 + \delta_0)) - \mathit{min},\ 0], \qquad \alpha_{max} = (t_r - (T_0 - \delta_0)) - \mathit{min} \qquad (5)$$

Since the value of $\alpha$ is now bounded, we can derive a lower and an upper bound for $\beta$ using (1):

$$\mathrm{MAX}[2d - 2\,\mathit{min} - \alpha_{max},\ 0] \le \beta \le 2d - 2\,\mathit{min} - \alpha_{min} \qquad (6)$$

Since

$$2D(1 - \rho) \le 2d \le 2D(1 + \rho) \qquad (7)$$

we obtain

$$\beta_{min} = \mathrm{MAX}[2D(1 - \rho) - 2\,\mathit{min} - \alpha_{max},\ 0], \qquad \beta_{max} = 2D(1 + \rho) - 2\,\mathit{min} - \alpha_{min} \qquad (8)$$

Since

$$t_1 = t_r + (\mathit{min} + \beta) \qquad (9)$$

from (8) we obtain:

$$t_r + (\mathit{min} + \beta_{min})(1 - \rho) \le t_1 \le t_r + (\mathit{min} + \beta_{max})(1 + \rho) \qquad (10)$$
Q.E.D.

About the value of $\delta_0$.

The above proof considers that the maximum deviation of the clock of the server from real time when it sends the request to the master is $\delta_0$, and that $\delta_0 \le \delta$.

In fact, we know that the distance will still be lower than $\delta$ at the K-th attempt (which is the goal of the synchronization algorithm):

$$|t_0 - C_{server}(t_0)| \le \delta \quad \text{at the K-th attempt}$$

Since the clock of the server drifts away from real time with a rate $\rho$, we may infer that, at the previous attempt, which happened W time units before, we had:

$$|t_0 - C_{server}(t_0)| \le \delta - \rho W$$

In general, at the (K-i)-th attempt:

$$|t_0 - C_{server}(t_0)| \le \delta - i \rho W$$

Therefore, we can give the following definition for $\delta_0$ at the j-th attempt, given the value of K and W:

$$\delta_0 = \delta - (K - j) \rho W$$
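A minimal sketch of how $\delta_0$ could be evaluated in practice follows. It assumes that the K attempts fail independently, so that the protocol fails with probability $(1 - P_{rc})^K$, and picks the smallest K that meets a target failure probability; the function names and the value W = 2000 msec are hypothetical, while $P_{rc}$ = 0.656, $\delta$ = 1 msec and $\rho = 6 \cdot 10^{-6}$ are the values quoted for the simulation.

```python
import math


def attempts_needed(p_rc: float, p_fail: float) -> int:
    """Smallest K such that (1 - p_rc) ** K <= p_fail."""
    if not 0.0 < p_rc < 1.0 or not 0.0 < p_fail < 1.0:
        raise ValueError("probabilities must lie strictly between 0 and 1")
    return math.ceil(math.log(p_fail) / math.log(1.0 - p_rc))


def delta0_at_attempt(j: int, K: int, delta: float, rho: float, W: float) -> float:
    """delta_0 = delta - (K - j) * rho * W, the bound assumed at the j-th attempt."""
    if not 1 <= j <= K:
        raise ValueError("j must be between 1 and K")
    return delta - (K - j) * rho * W


K = attempts_needed(p_rc=0.656, p_fail=1e-9)
# W is a hypothetical value; delta and W are expressed in msec.
for j in (1, K // 2, K):
    d0 = delta0_at_attempt(j, K, delta=1.0, rho=6e-6, W=2000.0)
    print(f"attempt {j:2d} of {K}: delta_0 = {d0:.3f} msec")
```

Under these assumptions the sketch yields K = 20 for $P_{rc}$ = 0.656 (and 37 for 0.434), and $\delta_0$ grows from 0.772 msec at the first attempt to the full $\delta$ = 1 msec at the K-th one. The earlier the attempt, the tighter the bound that rule (1) can exploit.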