Analysis and optimum performance of two message-passing parallel processors synchronized by rollback


Debasis Mitra
AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974, U.S.A.

Isi Mitrani
Computing Laboratory, University of Newcastle upon Tyne, Newcastle NE1 7RU, United Kingdom

Received 1 May 1985
Revised 17 March 1986

We analyze a probabilistic model of two processors sharing a common task, such as distributed simulation, proceeding at possibly different rates and with each allowed its own clock and 'virtual time'. When the task requires it, the processors communicate by messages which are stamped with the local time at the point of origination. This feature is also used to synchronize the processors by 'rollback', a technique in which a processor receiving a message in its past is required to set its clock and state back as necessary to guarantee correctness. The probabilistic model incorporates for each processor: (i) a distribution for the amount by which the local clock is incremented at each state transition, and (ii) a probability indicating the likelihood of generating a message at each state transition. For the case of exponential distributions we obtain by the method of Wiener-Hopf factorization explicit formulas for the steady-state distribution of the difference in virtual times, the average rate of progress and the average amount of rollback per state transition. We introduce a performance measure which reflects both the average rate of progress and the cost of implementing rollback. This performance measure is optimized with respect to the system parameters and an explicit solution is obtained. This result provides remarkably explicit insights into optimal operations.

Keywords: Distributed Simulation, Parallel Processing, Virtual Time, Wiener-Hopf Factorization.

Debasis Mitra was born in Calcutta, India in 1944. He received his B.Sc. and Ph.D. degrees from the University of London in 1964 and 1967, respectively. He has been with AT&T Bell Laboratories since 1968. He is at present Head of the Mathematics of Networks and Systems Department at Murray Hill. His research interests are in queueing theory, parallel algorithms and processors, and communication systems. Dr. Mitra is a member of ACM, IEEE, and IFIP Working Group 7.3.

Isi Mitrani was born in Plovdiv, Bulgaria, in 1943. He received his Ph.D. degree from the University of Newcastle upon Tyne in 1973, having previously studied at the University of Sofia and the Technion in Haifa. At present, he is a Reader in Computing Science at the University of Newcastle upon Tyne. His interests include queueing theory, modeling of computer and communication systems, ancient and medieval history and Contract Bridge. Dr. Mitrani is a member of IFIP Working Group 7.3.

North-Holland Performance Evaluation 7 (1987) 111-124 0166-5316/87/$3.50 © 1987, Elsevier Science Publishers B.V. (North-Holland)


D. Mitra, I. Mitrani / Two message-passing parallel processors

1. Introduction

One way of bringing additional processing power to bear on a large task is to split the latter into semi-autonomous sub-tasks which can then be executed in parallel on several processors. This usually entails some effort in synchronizing the execution of the sub-tasks: there may be intermediate results produced by one sub-task which affect the progress of another. The 'brute force' approach to synchronization consists of ensuring that all sub-tasks proceed in locked step, so that if we imagine a local clock measuring the progress of each of them, the values of all local clocks are always equal. This, however, may well be inefficient, especially if the processors have different speeds or if the workloads are not equal. Not only is the overall execution rate limited by that of the slowest processor, but there are considerable overheads incurred in administering the locked-step operation.

We shall study a different synchronization method, based on message passing and rollback (see [3]). The sub-tasks, or processes, as we shall call them from now on, are allowed to proceed at different rates and to have different values on their local clocks. Intermediate results are passed from process to process by means of messages, stamped with the local time of the sender. The receiver of a message compares its time stamp with its own local time: if the message is 'in the future,' then no immediate action needs to be taken (apart from storing it); if, however, the message is 'in the past,' then it presumably invalidates all subsequent work done by the receiver and the latter has to be rolled back to the time indicated on the stamp. Such a mechanism presupposes, of course, the ability to roll back a process to any point in its past. It is assumed that state information is saved continuously, or at least sufficiently frequently, to make this possible.
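The receive-and-rollback rule just described can be illustrated with a minimal Python sketch. The class `Process`, its fields and its method names are hypothetical and chosen only for illustration; in particular, the integer `state` and the list of checkpoints are stand-ins for whatever state-saving scheme a real system would use.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Toy process with a local clock and a log of saved (time, state) pairs."""
    clock: float = 0.0
    state: int = 0
    checkpoints: list = field(default_factory=lambda: [(0.0, 0)])

    def advance(self, dt: float) -> None:
        """Do some work: move the local clock forward and save the new state."""
        self.clock += dt
        self.state += 1
        self.checkpoints.append((self.clock, self.state))

    def receive(self, stamp: float) -> float:
        """Handle a message stamped `stamp`; return the amount rolled back."""
        if stamp >= self.clock:
            return 0.0  # message is 'in the future': nothing to undo
        # Message is 'in the past': discard every checkpoint later than the stamp.
        while len(self.checkpoints) > 1 and self.checkpoints[-1][0] > stamp:
            self.checkpoints.pop()
        rolled_back = self.clock - stamp
        self.state = self.checkpoints[-1][1]
        self.clock = stamp  # the clock is set back to the time on the stamp
        return rolled_back
```

The returned amount, the difference between the receiver's local time and the time stamp, is the quantity to which the restoration cost is taken to be proportional.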
Each rollback involves a restoration activity, the cost of which is proportional to its amount, i.e., to the difference between the local time of the receiving process and the time of the received message.

There could be many applications of a process distribution and synchronization of this nature. One that has received some attention recently is concerned with distributed simulation [1,4,7]. The simulation of a large system is partitioned into a number of sub-system simulations, which are then carried out in parallel on different processors. When an event occurs in one subsystem that has an effect on another, a message is sent to the corresponding process (or processes). Depending on the relative values of the local clocks of sender and receiver, that message may or may not cause a rollback of the latter.

The main question that arises in this connection concerns the effect which the synchronization mechanism has on the rate of progress for the entire task. A related issue is the cost of rollbacks. It may be worthwhile to artificially slow down a very fast processor in order to reduce that cost.

A model involving two processes was examined in [6] and an approximate solution was obtained under some special assumptions. We also consider the case of two processes, but generalize the assumptions and obtain exact results. The model and its general solution are presented in Section 2 and in Appendix A. Closed form expressions in the exponential (but asymmetric) case are obtained in Section 3. The optimization of the system performance is discussed in Section 4. A performance measure is introduced which reflects both the average rate of progress and the cost of implementing rollback. This measure is optimized with respect to the system parameters and an explicit solution is obtained. The implications on optimal operations are discussed. Section 5 examines some further possible generalizations.

2. The model and its solution

Two communicating processes are being executed on two processors, perhaps with different speeds. The system is observed at a discrete sequence of time instants, numbered 1, 2, .... These can be thought of either as the 'ticks' of a global clock which advances by constant increments of one time unit, or as the times of occurrences of certain events. At time $i$, the state of the system is defined as the pair $(X_i, Y_i)$, where $X_i$ and $Y_i$ are the values of the local clocks associated with process 1 and process 2, respectively. These values are measures of the amounts of work done on the two processes up to time $i$.

At every time instant the two processes may send each other messages. Process 1 does so with probability $\alpha$ ($0 < \alpha \le 1$) and process 2 with probability $\beta$ ($0 < \beta \le 1$). These events are independent of the system state history and are instantaneous. If at time $i$ process 1 receives a message from process 2 and if $X_i > Y_i$, then process 1 is rolled back and its clock is reset to $Y_i$. Whether that happens or not, $X_i$ is then advanced by a random amount $\xi$, which is independent of the system state history. Similarly, process 2 is advanced at time $i$ by a random amount $\eta$, perhaps after having been reset to $X_i$.

It is worth noting that the case of only two processes benefits from a simplification that does not exist in cases with more processes. The simplifying feature is that a rollback does not by itself cause further rollbacks. It is easy to see that with more than two processes a cascading of rollbacks may be caused by a single rollback.

Denoting the event "process $j$ sends a message to process $k$" by $j \to k$, and the indicator function of the event $B$ by $I_B$ ($I_B$ is 1 if $B$ occurs and 0 otherwise), the evolution of the system state can be described by the following equations:

$$X_{i+1} = X_i\left[(1 - I_{2\to 1}) + I_{2\to 1} I_{X_i \le Y_i}\right] + Y_i I_{2\to 1} I_{X_i > Y_i} + \xi, \qquad i = 1, 2, \ldots, \tag{1}$$

$$Y_{i+1} = Y_i\left[(1 - I_{1\to 2}) + I_{1\to 2} I_{Y_i \le X_i}\right] + X_i I_{1\to 2} I_{Y_i > X_i} + \eta, \qquad i = 1, 2, \ldots. \tag{2}$$
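As an illustration, one transition of equations (1) and (2) can be written as a small Python function. The function name `step` and its argument names are hypothetical; the message indicators and the increments are passed in explicitly, so the sketch makes no assumption about their distributions.

```python
def step(x, y, send_1_to_2, send_2_to_1, xi, eta):
    """One transition of equations (1) and (2).

    x, y        -- local clocks X_i and Y_i
    send_1_to_2 -- indicator of the event 1 -> 2 (process 1 sends a message)
    send_2_to_1 -- indicator of the event 2 -> 1 (process 2 sends a message)
    xi, eta     -- increments of process 1 and process 2 at this step
    """
    # Process 1 is reset to Y_i only if it receives a message and is ahead.
    new_x = (y if (send_2_to_1 and x > y) else x) + xi
    # Symmetrically, process 2 is reset to X_i only if it receives a message and is ahead.
    new_y = (x if (send_1_to_2 and y > x) else y) + eta
    return new_x, new_y
```

Both new values are computed from the old pair $(X_i, Y_i)$, which is why a rollback of one process cannot trigger a rollback of the other within the same step.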

The above assumptions ensure that $\{(X_i, Y_i),\ i = 1, 2, \ldots\}$ is a (continuous state) Markov chain. Unfortunately, the transient distribution of that Markov chain is very difficult to determine, and it has no equilibrium (both $X_i$ and $Y_i$ grow without bound as $i \to \infty$). However, we are only interested in the rates of progress of the two processes, i.e., in the ratios $E[X_i]/i$ and $E[Y_i]/i$. It turns out that those quantities do have limits when $i \to \infty$, as do the expectations $E[X_{i+1} - X_i]$ and $E[Y_{i+1} - Y_i]$ (all those limits are, of course, equal). Moreover, to determine them, it is sufficient to study the differences

$$Z_i = X_i - Y_i, \qquad i = 1, 2, \ldots.$$

From equations (1) and (2) it follows that $\{Z_i,\ i = 1, 2, \ldots\}$ is also a Markov chain. Indeed, using the identity

$$(1 - I_{2\to 1}) + I_{2\to 1} I_{X_i \le Y_i} = 1 - I_{2\to 1} I_{X_i > Y_i},$$

equation (1) can be rewritten as

$$X_{i+1} = X_i - (X_i - Y_i) I_{2\to 1} I_{X_i > Y_i} + \xi, \qquad i = 1, 2, \ldots. \tag{3}$$

Similarly, (2) can be rewritten as

$$Y_{i+1} = Y_i + (X_i - Y_i) I_{1\to 2} I_{Y_i > X_i} + \eta, \qquad i = 1, 2, \ldots. \tag{4}$$

Now, subtracting (4) from (3), we obtain an equation describing the evolution of $Z_i$:

$$Z_{i+1} = Z_i - Z_i\left[I_{1\to 2} I_{Z_i < 0} + I_{2\to 1} I_{Z_i > 0}\right] + \gamma, \qquad i = 1, 2, \ldots, \tag{5}$$

where $\gamma$ denotes the random variable $\xi - \eta$. To see that $\{Z_i\}$ has an equilibrium distribution, it suffices to note that the conditional expectation of $Z_{i+1}$, given that $Z_i = z$, is equal (since $E[I_{1\to 2}] = \alpha$ and $E[I_{2\to 1}] = \beta$) to

$$E[Z_{i+1} \mid Z_i = z] = \begin{cases} (1 - \alpha) z + E[\gamma] & \text{if } z < 0, \\ (1 - \beta) z + E[\gamma] & \text{if } z > 0. \end{cases} \tag{6}$$

Hence, whenever $Z_i$ grows very large in absolute value, there is a tendency for $Z_{i+1}$ to be pulled back towards the origin. This implies the existence of an equilibrium [6]. Let $Z$ be the limiting (in distribution) random variable, as $i \to \infty$. It satisfies

$$Z \sim Z - Z\left[I_{1\to 2} I_{Z < 0} + I_{2\to 1} I_{Z > 0}\right] + \gamma, \tag{7}$$

where '$\sim$' means 'equal in distribution'. The assertions we made concerning the rates of progress of the two processes are simple consequences of equations (3), (4) and (7).
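The reduction from the pair $(X_i, Y_i)$ to the single chain $Z_i$ can be checked numerically. The following Python sketch drives equations (3) and (4) and equation (5) with the same random inputs and reports the largest discrepancy between $X_i - Y_i$ and $Z_i$. The function name is hypothetical, and the exponential increments are an assumption made only to have something concrete to sample; the identity itself does not depend on the distribution.

```python
import random


def check_difference_chain(alpha, beta, mean_xi, mean_eta, steps, seed=0):
    """Drive (3)-(4) and (5) with the same inputs; return the largest gap |(X - Y) - Z|."""
    rng = random.Random(seed)
    x = y = z = 0.0
    worst = 0.0
    for _ in range(steps):
        send_1_to_2 = rng.random() < alpha
        send_2_to_1 = rng.random() < beta
        xi = rng.expovariate(1.0 / mean_xi)    # assumed exponential, mean mean_xi
        eta = rng.expovariate(1.0 / mean_eta)  # assumed exponential, mean mean_eta
        # Equations (3) and (4), both evaluated from the old state (X_i, Y_i).
        new_x = x - (x - y) * (send_2_to_1 and x > y) + xi
        new_y = y + (x - y) * (send_1_to_2 and y > x) + eta
        # Equation (5) with gamma = xi - eta.
        z = z - z * ((send_1_to_2 and z < 0) or (send_2_to_1 and z > 0)) + (xi - eta)
        x, y = new_x, new_y
        worst = max(worst, abs((x - y) - z))
    return worst
```

Up to floating-point rounding the returned gap should be zero, which is the content of equation (5).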

Proposition 2.1.

$$\lim_{i\to\infty} E(X_{i+1} - X_i) = \lim_{i\to\infty} E(Y_{i+1} - Y_i) = D,$$

where

$$D = \frac{-\alpha\beta E(Z) + \alpha E(\xi) - \beta E(\eta)}{\alpha - \beta} = \frac{-\alpha\beta E(|Z|) + \alpha E(\xi) + \beta E(\eta)}{\alpha + \beta}.$$

Proof. Taking expectations in (3) and (4), and letting $i \to \infty$, gives

$$\lim_{i\to\infty} E(X_{i+1} - X_i) = -\beta E[Z I_{Z>0}] + E(\xi), \tag{8}$$

and

$$\lim_{i\to\infty} E(Y_{i+1} - Y_i) = \alpha E[Z I_{Z<0}] + E(\eta). \tag{9}$$

On the other hand, taking expectations in (7) yields

$$\alpha E[Z I_{Z<0}] + \beta E[Z I_{Z>0}] = E(\gamma), \tag{10}$$

which implies that the right-hand sides of (8) and (9) are equal. To show that $D$ has the value claimed, note that

$$E[Z I_{Z>0}] + E[Z I_{Z<0}] = E(Z), \tag{11}$$

$$E[Z I_{Z>0}] - E[Z I_{Z<0}] = E(|Z|). \tag{12}$$

The desired expressions for $D$ are obtained by solving either (10) and (11), or (10) and (12) for $E[Z I_{Z>0}]$ and $E[Z I_{Z<0}]$, and substituting into (8). □

It follows that

$$\lim_{i\to\infty} [E(X_i)/i] = \lim_{i\to\infty} [E(Y_i)/i] = D. \tag{13}$$

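Our task now is to determine the probability distribution function, $F(x)$, of the random variable $Z$. Let $G(x)$ be the probability distribution function of $\gamma$, and $U(x)$ be the unit step function:

$$U(x) = \begin{cases} 1 & \text{if } x \ge 0, \\ 0 & \text{if } x < 0. \end{cases}$$

Conditioning upon the outcomes of the message-passing events, (7) can be rewritten as

$$Z \sim \begin{cases} Z + \gamma & \text{if } I_{1\to 2} = I_{2\to 1} = 0, \\ Z I_{Z<0} + \gamma & \text{if } I_{1\to 2} = 0 \text{ and } I_{2\to 1} = 1, \\ Z I_{Z>0} + \gamma & \text{if } I_{1\to 2} = 1 \text{ and } I_{2\to 1} = 0, \\ \gamma & \text{if } I_{1\to 2} = I_{2\to 1} = 1. \end{cases}$$

This, together with

$$P(Z I_{Z<0} \le x) = \begin{cases} 1 & \text{if } x \ge 0, \\ F(x) & \text{if } x < 0, \end{cases} \qquad P(Z I_{Z>0} \le x) = \begin{cases} F(x) & \text{if } x \ge 0, \\ 0 & \text{if } x < 0, \end{cases}$$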

allows us to write

$$F(x) = (1-\alpha)(1-\beta) F(x) * G(x) + (1-\alpha)\beta \{U(x) + F(x)[1 - U(x)]\} * G(x) + \alpha(1-\beta)[U(x)F(x)] * G(x) + \alpha\beta G(x), \tag{14}$$

where "$*$" denotes convolution. Replacing, in the first term in the right-hand side of (14), $F(x)$ by $U(x)F(x) + [1 - U(x)]F(x)$ and canceling terms, that equation becomes

$$F(x) = (1-\beta)[U(x)F(x)] * G(x) + (1-\alpha)\{[1 - U(x)]F(x)\} * G(x) + \beta G(x)$$


or, in integral form,

$$F(x) = (1-\beta)\int_{-\infty}^{x} F(x - y)\, dG(y) + (1-\alpha)\int_{x}^{\infty} F(x - y)\, dG(y) + \beta G(x). \tag{15}$$

A change of variables $x - y \to y$ would reduce (15) to a Fredholm integral equation on the infinite interval $(-\infty, \infty)$,

$$F(x) = \int_{-\infty}^{\infty} K(x, y) F(y)\, dy + \beta G(x),$$

with a kernel $K(x, y)$ defined by

$$K(x, y) = \begin{cases} (1-\alpha) G'(x - y) & \text{if } y < 0, \\ (1-\beta) G'(x - y) & \text{if } y \ge 0 \end{cases}$$

(assuming that the derivative $G'(x)$ exists). However, as the general theory of Fredholm equations does not help much in this case, we shall keep the form (15) and obtain a solution by applying the method of Wiener-Hopf factorization [2]. We introduce the Fourier-Stieltjes transforms

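$$\varphi(s) = \int_{-\infty}^{\infty} e^{isx}\, dF(x), \qquad \psi(s) = \int_{-\infty}^{\infty} e^{isx}\, dG(x), \tag{16}$$

and define

$$\varphi^+(s) = \int_{0}^{\infty} e^{isx}\, dF(x), \qquad \varphi^-(s) = \int_{-\infty}^{0} e^{isx}\, dF(x). \tag{17}$$

Then, after a little algebra, equation (15) is transformed into

$$\varphi(s) = (1-\beta)\psi(s)[F(0) + \varphi^+(s)] + (1-\alpha)\psi(s)[-F(0) + \varphi^-(s)] + \beta\psi(s). \tag{18}$$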

Bearing in mind that $\varphi(s) = \varphi^+(s) + \varphi^-(s)$, (18) can be rewritten as

$$\varphi^+(s) = a(s)\varphi^-(s) + b(s), \tag{19}$$

where

$$a(s) = -[1 - (1-\alpha)\psi(s)]/[1 - (1-\beta)\psi(s)]$$

and

$$b(s) = \{\alpha F(0) + \beta[1 - F(0)]\}\,\psi(s)/[1 - (1-\beta)\psi(s)].$$

Now, although $\varphi(s)$ and $\psi(s)$ are defined only when $s$ is real, a glance at (17) convinces us that $\varphi^+(s)$ is defined and analytic for all complex $s$ such that $\mathrm{Im}(s) \ge 0$. Similarly, $\varphi^-(s)$ is defined and analytic for all complex $s$ such that $\mathrm{Im}(s) \le 0$. Moreover, both these functions are bounded in modulus and tend to zero when $|s| \to \infty$. The problem has thus been reduced to the following: Find two functions, $\varphi^+(s)$ and $\varphi^-(s)$, bounded and analytic in the upper and in the lower half-planes respectively, vanishing at infinity, continuous on the real line from above and from below respectively, and satisfying on that line the relation (19). This is sometimes referred to as the Riemann boundary problem on the real line (e.g., see [2]).

The number of solutions to the problem is related to the so-called 'index' of the function $a(s)$, which is defined as the variation of the argument of $a(s)$, as $s$ sweeps the real line from $-\infty$ to $\infty$. In our case, it is not difficult to show that the index is 0 and therefore the problem has a unique solution (see Appendix A). The factorization method is so called because its first step is to find a representation for $a(s)$ of the form

$$a(s) = a^+(s)/a^-(s), \tag{20}$$

where $a^+(s)$ and $a^-(s)$ are analytic in the upper and lower half-plane respectively and are bounded in modulus both from above and away from zero. Equation (19) then becomes

$$\varphi^+(s)/a^+(s) = \varphi^-(s)/a^-(s) + b(s)/a^+(s). \tag{21}$$


The next step is to find a representation for the function $b(s)/a^+(s)$ in the form

$$b(s)/a^+(s) = b^+(s) - b^-(s), \tag{22}$$

where $b^+(s)$ and $b^-(s)$ are bounded and analytic in the upper and lower half-planes respectively, and vanish at infinity. Equation (21) could then be written as

$$\varphi^+(s)/a^+(s) - b^+(s) = \varphi^-(s)/a^-(s) - b^-(s). \tag{23}$$

Finally, the following argument yields the solution: The left-hand side and the right-hand side of (23) are bounded and analytic functions in the upper and lower half-planes respectively. The equation itself, which is valid on the real line, shows that the two sides are analytic continuations of each other and therefore together define a function which is bounded and analytic on the whole plane. But by Liouville's theorem, that function must be a constant. Moreover, since both sides of (23) vanish at infinity, the constant must be zero. Hence, the solution is given by

$$\varphi^+(s) = a^+(s) b^+(s); \qquad \varphi^-(s) = a^-(s) b^-(s). \tag{24}$$

There is still one unknown to be determined: the constant $F(0)$ which appears in $b(s)$. That constant is obtained from the normalizing condition

$$\varphi^+(0) + \varphi^-(0) = 1. \tag{25}$$

General formulae for calculating the factorization (20) and the decomposition (22) are given in Appendix A. However, although those formulae can be used for numerical solutions, they do not provide ready insight into the behavior of the model. On the other hand, when the random variables $\xi$ and $\eta$ appearing in (1) and (2) are distributed exponentially, rather simple closed form expressions can be obtained. This is done in the next section.

3. Exponentially distributed increments

Suppose that the increments $\xi$ and $\eta$ are distributed exponentially, with means $1/\mu$ and $1/\nu$ respectively. The Fourier transform of $G(x)$ is equal to $\psi(s) = \mu\nu/[(\mu - is)(\nu + is)]$, and the expressions for $a(s)$ and $b(s)$, given in (19), are

$$a(s) = -[s^2 + is(\mu - \nu) + \alpha\mu\nu]/[s^2 + is(\mu - \nu) + \beta\mu\nu], \tag{26}$$

$$b(s) = A\mu\nu/[s^2 + is(\mu - \nu) + \beta\mu\nu], \tag{27}$$

where $A = \alpha F(0) + \beta[1 - F(0)]$. To achieve the factorization of $a(s)$, we write it in the form

$$a(s) = -[(s - s_{\alpha 1})(s - s_{\alpha 2})]/[(s - s_{\beta 1})(s - s_{\beta 2})],$$

where

$$s_{\alpha 1} = \tfrac{1}{2} i\left[-(\mu - \nu) + \sqrt{(\mu - \nu)^2 + 4\alpha\mu\nu}\right] \quad \text{and} \quad s_{\alpha 2} = \tfrac{1}{2} i\left[-(\mu - \nu) - \sqrt{(\mu - \nu)^2 + 4\alpha\mu\nu}\right]$$

are the two roots of the numerator in the right-hand side of (26), and

$$s_{\beta 1} = \tfrac{1}{2} i\left[-(\mu - \nu) + \sqrt{(\mu - \nu)^2 + 4\beta\mu\nu}\right] \quad \text{and} \quad s_{\beta 2} = \tfrac{1}{2} i\left[-(\mu - \nu) - \sqrt{(\mu - \nu)^2 + 4\beta\mu\nu}\right] \tag{28}$$

are the two roots of the denominator. Note that $s_{\alpha 1}$ and $s_{\beta 1}$ are in the upper half-plane, while $s_{\alpha 2}$ and $s_{\beta 2}$ are in the lower half-plane. Therefore,

$$a^+(s) = (s - s_{\alpha 2})/(s - s_{\beta 2}) \tag{29}$$

is analytic in the upper half-plane and is bounded in modulus there both from above and away from zero. Similarly,

$$a^-(s) = -(s - s_{\beta 1})/(s - s_{\alpha 1}) \tag{30}$$

is analytic and bounded in the lower half-plane. The two functions (29) and (30) thus satisfy the conditions for the factorization (20). Next, $b(s)/a^+(s)$ has to be decomposed into $b^+(s) - b^-(s)$. We write

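$$\frac{b(s)}{a^+(s)} = \frac{A\mu\nu/[(s - s_{\beta 1})(s - s_{\beta 2})]}{(s - s_{\alpha 2})/(s - s_{\beta 2})} = \frac{A\mu\nu}{(s - s_{\alpha 2})(s - s_{\beta 1})} = \frac{B}{s - s_{\alpha 2}} - \frac{B}{s - s_{\beta 1}}, \tag{31}$$

where

$$B = \frac{A\mu\nu}{s_{\alpha 2} - s_{\beta 1}} = \frac{2A\mu\nu i}{\sqrt{(\mu - \nu)^2 + 4\alpha\mu\nu} + \sqrt{(\mu - \nu)^2 + 4\beta\mu\nu}}.$$

Clearly, the first term in the right-hand side of (31) is analytic and bounded in the upper half-plane, and vanishes at infinity, while the second term is similarly behaved in the lower half-plane. The desired decomposition is therefore provided by (31). Thus the solution, according to (24), is given by

$$\varphi^+(s) = a^+(s) b^+(s) = \frac{B}{s - s_{\beta 2}}, \qquad \varphi^-(s) = a^-(s) b^-(s) = -\frac{B}{s - s_{\alpha 1}},$$

or, after multiplying numerator and denominator by $-i$,

$$\varphi^+(s) = \frac{C}{\nu_\beta - is}, \qquad \varphi^-(s) = \frac{C}{\nu_\alpha + is}, \tag{32}$$

where

$$\nu_\alpha = \tfrac{1}{2}\left[-(\mu - \nu) + \sqrt{(\mu - \nu)^2 + 4\alpha\mu\nu}\right], \qquad \nu_\beta = \tfrac{1}{2}\left[(\mu - \nu) + \sqrt{(\mu - \nu)^2 + 4\beta\mu\nu}\right],$$

and

$$C = -iB = \frac{A\mu\nu}{\nu_\alpha + \nu_\beta}.$$

The unknown constant appearing in $C$ is determined from the normalizing equation (25), which yields

$$C = \nu_\alpha\nu_\beta/(\nu_\alpha + \nu_\beta), \tag{33}$$

or $A = \nu_\alpha\nu_\beta/(\mu\nu)$. We see from (32) that the probability density function, $f(x)$, of the random variable $Z$, is exponential with parameter $\nu_\alpha$ to the left of the origin, and exponential with parameter $\nu_\beta$ to the right of it:

$$f(x) = \frac{\nu_\alpha\nu_\beta}{\nu_\alpha + \nu_\beta}\left(e^{\nu_\alpha x}[1 - U(x)] + e^{-\nu_\beta x} U(x)\right), \tag{34}$$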


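$U(x)$ being again the unit step function. The mean of $Z$ is given by

$$E(Z) = -i\varphi'(0) = -i\left[\varphi^{+\prime}(0) + \varphi^{-\prime}(0)\right] = (\nu_\alpha - \nu_\beta)/(\nu_\alpha\nu_\beta). \tag{35}$$

Similarly, the mean of the modulus of $Z$ is obtained as

$$E(|Z|) = -i\left[\varphi^{+\prime}(0) - \varphi^{-\prime}(0)\right] = (\nu_\alpha^2 + \nu_\beta^2)/[\nu_\alpha\nu_\beta(\nu_\alpha + \nu_\beta)]. \tag{36}$$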
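The closed-form solution can be sanity-checked numerically. The sketch below evaluates $\varphi^+$ and $\varphi^-$ from (32) and (33) at a real $s$ and measures how well they satisfy the boundary relation (19) with $a(s)$ and $b(s)$ taken from (26) and (27). The function names are hypothetical, and Python's built-in complex arithmetic is used; this is only a numerical check under those assumptions, not part of the derivation.

```python
import math


def nu_pair(alpha, beta, mu, nu):
    """Return (nu_alpha, nu_beta) as defined after (32)."""
    root_a = math.sqrt((mu - nu) ** 2 + 4 * alpha * mu * nu)
    root_b = math.sqrt((mu - nu) ** 2 + 4 * beta * mu * nu)
    return 0.5 * (-(mu - nu) + root_a), 0.5 * ((mu - nu) + root_b)


def boundary_residual(s, alpha, beta, mu, nu):
    """|phi+(s) - a(s) phi-(s) - b(s)| for real s, using (26), (27), (32), (33)."""
    na, nb = nu_pair(alpha, beta, mu, nu)
    c = na * nb / (na + nb)          # equation (33)
    big_a = na * nb / (mu * nu)      # the constant A, as given after (33)
    phi_plus = c / (nb - 1j * s)
    phi_minus = c / (na + 1j * s)
    denom = s * s + 1j * s * (mu - nu) + beta * mu * nu
    a = -(s * s + 1j * s * (mu - nu) + alpha * mu * nu) / denom
    b = big_a * mu * nu / denom
    return abs(phi_plus - (a * phi_minus + b))


def mean_z(alpha, beta, mu, nu):
    """E(Z) from (35)."""
    na, nb = nu_pair(alpha, beta, mu, nu)
    return (na - nb) / (na * nb)
```

For instance, with $\alpha = 0.5$, $\beta = 0.25$, $\mu = 0.5$ and $\nu = 1$ the residual should vanish up to rounding for any real $s$.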

4. Performance measures and optimization

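4.1. Performance measures

The first quantity of interest is, of course, the rate of progress of the two processes. This can be obtained from Proposition 2.1, together with either (35) or (36):

$$D = \frac{1}{\alpha - \beta}\left[-\alpha\beta\,\frac{\nu_\alpha - \nu_\beta}{\nu_\alpha\nu_\beta} + \frac{\alpha}{\mu} - \frac{\beta}{\nu}\right] \tag{37}$$

or

$$D = \frac{1}{\alpha + \beta}\left[-\alpha\beta\,\frac{\nu_\alpha^2 + \nu_\beta^2}{\nu_\alpha\nu_\beta(\nu_\alpha + \nu_\beta)} + \frac{\alpha}{\mu} + \frac{\beta}{\nu}\right]. \tag{38}$$

It will be convenient, in what follows, to work with the average process increments, rather than their reciprocals. Denote these averages by $m_1$ and $m_2$: $m_1 = 1/\mu$, $m_2 = 1/\nu$. The expressions for $\nu_\alpha$ and $\nu_\beta$, given after (32), can be rewritten as

$$\nu_\alpha = \frac{1}{2 m_1 m_2}\left[(m_1 - m_2) + a\right], \qquad \nu_\beta = \frac{1}{2 m_1 m_2}\left[-(m_1 - m_2) + b\right], \tag{39}$$

where

$$a = \left[(m_1 - m_2)^2 + 4\alpha m_1 m_2\right]^{1/2}, \qquad b = \left[(m_1 - m_2)^2 + 4\beta m_1 m_2\right]^{1/2}.$$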



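Substituting (39) into (37) and multiplying numerator and denominator in the first term in the square brackets by $a + b$ we obtain

$$D = \frac{1}{\alpha - \beta}\left[-\frac{4\alpha\beta\, m_1 m_2\left[(m_1 - m_2)(a + b) + 2(\alpha - \beta) m_1 m_2\right]}{(a + b)\left[ab - (m_1 - m_2)^2\right] - 4(\alpha - \beta)(m_1 - m_2) m_1 m_2} + \alpha m_1 - \beta m_2\right]. \tag{40}$$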

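Another performance measure of interest is the average amount of rollback per step of the global clock. Denote that quantity by $R$. Since a rollback occurs either when process 1 is ahead and process 2 sends it a message, or when process 2 is ahead and process 1 sends a message, $R$ is given by

$$R = E\left[Z I_{Z>0} I_{2\to 1} - Z I_{Z<0} I_{1\to 2}\right] = \beta E[Z I_{Z>0}] - \alpha E[Z I_{Z<0}]. \tag{41}$$

Evaluating the expectations with the density (34) and using the identities

$$m_1 m_2 \nu_\alpha^2 - (m_1 - m_2)\nu_\alpha - \alpha = 0, \qquad m_1 m_2 \nu_\beta^2 + (m_1 - m_2)\nu_\beta - \beta = 0,$$

equation (41) can be rewritten as

$$R = \frac{(m_1 - m_2)(\beta\nu_\alpha - \alpha\nu_\beta) + 2\alpha\beta}{\beta\nu_\alpha + \alpha\nu_\beta}$$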


or, after substitution of (39),

$$R = \frac{(\alpha + \beta)(m_1 - m_2)^2 - (m_1 - m_2)(\alpha b - \beta a) + 4\alpha\beta m_1 m_2}{(\alpha b + \beta a) - (\alpha - \beta)(m_1 - m_2)}. \tag{42}$$

On the other hand, it is easy to verify the identity

$$\frac{(\alpha b + \beta a)(\alpha b - \beta a)}{\alpha - \beta} = (\alpha + \beta)(m_1 - m_2)^2 + 4\alpha\beta m_1 m_2. \tag{43}$$

This leads to a cancellation in (42), yielding the simple expression ($a$ and $b$ are defined in (39))

$$R = (\alpha b - \beta a)/(\alpha - \beta), \qquad \alpha \ne \beta. \tag{44}$$

When $\alpha = \beta$, applying l'Hospital's rule to the above yields

$$R = \left[(m_1 - m_2)^2 + 2\beta m_1 m_2\right]/b, \qquad \alpha = \beta. \tag{44a}$$
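As a check on the cancellation, the following Python sketch evaluates $R$ both from (44) and (44a) and from the intermediate form in terms of $\nu_\alpha$ and $\nu_\beta$, using the definitions in (39). The function names are hypothetical, and treating $\alpha$ and $\beta$ as equal when they agree to within floating-point tolerance is an implementation choice made here.

```python
import math


def rollback_rate(alpha, beta, m1, m2):
    """Average rollback per step, equations (44) and (44a)."""
    a = math.sqrt((m1 - m2) ** 2 + 4 * alpha * m1 * m2)
    b = math.sqrt((m1 - m2) ** 2 + 4 * beta * m1 * m2)
    if math.isclose(alpha, beta, rel_tol=1e-12):
        return ((m1 - m2) ** 2 + 2 * beta * m1 * m2) / b   # (44a)
    return (alpha * b - beta * a) / (alpha - beta)         # (44)


def rollback_rate_via_nu(alpha, beta, m1, m2):
    """The same quantity from the form in nu_alpha and nu_beta that precedes (42)."""
    a = math.sqrt((m1 - m2) ** 2 + 4 * alpha * m1 * m2)
    b = math.sqrt((m1 - m2) ** 2 + 4 * beta * m1 * m2)
    nu_a = ((m1 - m2) + a) / (2 * m1 * m2)                 # (39)
    nu_b = (-(m1 - m2) + b) / (2 * m1 * m2)                # (39)
    d = m1 - m2
    return (d * (beta * nu_a - alpha * nu_b) + 2 * alpha * beta) / (beta * nu_a + alpha * nu_b)
```

The two functions should agree to rounding error, and `rollback_rate` should vary continuously as $\alpha$ approaches $\beta$.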

We can now establish two useful properties of the performance measures $D$ and $R$.

Proposition 4.1. The total average progress for the two processes per step of the global clock is equal to the total average increment minus the average rollback:

$$2D = m_1 + m_2 - R. \tag{45}$$

Proof. It suffices to add equations (8) and (9) and to take note of the definitions of $D$ and $R$. □
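Proposition 4.1 and the closed forms can also be compared with a direct simulation. The sketch below is a rough Monte Carlo experiment under the exponential assumption of Section 3; the function name, the seed and the run length are arbitrary choices, and the estimates are only expected to match the formulas up to sampling error.

```python
import random


def simulate_progress(alpha, beta, m1, m2, steps, seed=1):
    """Estimate D and R by simulating equations (3) and (4) with exponential increments."""
    rng = random.Random(seed)
    x = y = 0.0
    total_rollback = 0.0
    for _ in range(steps):
        send_1_to_2 = rng.random() < alpha
        send_2_to_1 = rng.random() < beta
        z = x - y
        if send_2_to_1 and z > 0:      # process 1 is ahead and is rolled back to Y_i
            total_rollback += z
            x = y
        elif send_1_to_2 and z < 0:    # process 2 is ahead and is rolled back to X_i
            total_rollback += -z
            y = x
        x += rng.expovariate(1.0 / m1)  # increment with mean m1
        y += rng.expovariate(1.0 / m2)  # increment with mean m2
    return x / steps, total_rollback / steps
```

With $\alpha = 0.5$, $\beta = 0.25$, $m_1 = 2$ and $m_2 = 1$, formulas (44) and (45) give $R \approx 1.228$ and $D \approx 0.886$, and a long run of the simulation should land close to these values.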

Proposition 4.2. For fixed $m_1$ and $m_2$, the average progress and the average rollback are, respectively, monotone decreasing and monotone increasing with respect to $\alpha$ and $\beta$.

Proof. In view of (45), it is sufficient to prove the monotonicity of $R$. Taking derivatives in (44) with respect to $\alpha$ and $\beta$, we observe that they are both positive:

$$\frac{\partial R}{\partial \alpha} = \frac{\beta}{2a}\,\frac{(a - b)^2}{(\alpha - \beta)^2} > 0, \qquad \frac{\partial R}{\partial \beta} = \frac{\alpha}{2b}\,\frac{(a - b)^2}{(\alpha - \beta)^2} > 0.$$

□

Proposition 4.1 may be exploited to obtain an expression simpler than (40) for $D$, the rate of progress of the two processes. From (44) and (45),

$$D = \begin{cases} \tfrac{1}{2}(m_1 + m_2) - (\alpha b - \beta a)/[2(\alpha - \beta)], & \alpha \ne \beta, \\ \tfrac{1}{2}(m_1 + m_2) - \left[(m_1 - m_2)^2 + 2\beta m_1 m_2\right]/(2b), & \alpha = \beta. \end{cases}$$

An immediate corollary of Proposition 4.2 is that the achievable rates of progress lie in the interval

$$m_1 m_2/(m_1 + m_2) \le D < \min(m_1, m_2).$$

The equality on the left is obtained by setting $\alpha = \beta = 1$ in (38) (note that in that case $\nu_\alpha = 1/m_2$, $\nu_\beta = 1/m_1$). The upper bound on the right is approached when either $\alpha \to 0$, or $\beta \to 0$, or both (remember that $D$ is not defined when $\alpha = 0$ or $\beta = 0$). Again this follows by letting, say, $\alpha \to 0$ in (38): if $m_1 \le m_2$ then $\nu_\alpha \to 0$, $(\alpha/\nu_\alpha) \to m_2 - m_1$ (by l'Hospital's rule) and $D \to m_1$; if $m_1 > m_2$ then $\nu_\alpha \to (m_1 - m_2)/(m_1 m_2)$ and $D \to m_2$. The lower bound on $D$ is the mean of the minimum of two independent, exponentially distributed random variables with means $m_1$ and $m_2$. The excess of $D$ over the lower bound represents the improvement in the rate of progress of the rollback technique over locked-step operations.
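The bounds and the monotonicity can be illustrated numerically. The following Python sketch evaluates $D$ from the simplified expression above; the function name `progress_rate` and the parameter values are chosen only for illustration.

```python
import math


def progress_rate(alpha, beta, m1, m2):
    """Rate of progress D obtained from (44), (44a) and (45)."""
    a = math.sqrt((m1 - m2) ** 2 + 4 * alpha * m1 * m2)
    b = math.sqrt((m1 - m2) ** 2 + 4 * beta * m1 * m2)
    if math.isclose(alpha, beta, rel_tol=1e-12):
        r = ((m1 - m2) ** 2 + 2 * beta * m1 * m2) / b
    else:
        r = (alpha * b - beta * a) / (alpha - beta)
    return 0.5 * (m1 + m2 - r)


m1, m2 = 2.0, 1.0
lower = m1 * m2 / (m1 + m2)                       # reached at alpha = beta = 1
upper = min(m1, m2)                               # approached as alpha tends to 0
d_locked = progress_rate(1.0, 1.0, m1, m2)
d_sparse = progress_rate(1e-9, 0.5, m1, m2)
```

Here `d_locked` should coincide with `lower`, while `d_sparse` should sit just below `upper`, in line with the limits described above.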


4.2. Performance optimization

Suppose now that we have some freedom in setting the parameters $\alpha$, $\beta$, $m_1$ and $m_2$, and wish to optimize the performance of the system. For example, there might be some merit in increasing artificially $\alpha$ (or $\beta$) by sending extra 'dummy' messages whose only purpose is to prevent the other process from getting too far ahead. Similarly, it might be advantageous to slow down one of the processors by making it do less work per unit of real time. Since we seek to maximize the rate of progress of the two processes and to minimize the work involved in rollbacks, it is reasonable to take as an objective function

$$J = D - cR, \qquad c > 0. \tag{46}$$

Here the coefficient $c$ reflects the relative cost of implementing rollbacks. We thus have the following optimization problem:

$$\max_{(\alpha, \beta, m_1, m_2)} J(\alpha, \beta, m_1, m_2)$$

subject to the constraints

$$\alpha \ge \alpha^*, \qquad \beta \ge \beta^*, \qquad m_1 \le M_1, \qquad m_2 \le M_2.$$

The lower bounds $\alpha^*$ and $\beta^*$ arise from the inherent communication structure of the two processes: a certain number of messages have to be sent. The upper bounds $M_1$ and $M_2$ are related to the available processor speeds. Two of the variables can be eliminated immediately. Proposition 4.2 implies that $J$ reaches its maximum on the boundary $\alpha = \alpha^*$, $\beta = \beta^*$:

$$\max_{(\alpha, \beta, m_1, m_2)} J(\alpha, \beta, m_1, m_2) = \max_{(m_1, m_2)} J(\alpha^*, \beta^*, m_1, m_2).$$

In other words, nothing can be gained, in terms of maximizing $J$, by sending more messages than are required for performing the task. From now on, we shall assume that $\alpha = \alpha^*$ and $\beta = \beta^*$. The asterisks on these parameters will be implied. Proposition 4.1 allows us to write the objective function in terms of $R$, $m_1$, and $m_2$ only. The problem becomes

$$\max_{(m_1, m_2)} \left\{ J(m_1, m_2) = -\left(c + \tfrac{1}{2}\right) R + \tfrac{1}{2}(m_1 + m_2) \right\} \tag{47}$$

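subject to $m_1 \le M_1$, $m_2 \le M_2$.

Proposition 4.3. For a fixed value of $m_1 + m_2$, the average rollback $R$ is an increasing function of $(m_1 - m_2)^2$, for all $m_1 > 0$, $m_2 > 0$, $0 < \alpha \le 1$, $0 < \beta \le 1$.

Proof. From the definition of $a$ and $b$, after (39), it follows that

$$a = \left[\alpha x_1 + (1 - \alpha) x_2\right]^{1/2}, \qquad b = \left[\beta x_1 + (1 - \beta) x_2\right]^{1/2},$$

where $x_1 = (m_1 + m_2)^2$ and $x_2 = (m_1 - m_2)^2$. Therefore, (44) can be rewritten in terms of $x_1$ and $x_2$ as

$$R = \frac{\alpha}{\alpha - \beta}\left[\beta x_1 + (1 - \beta) x_2\right]^{1/2} - \frac{\beta}{\alpha - \beta}\left[\alpha x_1 + (1 - \alpha) x_2\right]^{1/2}.$$

The proof is completed by noting that

$$\frac{\partial R}{\partial x_2} = \frac{1}{2(\alpha - \beta)}\left[\frac{\alpha(1 - \beta)}{b} - \frac{\beta(1 - \alpha)}{a}\right] > 0.$$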


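The last inequality holds because when $\alpha > \beta$, $\alpha(1 - \beta) > \beta(1 - \alpha)$ and $b < a$, whereas when $\alpha < \beta$, $\alpha(1 - \beta) < \beta(1 - \alpha)$ and $b > a$. When $\alpha = \beta$, l'Hospital's rule yields

$$\frac{\partial R}{\partial x_2} = \frac{1}{2a} + \frac{m_1 m_2\,\alpha(1 - \alpha)}{a^3} > 0.$$

□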

This proposition implies that, in the absence of constraints, the maximum of $J$ is reached on the line $m_1 = m_2$. On that line $J(m, m)$ is given by

$$J(m, m) = \frac{2m\sqrt{\alpha\beta}}{\sqrt{\alpha} + \sqrt{\beta}}\left[\frac{1}{2}\left(\frac{1}{\sqrt{\alpha}} + \frac{1}{\sqrt{\beta}} - 1\right) - c\right]$$

and is an increasing, null or decreasing function of $m$, depending on whether $c < c^*$, $c = c^*$ or $c > c^*$, where

$$c^* = \tfrac{1}{2}\left(1/\sqrt{\alpha} + 1/\sqrt{\beta} - 1\right).$$

In particular, when $c \ge c^*$, the global maximum of $J(m_1, m_2)$ is at $m_1 = m_2 = 0$. We have arrived at the rather startling conclusion that, if the relative cost of rollbacks is sufficiently high, the best policy is not to do the task at all! The threshold value $c^*$ decreases when $\alpha$ and/or $\beta$ increase, reaching $c^* = 1/2$ when $\alpha = \beta = 1$.

Suppose now that $c < c^*$ and, without loss of generality, that $M_1 \ge M_2$ (i.e., the first processor is at least as fast as the second). The feasible region of $(m_1, m_2)$ points is illustrated in Fig. 1. From Proposition 4.3 we know that, over the square Oabc, the maximum of $J(m_1, m_2)$ is attained at point b. Also, on any line $m_1 + m_2 = \text{const}$, the maximum of $J(m_1, m_2)$ is attained at the feasible point which is closest to the line $m_1 = m_2$ (e.g. point f in the figure). Therefore, the maximum of $J(m_1, m_2)$ over the rectangle bcde is attained somewhere along the segment be. More formally,

$$\max_{m_1 \le M_1,\ m_2 \le M_2} J(m_1, m_2) = \max_{M_2 \le m_1 \le M_1} J(m_1, M_2).$$
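The role of the threshold $c^*$ and of Proposition 4.3 can be illustrated with a few evaluations of the objective function. The Python sketch below implements $J$ as written in (47); the function names and the sample parameter values are assumptions made for the example.

```python
import math


def c_star(alpha, beta):
    """Threshold cost coefficient c* for which J(m, m) is identically zero."""
    return 0.5 * (1 / math.sqrt(alpha) + 1 / math.sqrt(beta) - 1)


def objective(alpha, beta, m1, m2, c):
    """J = D - cR written as in (47), with R from (44) or (44a)."""
    a = math.sqrt((m1 - m2) ** 2 + 4 * alpha * m1 * m2)
    b = math.sqrt((m1 - m2) ** 2 + 4 * beta * m1 * m2)
    if math.isclose(alpha, beta, rel_tol=1e-12):
        r = ((m1 - m2) ** 2 + 2 * beta * m1 * m2) / b
    else:
        r = (alpha * b - beta * a) / (alpha - beta)
    return 0.5 * (m1 + m2) - (c + 0.5) * r


cs = c_star(0.5, 0.25)
cheap = [objective(0.5, 0.25, m, m, cs - 0.1) for m in (1.0, 2.0)]   # c below c*
costly = [objective(0.5, 0.25, m, m, cs + 0.1) for m in (1.0, 2.0)]  # c above c*
```

Below the threshold the values in `cheap` are positive and grow with $m$; above it the values in `costly` are negative and decrease with $m$, which is the 'do not do the task at all' regime.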

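[Fig. 1. Feasible region of $(m_1, m_2)$ points: horizontal axis $m_1$ (bounded by $M_1$), vertical axis $m_2$ (bounded by $M_2$), the 45° line $m_1 = m_2$, a line $m_1 + m_2 = \text{const.}$, and the labelled points referred to in the text.]

Thus, the slower processor should always be operated at its maximum speed. The only question that remains now is whether the faster processor should be slowed down, and if so, by how much. Ignoring for the moment the constraints on $m_1$, taking the derivative of $J(m_1, M_2)$ (defined in (47)) with respect to $m_1$ and equating to zero, gives the equation

$$(1 + 2c)\,\frac{\partial R(m_1, M_2)}{\partial m_1} = 1,$$

or

$$\frac{1}{1 + 2c} = \frac{1}{\alpha - \beta}\left\{\frac{\alpha(r - 1 + 2\beta)}{\sqrt{(r - 1)^2 + 4\beta r}} - \frac{\beta(r - 1 + 2\alpha)}{\sqrt{(r - 1)^2 + 4\alpha r}}\right\}, \tag{48}$$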


where $r = m_1/M_2$. It is a pleasing feature of this problem that only the ratio of the optimal processor speeds matters. When $r = 1$, the right-hand side of (48) is equal to $\sqrt{\alpha\beta}/(\sqrt{\alpha} + \sqrt{\beta}) = 1/(1 + 2c^*) < 1/(1 + 2c)$ (since we have assumed that $c < c^*$). When $r \to \infty$, the right-hand side of (48) tends to $1 > 1/(1 + 2c)$. Therefore, equation (48) has at least one root, $r^*$, on the interval $(1, \infty]$ (possibly at infinity, if $c = 0$). Moreover, that root is unique, because $\partial^2 R/\partial m_1^2 > 0$ for all $m_1 > 0$ (we omit the proof of this last statement). The above arguments hold for $\alpha \ne \beta$. When $\alpha = \beta$, equation (48) becomes (either by using l'Hospital's rule or by taking the derivative in (44a)),

$$\frac{1}{1 + 2c} = \frac{(r - 1)^3 + 6\beta r(r - 1) + 4\beta^2 r}{\left[(r - 1)^2 + 4\beta r\right]^{3/2}}, \qquad \alpha = \beta. \tag{48a}$$

Again it can be shown that (48a) has a unique root, $r^*$, on the interval $(1, \infty]$. In both cases, $r^*$ determines the optimal value of $m_1$. We see that

$$\max_{M_2 \le m_1 \le M_1} J(m_1, M_2) = J(m_1^*, M_2), \qquad \text{where } m_1^* = \min(M_1, r^* M_2).$$

Thus, if $r^* < M_1/M_2$, then it is worth slowing down the faster processor and using $m_1 = r^* M_2$; otherwise the value $m_1 = M_1$ should be used. To summarize, the policy that optimizes the objective function $J(\alpha, \beta, m_1, m_2)$ must
(i) Send as few messages as possible, in both directions. If that does not lead to $c^* > c$, then it is best not to undertake the task by the method of rollbacks.
(ii) Let the increments of the slower processor be as large as possible.
(iii) Set the increments of the faster processor so that, in the mean, its ratio to that of the slower one is equal to $r^*$, or is as close to it as possible.

It is perhaps worth pointing out that the coefficient $c$, which is included in the objective function to reflect the relative cost of implementing rollbacks, affects the optimal solution in two ways. First, it determines whether the optimal solution is the trivial one of not doing the task. Second, assuming a nontrivial solution, the ideal ratio of processing rates, $r^*$, decreases monotonically when $c$ increases (thus making the slowing down of the faster processor more advantageous).
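The optimal ratio $r^*$ has no closed form, but since the right-hand side of (48) rises from $1/(1 + 2c^*)$ at $r = 1$ towards 1, the root can be located numerically. The Python sketch below uses plain bisection; the function names, the bracketing strategy and the tolerance are assumptions of this example rather than anything prescribed above, and the code presumes $0 < c < c^*$ and $M_1 \ge M_2$.

```python
import math


def rhs_48(r, alpha, beta):
    """Right-hand side of (48), or of (48a) when alpha equals beta."""
    if math.isclose(alpha, beta, rel_tol=1e-12):
        top = (r - 1) ** 3 + 6 * beta * r * (r - 1) + 4 * beta ** 2 * r
        return top / ((r - 1) ** 2 + 4 * beta * r) ** 1.5
    term_b = alpha * (r - 1 + 2 * beta) / math.sqrt((r - 1) ** 2 + 4 * beta * r)
    term_a = beta * (r - 1 + 2 * alpha) / math.sqrt((r - 1) ** 2 + 4 * alpha * r)
    return (term_b - term_a) / (alpha - beta)


def optimal_ratio(alpha, beta, c, tol=1e-12):
    """Bisection for the root r* of (48) on (1, infinity); assumes 0 < c < c*."""
    target = 1.0 / (1.0 + 2.0 * c)
    lo, hi = 1.0, 2.0
    while rhs_48(hi, alpha, beta) < target:   # widen the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if rhs_48(mid, alpha, beta) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def optimal_m1(alpha, beta, c, big_m1, big_m2):
    """m1* = min(M1, r* M2), for M1 >= M2."""
    return min(big_m1, optimal_ratio(alpha, beta, c) * big_m2)
```

For example, with $\alpha = 0.5$ and $\beta = 0.25$ the root for $c = 0.2$ lies between 1.5 and 2, and raising $c$ to 0.5 moves it closer to 1, consistent with the monotone dependence of $r^*$ on $c$ noted above.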

5. Generalization of the model

The methods described in the previous sections can be applied to the analysis of other models involving two processes. One generalization that can be handled quite easily consists of allowing different distributions of advance steps depending on whether there has been a rollback or not. Instead of the random variables $\xi$ and $\eta$, representing the advance steps made by the two processes, we can assume that the latter are governed by four random variables, $\xi_1$, $\xi_2$, $\eta_1$ and $\eta_2$. Process 1 advances by $\xi_1$ if it has not had to roll back, and by $\xi_2$ if it has. Similarly, process 2 advances by $\eta_1$ normally and by $\eta_2$ after a rollback. The evolution of the system state $(X_i, Y_i)$ can now be described by two equations similar to (3) and (4):

$$X_{i+1} = X_i - (X_i - Y_i) I_{2\to 1} I_{X_i > Y_i} + (1 - I_{2\to 1} I_{X_i > Y_i})\,\xi_1 + I_{2\to 1} I_{X_i > Y_i}\,\xi_2,$$

$$Y_{i+1} = Y_i + (X_i - Y_i) I_{1\to 2} I_{Y_i > X_i} + (1 - I_{1\to 2} I_{Y_i > X_i})\,\eta_1 + I_{1\to 2} I_{Y_i > X_i}\,\eta_2.$$

As before, these lead to an equation for the differences $Z_i = X_i - Y_i$:

$$Z_{i+1} = Z_i - Z_i\left[I_{1\to 2} I_{Z_i < 0} + I_{2\to 1} I_{Z_i > 0}\right] + \xi_1 - \eta_1 + I_{2\to 1} I_{Z_i > 0}(\xi_2 - \xi_1) + I_{1\to 2} I_{Z_i < 0}(\eta_1 - \eta_2).$$


Again it can be demonstrated that, as $i \to \infty$, the sequence $\{Z_i\}$ converges in distribution to a random variable $Z$. The limiting distribution function, $F(x)$, satisfies the equation

\[
F(x) = (1-\beta)\,[U(x)F(x)] * G_{11}(x) + (1-\alpha)\,\{[1-U(x)]F(x)\} * G_{11}(x) + \beta G_{11}(x) + \beta[1-F(0)]\,[G_{21}(x) - G_{11}(x)] + \alpha F(0)\,[G_{12}(x) - G_{11}(x)],
\]

where $U(x)$ is the unit step function, $G_{ij}(x)$ is the distribution function of $\xi_i - \eta_j$ ($i, j = 1, 2$) and $*$ denotes convolution. Transformed, that equation reduces to a boundary problem very similar to (19):

\[
\varphi^{+}(s) = g(s)\,\varphi^{-}(s) + h(s),
\]

where $\varphi^{+}(s)$ and $\varphi^{-}(s)$ have the same meaning as in (19), and

\[
g(s) = \frac{1-(1-\alpha)\psi_{11}(s)}{1-(1-\beta)\psi_{11}(s)}, \qquad
h(s) = \frac{\beta[1-F(0)]\,\psi_{21}(s) + \alpha F(0)\,\psi_{12}(s)}{1-(1-\beta)\psi_{11}(s)},
\]

$\psi_{ij}(s)$ being the Fourier-Stieltjes transform of $G_{ij}(x)$. This boundary problem, too, can be solved by Wiener-Hopf factorization. In particular, if $\xi_1$, $\xi_2$, $\eta_1$ and $\eta_2$ are exponentially distributed with parameters $\mu_1$, $\mu_2$, $\nu_1$ and $\nu_2$ respectively, the factorization of $g(s)$ into $g^{+}(s)/g^{-}(s)$ is achieved by (29) and (30), simply replacing $\mu$ and $\nu$ by $\mu_1$ and $\nu_1$ respectively. The decomposition of $h(s)/g^{+}(s)$ into $h^{+}(s) - h^{-}(s)$ is slightly more difficult. A rational function with denominator $(s - s_{11})(s - s_{12})(s + i\mu_2)(s + i\nu_2)$ has to be decomposed into four elementary fractions, necessitating the solution of four linear equations for the coefficients. Having done that, expressions for the performance measures can be obtained in a straightforward way.

This generalization destroys the 'nice' properties of $D$ and $R$ that were established in Section 4. There is no simple analogue of the identity (45) and $D$ is no longer a decreasing function of $\alpha$ and $\beta$ for all values of the parameters (for example, if the average increments after a rollback are much larger than the normal ones, the sending of more messages can be advantageous).
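The decomposition into four elementary fractions mentioned above can be illustrated numerically. When the four poles are distinct, the four linear equations have the closed-form solution $c_k = N(p_k)/\prod_{j \neq k}(p_k - p_j)$, and the sketch below uses that formula in place of an explicit linear solve. The pole values and the numerator are placeholders chosen for illustration only; they are not taken from (29) and (30).

```python
def partial_fraction_coefficients(numerator, poles):
    """Return c_k such that N(s) / prod_k (s - p_k) = sum_k c_k / (s - p_k).

    `numerator` is a callable polynomial N of degree lower than len(poles);
    the poles must be distinct (simple poles only).
    """
    coeffs = []
    for k, pk in enumerate(poles):
        denom = 1.0 + 0.0j
        for j, pj in enumerate(poles):
            if j != k:
                denom *= pk - pj
        coeffs.append(numerator(pk) / denom)
    return coeffs


if __name__ == "__main__":
    mu2, nu2 = 2.0, 3.0  # placeholder rates
    # Two placeholder roots plus the poles at -i*mu2 and -i*nu2.
    poles = [0.4 + 0.9j, -0.2 - 1.1j, -1j * mu2, -1j * nu2]
    numerator = lambda s: 2 * s**3 - s + 1  # placeholder cubic numerator
    for p, c in zip(poles, partial_fraction_coefficients(numerator, poles)):
        print(p, c)
```

The result can be verified by evaluating both sides of the identity at any point that is not a pole.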

Acknowledgment

The fact that Proposition 4.1 can easily be shown to hold in the general case was pointed out to us by John Clowes, of Newcastle University.

Appendix A

Here we give formulae for the general solution of the boundary problem (19). These are obtained by direct application of existing results (see [7]). The factorization of $a(s)$ is accomplished by solving the homogeneous boundary problem

\[
a^{+}(s) = a(s)\,a^{-}(s). \tag{A.1}
\]

The index of this problem (i.e., the variation of the argument of $a(s)$, in multiples of $2\pi$, as $s$ sweeps the real line from $-\infty$ to $\infty$) is zero:

\[
\operatorname{Ind}[a(s)] = \operatorname{Ind}[1 - (1-\alpha)\psi(s)] - \operatorname{Ind}[1 - (1-\beta)\psi(s)] = 0,
\]

since the real parts of both bracketed terms are always positive. Hence, the problem has a unique solution, up to a constant multiplier. Taking logarithms in (A.1) reduces that boundary condition to

\[
\ln[a^{+}(s)] - \ln[a^{-}(s)] = \ln[a(s)]. \tag{A.2}
\]


The functions satisfying (A.2) have the following limiting values on the real line, from above and from below, respectively:

\[
\ln[a^{+}(s)] = \tfrac{1}{2}\ln[a(s)] + \frac{1}{2\pi i}\int_{-\infty}^{\infty} \frac{\ln[a(u)]}{u-s}\,du, \tag{A.3}
\]
\[
\ln[a^{-}(s)] = -\tfrac{1}{2}\ln[a(s)] + \frac{1}{2\pi i}\int_{-\infty}^{\infty} \frac{\ln[a(u)]}{u-s}\,du. \tag{A.4}
\]

The singular integral in the right-hand sides of (A.3) and (A.4) is defined in the sense of its principal value. Next, we have another boundary problem of type (A.2):

\[
b^{+}(s) - b^{-}(s) = \frac{b(s)}{a^{+}(s)}. \tag{A.5}
\]
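The step from (19) to (A.5) can be written out as follows. This is a sketch under an assumption: it takes (19) to have the form $\varphi^{+}(s) = a(s)\varphi^{-}(s) + b(s)$, by analogy with the boundary problem of Section 5, and it uses the relations $\varphi^{\pm}(s) = a^{\pm}(s)b^{\pm}(s)$ given at the end of this appendix.

```latex
\begin{align*}
\varphi^{+}(s) &= a(s)\,\varphi^{-}(s) + b(s),
  \qquad a(s) = \frac{a^{+}(s)}{a^{-}(s)} \quad \text{by (A.1)}, \\
\frac{\varphi^{+}(s)}{a^{+}(s)} &= \frac{\varphi^{-}(s)}{a^{-}(s)} + \frac{b(s)}{a^{+}(s)}
  \qquad \text{(dividing by } a^{+}(s)\text{)}, \\
b^{+}(s) - b^{-}(s) &= \frac{b(s)}{a^{+}(s)},
  \qquad \text{where } b^{\pm}(s) := \frac{\varphi^{\pm}(s)}{a^{\pm}(s)} .
\end{align*}
```

Since the right-hand side of the last line is a known function once $a^{+}(s)$ has been found, the inhomogeneous problem is reduced to a jump problem of the same additive type as (A.2).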

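The solution of (A.5), on the real line, is given by

\[
b^{+}(s) = \frac{1}{2}\,\frac{b(s)}{a^{+}(s)} + \frac{1}{2\pi i}\int_{-\infty}^{\infty} \frac{b(u)}{a^{+}(u)\,(u-s)}\,du, \tag{A.6}
\]
\[
b^{-}(s) = -\frac{1}{2}\,\frac{b(s)}{a^{+}(s)} + \frac{1}{2\pi i}\int_{-\infty}^{\infty} \frac{b(u)}{a^{+}(u)\,(u-s)}\,du, \tag{A.7}
\]

again taking the principal value of the integral. One can avoid having to deal with singularities and principal values by observing that

\[
\text{Principal value}\left[\int_{-\infty}^{\infty} \frac{du}{u-s}\right] = 0.
\]

Therefore,

\[
\int_{-\infty}^{\infty} \frac{\ln[a(u)]}{u-s}\,du = \int_{-\infty}^{\infty} \frac{\ln[a(u)] - \ln[a(s)]}{u-s}\,du,
\]

and

\[
\int_{-\infty}^{\infty} \frac{b(u)}{a^{+}(u)\,(u-s)}\,du = \int_{-\infty}^{\infty} \frac{1}{u-s}\left[\frac{b(u)}{a^{+}(u)} - \frac{b(s)}{a^{+}(s)}\right] du,
\]

where the integrals in the right-hand sides are nonsingular. Finally, $\varphi^{+}(s)$ and $\varphi^{-}(s)$ are obtained from (A.3), (A.4), (A.6) and (A.7) according to

\[
\varphi^{+}(s) = a^{+}(s)\,b^{+}(s), \qquad \varphi^{-}(s) = a^{-}(s)\,b^{-}(s).
\]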
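The subtraction device used above to remove the singularity lends itself to simple numerical quadrature. The sketch below is one possible implementation, not the authors' procedure: it truncates the real line to an interval symmetric about $s$ (so that the subtracted term still integrates to zero) and applies a midpoint rule. The function name, the truncation width and the test function $f(u) = 1/(1+u^2)$ are illustrative choices; for that $f$ the principal value is known in closed form, $-\pi s/(1+s^2)$.

```python
import math


def pv_integral(f, s, half_width=1000.0, n=200_000):
    """Approximate the principal value of the integral of f(u)/(u - s) du.

    The integrand is replaced by (f(u) - f(s))/(u - s), which is nonsingular
    at u = s. The range is truncated to [s - half_width, s + half_width],
    symmetric about s, and a midpoint rule with n (even) cells is applied,
    so that no quadrature node falls on u = s.
    """
    if n % 2 != 0:
        raise ValueError("n must be even")
    fs = f(s)
    h = 2.0 * half_width / n
    total = 0.0
    for k in range(n):
        u = s - half_width + (k + 0.5) * h
        total += (f(u) - fs) / (u - s)
    return total * h


if __name__ == "__main__":
    s = 1.0
    approx = pv_integral(lambda u: 1.0 / (1.0 + u * u), s)
    exact = -math.pi * s / (1.0 + s * s)  # closed form for this test function
    print(approx, exact)
```

The same routine accepts complex-valued functions, so it could in principle be applied to $\ln[a(u)]$ or $b(u)/a^{+}(u)$ once those functions are available in code.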

References

[1] K.M. Chandy and J. Misra, Asynchronous distributed simulation via a sequence of parallel computations, Comm. ACM 24 (11) (1981) 198-206.
[2] F.D. Gakhov, Boundary Value Problems (Pergamon Press, New York, 1966).
[3] D. Jefferson, Virtual time, ACM Trans. Programm. Language Syst. 7 (3) (1985) 404-425.
[4] D. Jefferson and H. Sowizral, Fast Concurrent Simulation Using the Time Warp Mechanism, Part 1: Local Control, Tech. Rept., Rand Corp., December 1982.
[5] H. Kushner, Introduction to Stochastic Control (Holt, Rinehart & Winston, New York, 1971).
[6] S.S. Lavenberg, R.R. Muntz and B. Samadi, Performance analysis of a rollback method for distributed simulation, in: A.K. Agrawala and S.K. Tripathi, eds., Performance '83 (North-Holland, Amsterdam/New York, 1983) 117-132.
[7] B.D. Lubachevsky and K.G. Ramakrishnan, Parallel time-driven simulation of a network on a shared memory MIMD computer, Proc. Internat. Conf. on Modelling Techniques and Tools for Performance Analysis, INRIA, Paris, May 1984.
[8] B. Noble, Methods Based on the Wiener-Hopf Technique (Pergamon Press, New York, 1958).