ELSEVIER
Information Processing Letters
Information Processing Letters 57 (1996) 279-285
A gap theorem for the anonymous torus 1
Angelo Monti a, Alessandro Roncato b,*
a Dipartimento di Scienze dell'Informazione, Università di Roma "La Sapienza", Via Salaria 113, 00198 Roma, Italy
b Dipartimento di Informatica, Università di Pisa, Corso Italia 40, 56125 Pisa, Italy
Received 14 September 1994
Communicated by F. Dehne
Abstract
We characterize the class of functions computable on an anonymous torus, where each processor does not know the dimension $n$ of the torus but only an upper bound $m$ on $n$. We show that any computable function can be computed exchanging $O(n\sqrt{m})$ messages. Surprisingly, we prove a "gap" theorem showing that all non-constant computable functions have message complexity $\Omega(n\sqrt{m})$. From these results we obtain that, to compute any non-constant computable function, the input collection algorithm is the optimal one. Analogous results are obtained for the case that the torus is non-square and for the anonymous ring.
Keywords: Distributed computing; Message complexity; Torus of processors; Gap theorem
1. Introduction
A large body of work has been devoted to computing on a ring of $n$ asynchronous processors. [1] shows that any computable function can be computed with message complexity $O(n^2)$, and that this bound is sharp for some functions. [4] proves that all algorithms computing non-constant computable functions have bit complexity $\Omega(n \log n)$, and gives some non-constant functions with bit complexity $O(n \log n)$. Computation on a torus of $\sqrt{n} \times \sqrt{n}$ asynchronous processors is investigated in [2], where
1 This paper has been supported in part by Progetto Finalizzato Sistemi Informatici e Calcolo Parallelo, CNR Italy.
* Corresponding author.
it is shown that every computable function can be computed with message complexity $O(n\sqrt{n})$, and that this bound is optimal for many functions. For binary functions, the bound is $O(n\sqrt{n})$ bits. Moreover, it is shown that there exist non-constant functions having bit complexity $O(n)$. The minimum worst case message (or bit) complexity for computing any non-constant computable function on a network is called the distributed message (or bit) complexity of the network [4]. Hence the distributed bit complexity is $\Theta(n \log n)$ on the ring, and $O(n)$ on the torus. That is, there is a "gap" between the size and the distributed bit complexity for the ring, while no such gap exists for the torus. All the above results hold in the standard models, where the size $n$ of the network is known to its anonymous processors. However, it is well known that the common knowledge assumed in
0020-0190/96/$12.00 © 1996 Elsevier Science B.V. All rights reserved
SSDI 0020-0190(95)00197-2
the model strongly influences the computational power. Since any variation of the size (due to faulty processors) has to be detected and communicated to all processors, it is useful to consider forms of common knowledge that do not change with time. The computational power of the ring network under a weaker form of common knowledge is investigated in [3]: the size $n$ of the ring is unknown to the processors, but they know an upper bound $m$ on it. Moreover, it is proven there that any function computable in this model can be computed with message complexity $O(nm)$. In this paper we study the computational complexity of the torus under the same assumption ($m \ge n$ known). We give an exact characterization of the class of functions computable in this model, and prove that any function in this class can be computed with message complexity $O(n\sqrt{m})$. For binary functions, the bound is $O(n\sqrt{m})$ bits. Unlike in the corresponding standard model, we obtain a "gap" theorem proving that any non-constant computable function has message complexity $\Omega(n\sqrt{m})$. Analogous results are proven for the more general model of the non-square torus. As a corollary we also prove that for the ring model introduced in [3] there is a "larger" gap than in the corresponding standard model. Hence in these non-standard models, the input collection algorithm used to obtain the upper bounds is the optimal one (the same is not true in the two standard models).
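2. Model and notation

Let $l_1$ and $l_2$ be positive integers. An $l_1 \times l_2$ bidimensional string $A$ over an alphabet $\Sigma$ (in short, $(l_1 \times l_2)$-string) is an array of $l_1 \times l_2$ elements of $\Sigma$, that is, $A = [a_{i,j}]_{0 \le i < l_1,\ 0 \le j < l_2}$. $\Sigma^{(+,+)}$ is the set of all the bidimensional strings. A bidimensional function is a function from $\Sigma^{(+,+)}$ to $\Sigma$. For an $(l_1 \times l_3)$-string $B$, $A \oplus B$ is the $(l_1 \times (l_2 + l_3))$-string $C$ where $c_{i,j} = a_{i,j}$ for $0 \le j < l_2$ and $c_{i,j} = b_{i,j-l_2}$ otherwise. […] For each $(l_1 \times l_2)$-string $A$ and each pair of integers $k_1$ and $k_2$, $A^{k_1 \circ k_2}$ is the $(l_1 \times l_2)$-string $[a_{(i+k_1) \bmod l_1,\ (j+k_2) \bmod l_2}]_{0 \le i < l_1,\ 0 \le j < l_2}$. For each pair of integers $k_1, k_2 > 0$, $A^{(k_1,k_2)}$ is the $(k_1 l_1 \times k_2 l_2)$-string made of $k_1 \times k_2$ copies of $A$ […]. A bidimensional function $f$ is idempotent iff it is invariant under these operations, that is, $f(A) = f(A^{(k_1,k_2)}) = f(A^{s_1 \circ s_2})$ […].

Definition 3. Given an $(l_1 \times l_2)$-string $I$, two integers $x$ and $y$ such that $0 \le x < l_1$ and $0 \le y < l_2$, and an integer $t$, the $t$-neighborhood $N_{x,y}(I)_t$ is the string with entries $a_{(x-t+i) \bmod l_1,\ (y-t+j) \bmod l_2}$ […].

Note that the above definition does not agree with the more commonly used definition of $t$-neighborhood; it defines a one-sided $t$-neighborhood, appropriate for the unidirectional links that will be assumed in the computational model.

Example 4. Take
$$A = \begin{pmatrix} a & b & c \\ d & e & f \end{pmatrix};$$
then
$$A^{(1,2)} = \begin{pmatrix} a & b & c & a & b & c \\ d & e & f & d & e & f \end{pmatrix}, \qquad N_{0,1}(A)_3 = […]$$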
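The string operations of this section can be illustrated with a minimal Python sketch. It assumes bidimensional strings are stored as lists of rows; the helper names (`hconcat`, `shift`, `power`, `looks_idempotent`) are hypothetical, and reading $A^{(k_1,k_2)}$ as a tiling of $k_1 \times k_2$ copies of $A$ is an assumption. `looks_idempotent` only tries a finite range of shifts and powers, so it can refute idempotency of a function but cannot prove it.

```python
from itertools import product

def hconcat(a, b):
    """A (+) B: place B to the right of A (same number of rows required)."""
    if len(a) != len(b):
        raise ValueError("row counts differ")
    return [ra + rb for ra, rb in zip(a, b)]

def shift(a, k1, k2):
    """Cyclic shift: entry (i, j) of the result is a[(i+k1) mod l1][(j+k2) mod l2]."""
    l1, l2 = len(a), len(a[0])
    return [[a[(i + k1) % l1][(j + k2) % l2] for j in range(l2)]
            for i in range(l1)]

def power(a, k1, k2):
    """Assumed meaning of A^(k1,k2): k1 x k2 copies of A tiled together."""
    return [list(row) * k2 for _ in range(k1) for row in a]

def looks_idempotent(f, a, bound=3):
    """Check f(a) against f on a few shifts and powers of a (finite test only)."""
    base = f(a)
    shifts_ok = all(f(shift(a, s1, s2)) == base
                    for s1, s2 in product(range(bound), repeat=2))
    powers_ok = all(f(power(a, k1, k2)) == base
                    for k1, k2 in product(range(1, bound + 1), repeat=2))
    return shifts_ok and powers_ok

def smallest_symbol(a):
    """Example function that is invariant under shifts and powers."""
    return min(min(row) for row in a)

def top_left(a):
    """Example function that depends on position, hence not shift-invariant."""
    return a[0][0]
```

For instance, with `A = [list("abc"), list("def")]`, `looks_idempotent(smallest_symbol, A)` holds while `looks_idempotent(top_left, A)` fails, because a shift moves a different symbol into the top-left corner.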
Our computational model is the torus, that is, a network with $N = n_1 n_2$ processors $p_{i,j}$, $0 \le i < n_1$, $0 \le j < n_2$, where unidirectional links connect each processor $p_{i,j}$ to processors $p_{i,(j+1) \bmod n_2}$ and $p_{(i+1) \bmod n_1,\,j}$.
Each processor does not know the values of $n_1$ and $n_2$, but knows two upper bounds $m_1 \ge n_1$ and $m_2 \ge n_2$. It also does not know its absolute position within the network, but can distinguish between its two output links, and between its two input links. The communication along the links may in general be asynchronous, but ordering is maintained, in that links act as FIFO queues. Our networks are anonymous, that is, the programs of all the processors are identical and may depend on $m_1$ and $m_2$. The input of each processor is a symbol of an arbitrary alphabet $\Sigma$, so the global input configuration is an $(n_1 \times n_2)$-string over $\Sigma$. A function $f$ is computable in the model when there is an algorithm such that each processor, starting the computation in a network with a global input configuration $I$, proceeds by sending messages to its neighbors, receiving messages from them and updating its state accordingly, until it halts in a final state with $f(I)$ stored in a particular output register (in its local memory).

By $_{n_1 \times n_2}T_{m_1 \times m_2}$ we denote our torus, that is, an anonymous $n_1 \times n_2$ torus in which each processor knows the pair $(m_1, m_2)$. Whenever sufficient, we simplify the notation to TM. An $n_1 \times n_2$ torus where $n_1 = n_2$, $n = n_1 n_2$, and the processors know an integer $m \ge n$ and that $n_1 = n_2$, is called a square torus and denoted by $_nT_m$.
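To make the role of the unidirectional links concrete, the following Python sketch simulates a synchronized flooding on the torus and records which input positions can have influenced each processor after a given number of steps. The flooding protocol itself and the names `out_links`, `flood` and `one_sided_distance` are illustrative assumptions, not part of the model's definition; the sketch also assumes every processor sends one message on each output link at every step.

```python
def out_links(i, j, n1, n2):
    """The two processors reached by the output links of p_{i,j}."""
    return [(i, (j + 1) % n2), ((i + 1) % n1, j)]

def flood(n1, n2, steps):
    """Synchronized flooding: returns (origins known per processor, messages sent)."""
    known = {(i, j): {(i, j)} for i in range(n1) for j in range(n2)}
    messages = 0
    for _ in range(steps):
        arriving = {p: set() for p in known}
        for (i, j), origins in known.items():
            for q in out_links(i, j, n1, n2):
                arriving[q] |= origins   # one message per output link
                messages += 1
        for p in known:                  # deliver only after all sends of the step
            known[p] |= arriving[p]
    return known, messages

def one_sided_distance(src, dst, n1, n2):
    """Number of links on a shortest directed path from src to dst."""
    return (dst[0] - src[0]) % n1 + (dst[1] - src[1]) % n2
```

After `steps` rounds a processor knows exactly the positions within one-sided distance `steps` of it, which is why a one-sided notion of neighborhood is the natural one here.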
3. Characterization of computable functions in TM
Given an algorithm AL, a synchronized execution of AL is the particular execution of AL where all the processors in the network start at time zero, internal computation at any processor takes no time, and messages take exactly one time unit (step) to traverse one link. By a straightforward induction on $t$ it is easy to prove the following lemma:
Lemma 5. Given an algorithm AL for TM and two bidimensional strings $I$ and $I'$, if $N_{i,j}(I)_t = N_{i',j'}(I')_t$ for some $t > 0$, then processor $p_{i,j}$ with global input configuration $I$ and processor $p_{i',j'}$ with global input configuration $I'$ are in the same states in the first $t$ steps of the synchronized executions of AL.

Theorem 6. Each function computable in TM is idempotent.

Proof. Let AL be an algorithm computing $f$ in TM, $I$ be an $(l_1 \times l_2)$-string, and $t$ be the number of steps required by AL to compute $f(I^{(k_1,k_2)})$ for some $k_1, k_2 \ge 1$ in the synchronized execution of AL. It is easy to see that
$$N_{0,0}(I)_t = N_{0,0}(I^{(k_1,k_2)})_t,$$
and that
$$N_{0,0}(I)_t = N_{s_1 \bmod l_1,\ s_2 \bmod l_2}(I^{s_1 \circ s_2})_t$$
for each $s_1, s_2 \ge 0$. Hence, by Lemma 5, the processor $p_{0,0}$ on $I$, the processor $p_{0,0}$ on $I^{(k_1,k_2)}$, and the processor $p_{s_1 \bmod l_1,\ s_2 \bmod l_2}$ on $I^{s_1 \circ s_2}$ terminate in the same state (and hence with the same output value). Thus the function $f$ is such that $f(I) = f(I^{(k_1,k_2)}) = f(I^{s_1 \circ s_2})$ for each input instance $I$ and integers $k_1, k_2, s_1, s_2 > 0$, that is, $f$ is an idempotent function. □

An easy generalization of the technique used in [3] to compute idempotent functions in non-standard rings shows that a sufficient condition for processor $p_{i,j}$, $0 \le i < n_1$, $0 \le j < n_2$, […] have been recorded. The second phase is similar to the first, but column-wise instead of row-wise. Now each processor starts by sending a message consisting of the $m$ input values collected in the first phase to its down neighbor. It continues by receiving messages from its up neighbor, recording them, and forwarding them to its down neighbor until $m$ messages
(including its own) have been recorded. Thus we obtain:

Fact 7. Every idempotent function $f$ can be computed in $_{n_1 \times n_2}T_{m_1 \times m_2}$ with message complexity $O(n_1 n_2 \max\{m_1, m_2\})$.

4. The gap theorem

In this section, we prove that any algorithm computing any non-constant function in $_nT_m$ has message complexity $\Omega(n\sqrt{m})$ for infinitely many instances. First we prove an analogous result for the non-square torus.

Theorem 8. For each non-constant function $f$ computable in $_{n_1 \times n_2}T_{m_1 \times m_2}$ there is a pair of integers $(k_1, k_2)$ depending on $f$ such that the number of messages required by any algorithm to compute $f$ is:
$\Omega(n_1 n_2 \min\{n_1, n_2\})$ if $k_1$ does not divide $n_1$ and $k_2$ does not divide $n_2$;
$\Omega((n_1)^2 n_2)$ if $k_1$ does not divide $n_1$ and $k_2$ divides $n_2$;
$\Omega(n_1 (n_2)^2)$ if $k_1$ divides $n_1$ and $k_2$ does not divide $n_2$;
$\Omega(n_1 n_2 \max\{m_1, m_2\})$ if $k_1$ divides $n_1$ and $k_2$ divides $n_2$.

Proof. Since the function value is independent of the delay times of the asynchronous computation, in this proof we always assume synchronized executions. Let AL be an algorithm for $f$ and $\omega$ be an $(l_1 \times l_2)$-string such that $f(0) \ne f(\omega)$, where $0 \in \Sigma$; such an $\omega$ must exist since $f$ is a non-constant function. Let $m_1 \ge 2l_1$ and $m_2 \ge 2l_2$ and, without loss of generality, assume $m_1/l_1 \ge m_2/l_2$. Consider the $(2\lfloor m_1/2l_1 \rfloor l_1 \times \lfloor m_2/l_2 \rfloor l_2)$-string
$$\alpha = \omega^{(\lfloor m_1/2l_1 \rfloor,\ \lfloor m_2/l_2 \rfloor)} \ominus 0^{(\lfloor m_1/2l_1 \rfloor l_1,\ \lfloor m_2/l_2 \rfloor l_2)}$$
(the block of copies of $\omega$ stacked above an all-0 block of the same size). We have to distinguish two cases, namely $f(\alpha) \ne f(0)$ and $f(\alpha) = f(0)$.

Case 1: Assume $f(\alpha) \ne f(0)$. The final state of $p_{2\lfloor m_1/2l_1 \rfloor l_1 - 1,\ \lfloor m_2/l_2 \rfloor l_2 - 1}$ on $\alpha$ has to be different from the final state of $p_{0,0}$ on 0. It is easy to see that
$$N_{0,0}(0)_{\lfloor m_1/2l_1 \rfloor l_1} = N_{2\lfloor m_1/2l_1 \rfloor l_1 - 1,\ \lfloor m_2/l_2 \rfloor l_2 - 1}(\alpha)_{\lfloor m_1/2l_1 \rfloor l_1}.$$
By Lemma 5, at step $s \le \lfloor m_1/2l_1 \rfloor l_1$ the processor $p_{2\lfloor m_1/2l_1 \rfloor l_1 - 1,\ \lfloor m_2/l_2 \rfloor l_2 - 1}$ on $\alpha$ has exactly the same behavior as the only processor on the symbol 0 in $_{1 \times 1}T_{m_1 \times m_2}$. Hence the computation of the processor $p_{0,0}$ on 0 requires $\lfloor m_1/2l_1 \rfloor l_1$ steps before the processor terminates giving as output $f(0)$. Thus the computation on input 0 requires $\lfloor m_1/2l_1 \rfloor l_1$ messages. Noting that all the processors in the execution on $0^{(n_1,n_2)}$ in $_{n_1 \times n_2}T_{m_1 \times m_2}$ have the same behavior as the only processor in the execution on 0, the computation of $f(0^{(n_1,n_2)})$ requires $n_1 n_2 \lfloor m_1/2l_1 \rfloor l_1$ messages, and the claim follows taking $k_1$ and $k_2$ equal to 1.

Case 2: Assume $f(\alpha) = f(0)$. By the assumption it follows that $f(\alpha) \ne f(\omega)$. Hence the final state of $p_{\lfloor m_1/2l_1 \rfloor l_1 - 1,\ \lfloor m_2/l_2 \rfloor l_2 - 1}$ on $\alpha$ is different from the final state of $p_{l_1-1,\,l_2-1}$ on $\omega$. It is easy to see that
$$N_{l_1-1,\,l_2-1}(\omega)_{\lfloor m_1/2l_1 \rfloor l_1} = N_{\lfloor m_1/2l_1 \rfloor l_1 - 1,\ \lfloor m_2/l_2 \rfloor l_2 - 1}(\alpha)_{\lfloor m_1/2l_1 \rfloor l_1}.$$
By Lemma 5, at step $s \le \lfloor m_1/2l_1 \rfloor l_1$ processor $p_{\lfloor m_1/2l_1 \rfloor l_1 - 1,\ \lfloor m_2/l_2 \rfloor l_2 - 1}$ on $\alpha$ has exactly the same behavior as the processor $p_{l_1-1,\,l_2-1}$ on $\omega$ in $_{l_1 \times l_2}T_{m_1 \times m_2}$. Hence the computation of the processor $p_{l_1-1,\,l_2-1}$ on $\omega$ requires $\lfloor m_1/2l_1 \rfloor l_1$ steps before the processor terminates giving as output $f(\omega)$. Since at each step at least one message has to travel the torus, the computation on $\omega$ requires $\lfloor m_1/2l_1 \rfloor l_1$ messages. Consider now the behavior on the string $\beta$ = […].

Subcase 1: Assume that $l_1$ divides $n_1$ and $l_2$ divides $n_2$. The string $\beta$ is equal to $\omega^{(n_1/l_1,\ n_2/l_2)}$, and it is easy to see that for $0 \le i < n_1/l_1$, $0 \le j < n_2/l_2$, we have
$$N_{l_1-1,\,l_2-1}(\omega)_{\lfloor m_1/2l_1 \rfloor l_1} = N_{(i+1)l_1-1,\ (j+1)l_2-1}(\beta)_{\lfloor m_1/2l_1 \rfloor l_1}.$$
By Lemma 5, at step $\lfloor m_1/2l_1 \rfloor l_1$ processor $p_{(i+1)l_1-1,\ (j+1)l_2-1}$ on $\beta$ in $_{n_1 \times n_2}T_{m_1 \times m_2}$ has the same behavior as the processor $p_{l_1-1,\,l_2-1}$ on $\omega$ in $_{l_1 \times l_2}T_{m_1 \times m_2}$. Thus for each of these groups there are $\lfloor m_1/2l_1 \rfloor l_1$ messages. Hence $\beta$ has message complexity $(n_1/l_1) \cdot (n_2/l_2) \cdot \lfloor m_1/2l_1 \rfloor l_1$.

Subcase 2: Assume that $l_1$ does not divide $n_1$ and $l_2$ does not divide $n_2$. Suppose first that $\lfloor n_2/l_2 \rfloor \le \lfloor n_1/l_1 \rfloor$. It is easy to see that
$$N_{i l_1 - 1,\ j l_2 - 1}(\beta)_{\min\{i l_1,\ j l_2\}} = N_{l_1-1,\,l_2-1}(\omega)_{\min\{i l_1,\ j l_2\}}$$
for each $1 \le i \le \lfloor n_1/l_1 \rfloor$ and $1 \le j \le \lfloor n_2/l_2 \rfloor$ […]
$$\sum_{i=1}^{\lfloor n_1/l_1 \rfloor} \sum_{j=1}^{\lfloor n_2/l_2 \rfloor} \min\{i, j\} = \sum_{i=1}^{\lfloor n_2/l_2 \rfloor} \sum_{j=1}^{\lfloor n_2/l_2 \rfloor} \min\{i, j\} + \sum_{i=\lfloor n_2/l_2 \rfloor + 1}^{\lfloor n_1/l_1 \rfloor} \sum_{j=1}^{\lfloor n_2/l_2 \rfloor} j,$$
which is $\Omega((n_1/l_1)(n_2/l_2)^2)$. In a similar way we can prove that the computation on $\beta$ requires $\Omega((n_2/l_2)(n_1/l_1)^2)$ messages when $\lfloor n_1/l_1 \rfloor \le \lfloor n_2/l_2 \rfloor$.

Subcase 3: Assume that $l_1$ does not divide $n_1$ and $l_2$ divides $n_2$. It is easy to see that
$$N_{i l_1 - 1,\ j l_2 - 1}(\beta)_{i l_1} = N_{l_1-1,\,l_2-1}(\omega)_{i l_1}$$
for each $1 \le i \le \lfloor n_1/l_1 \rfloor$ and $1 \le j \le n_2/l_2$. Hence a lower bound on the number of messages exchanged on $\beta$ is:
$$\sum_{i=1}^{\lfloor n_1/l_1 \rfloor} \sum_{j=1}^{n_2/l_2} i = \Omega((n_1/l_1)^2 (n_2/l_2)).$$

Subcase 4: Assume that $l_1$ divides $n_1$ and $l_2$ does not divide $n_2$. This is analogous to Subcase 3, with the roles of the columns and of the rows interchanged. As for Subcase 3, we obtain $\Omega((n_1/l_1)(n_2/l_2)^2)$ messages exchanged on $\beta$.

Letting $k_1 = l_1$ and $k_2 = l_2$, the theorem follows for each of the above subcases. □

For the case that the torus is square, one can obtain the following result with arguments similar to those for the non-square case, taking into account that $\alpha$ is now a square string.

Theorem 9. For each non-constant function $f$ computable in an $_nT_m$ there is an integer $k$ depending on $f$ such that any algorithm computing $f$ requires $\Omega(n\sqrt{m})$ messages if $k$ divides $n$ and $\Omega(n\sqrt{n})$ otherwise.

Now we show that there exist non-constant bidimensional functions whose message complexity matches the lower bound of Theorem 8.

Theorem 10. For each integer $k$ there exists a non-constant function $f$ computable in $_{n_1 \times n_2}T_{m_1 \times m_2}$ with bit complexity:
$O(n_1 n_2 \min\{n_1, n_2\})$ if $k$ divides neither $n_1$ nor $n_2$;
$O((n_1)^2 n_2)$ if $k$ does not divide $n_1$ and $k$ divides $n_2$;
$O(n_1 (n_2)^2)$ if $k$ divides $n_1$ and $k$ does not divide $n_2$;
$O(n_1 n_2 \max\{m_1, m_2\})$ if $k$ divides $n_1$ and $n_2$.
Proof. Let $f$ be the characteristic function of the set
$$S = \{ (A^{x_1 \circ x_2})^{(y_1,y_2)} \mid 0 \le x_1, x_2;\ y_1, y_2 > 0 \}$$
where $A = [a_{i,j}]$, $0 \le i,j$ […] $\min\{m_1, m_2\}$ that generates a reject-message, we have:

Phase 1: Each processor sends its input symbol to the processors with relative addresses $(i, j)$ with $0 \le i,j \le k$. In this way a processor $p_{i,j}$ receives a copy of $N_{i,j}(I)_k$, where $I$ is the input configuration of the torus. If $N_{i,j}(I)_k$ does not contain any symbol equal to 0 then the processor $p_{i,j}$ rejects. The processors with input value equal to 1 become passive, and in the following they act as routers until they receive a reject-message or an accept-message. The processors with input value equal to 0 check that the only symbols equal to 0 in their $k$-neighborhood are in positions $(0, 0)$, $(k+1, k+1)$, $(k+1, 0)$ and $(0, k+1)$. If this is not the case, then they reject. Clearly, Phase 1 uses $O(n_1 n_2)$ messages with $O(1)$ bits each. Note that, at the end of this phase, if the input string does not belong to $S$ then there is at least one rejecting processor, and one reject-message has been sent in the row and in the column of that processor. Moreover, if $k$ does not divide $n_1$ (respectively $n_2$) then the distance of an active processor from a rejecting one is at most $n_1$ (respectively $n_2$).

Phase 2: Each active processor checks that there are no rejecting processors in its column and in its row. To this aim it sends a bit along the column and a bit along the row, and waits until a bit is received along the column and a bit is received along the row. These send and receive operations are performed $\max\{m_1, m_2\}$ times, or fewer if a reject-message is received. To analyze the protocol we consider the following four cases:

Case 1: $k$ divides neither $n_1$ nor $n_2$. By the considerations made at the end of Phase 1, after $O(\min\{n_1, n_2\})$ bits a reject-message is received by each processor. Then the total number of exchanged bits is $O(n_1 n_2 \min\{n_1, n_2\})$.

Case 2: $k$ does not divide $n_1$ and $k$ divides $n_2$. After $O(n_1)$ bits a reject-message is received by each processor. Hence the total number of exchanged bits is $O((n_1)^2 n_2)$.

Case 3: $k$ divides $n_1$ and $k$ does not divide $n_2$. After $O(n_2)$ bits a reject-message is received by each processor. Hence the total number of exchanged bits is $O(n_1 (n_2)^2)$.

Case 4: $k$ divides $n_1$ and $n_2$. The bit complexity is $O(n_1 n_2 \max\{m_1, m_2\})$ and, if Phase 1 ended with some rejecting processor, then at the end of Phase 2 there are some rejecting columns, that is, columns having only rejecting processors.

Phase 3: Note that this phase may occur only if $k$ divides $n_1$ and $n_2$. Each active processor checks that there are no rejecting columns in the torus. To this aim it sends a bit along the row; $\lfloor m_2/k \rfloor - 1$ bits are received, of which the first $\lfloor m_2/k \rfloor - 2$ bits are forwarded. If no reject-message is received then the active processor decides to accept. The number of bits exchanged in this phase is $O(n_1 n_2 m_2)$. □

The above results can be generalized to multidimensional grids with boundary connections, i.e. to the case of $d$-dimensional $n_1 \times n_2 \times \cdots \times n_d$ grids with boundary connections (with $d$ constant) where each processor in the network knows an upper bound $m_i$ on the value $n_i$, $1 \le i \le d$. In particular one can show that every function on a $d$-dimensional $n \times \cdots \times n$ grid with boundary connections where each processor knows a bound $m$ on $n$ can be computed with message complexity $O(n^d m)$. Also one can prove that all non-constant computable functions on this model have message complexity $\Omega(n^{d+1})$. For $d = 1$, the following result can be obtained following the proofs of Theorems 8 and 10 with minor changes.
Theorem 11. Let $R$ be a ring of $n$ processors where each processor knows an upper bound $m$ on $n$. For each non-constant function $f$ computable in $R$ there is an integer $k$ depending on $f$ such that any algorithm computing $f$ requires $\Omega(nm)$ messages if $k$ divides $n$, and $\Omega(n^2)$ messages otherwise. Moreover, for each $k$ there exists a non-constant function $f_k$ computable in $R$ with bit complexity $O(nm)$ if $k$ divides $n$ and bit complexity $O(n^2)$ otherwise.
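For intuition on where a cost of order $nm$ comes from on the ring, here is a minimal Python sketch of an input collection in a synchronized execution. It is an illustration under stated assumptions rather than the algorithm of [3]: the function name `collect_ring` is hypothetical, the ring is unidirectional, and each processor is assumed to stop as soon as it has recorded $m$ values (its own plus $m - 1$ received ones), forwarding every value except the last one it records.

```python
def collect_ring(inputs, m):
    """Simulate input collection on a unidirectional ring with bound m >= n.

    Returns (recorded, messages): recorded[i] lists the m values seen by
    processor i, starting with its own input; messages counts all sends.
    """
    n = len(inputs)
    if m < n:
        raise ValueError("m must be an upper bound on the ring size")
    recorded = [[x] for x in inputs]
    in_flight = list(inputs)          # value each processor sends this round
    messages = 0
    for _ in range(m - 1):
        messages += n                 # every processor sends one message
        received = [in_flight[(i - 1) % n] for i in range(n)]
        for i in range(n):
            recorded[i].append(received[i])
        in_flight = received          # forward what was just received
    return recorded, messages
```

With `inputs = list("abc")` and `m = 5`, each of the 3 processors sends 4 messages, and processor 0 records `a, c, b, a, c`: since $m \ge n$, the recorded window wraps around the ring and contains every input at least once.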
We remark that this gap theorem for rings should not be confused with the results of Moran and Warmuth in [4]. Their results refer to the standard model where each processor knows the size of the ring. Moreover, their gap theorem refers to the bit complexity of non-constant functions. Their lower bound does not hold if one counts messages (of arbitrary length) instead of bits.
References
[1] H. Attiya, M. Snir and M.K. Warmuth, Computing on an anonymous ring, J. ACM 35 (1988) 845-875.
[2] P. Beame and H. Bodlaender, Distributed computing on transitive networks: The torus, in: STACS 89, Lecture Notes in Computer Science 349 (Springer, Berlin) 294-303.
[3] P. Ferragina, A. Monti and A. Roncato, Trade-off between computational power and common knowledge in anonymous rings, in: Proc. Coll. on Structural Information and Computational Complexity (1994).
[4] S. Moran and M. Warmuth, Gap theorems for distributed computations, SIAM J. Comput. 22 (1993) 379-394.