JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING 34, 82–94 (1996)
ARTICLE NO. 0047

On the Power of Segmenting and Fusing Buses¹

JERRY L. TRAHAN,*,² RAMACHANDRAN VAIDYANATHAN,*,²,³ AND RATNAPURI K. THIRUCHELVAN†

*Department of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, Louisiana 70803-5901; and †Advanced Paradigms Inc., Alexandria, Virginia 22314
Reconfigurable bus-based models of parallel computation have been shown to be extremely powerful, capable of solving several problems in constant time that require nonconstant time on conventional models such as the PRAM. The primary source of the power of reconfigurable bus-based models is their ability to dynamically alter the connections between processors by manipulating the communication medium. This can be viewed as the models' ability to (i) segment a bus into two or more bus segments and (ii) fuse two or more buses or bus segments together. In this paper, we investigate the contribution of the abilities of a reconfigurable bus-based model to segment and fuse buses. We show that the ability to fuse buses is the more crucial of the two. The ability to segment buses enhances the power of the model under certain circumstances. We also study the roles of concurrent reading and writing in the context of reconfigurable bus-based models. These results establish a hierarchy of powers of the PRAM and reconfigurable bus-based models.  © 1996 Academic Press, Inc.

0743-7315/96 $18.00 Copyright © 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.

¹ A preliminary version of some of the results in this paper was presented at the International Parallel Processing Symposium, 1993.
² Supported in part by the Louisiana Board of Regents through the Louisiana Education Quality Support Fund under contract number LEQSF (1994-96)-RD-A-07.
³ E-mail: trahan, [email protected].

1. INTRODUCTION

Recently, models of parallel computation based on reconfigurable buses have drawn considerable interest. A key feature of these models is their ability to manipulate the communication medium (buses) to suit computational needs. Several models based on reconfigurable buses have been proposed in the literature, such as the reconfigurable mesh (R-Mesh) [12, 16], reconfigurable network (RN) [4, 24], polymorphic processor array (PPA) [11, 15], and the processor array with reconfigurable bus system (PARBS) [20, 33]. A common trait of all these models is an underlying connected graph whose edges provide "external connections" between processors (vertices). A processor has a port associated with each edge incident on it. Each processor can partition its ports, internally connecting all ports in any given block of this partition. The external edges together with the pattern of connections locally established by processors form buses connecting processors. The above models differ due to the underlying graph and restrictions placed on how a processor can partition its ports. Other reconfigurable bus-based models include the bus automaton [17], reconfigurable array processor (RAP) [8], and reconfigurable buses with shift switching (REBSIS) [14]. Recently, the authors have proposed the Reconfigurable Multiple Bus Machine (RMBM) [25–27]. Sahni has proposed an extension to the RMBM called the distributed memory bus computer (DMBC) [21, 22].

In this paper, we address some issues related to the power of reconfigurable bus-based models. Previously, Moshell and Rothstein [17] have characterized the set of languages (called "immediate languages") that a bus automaton can accept in constant time. Ben-Asher et al. [3, 4] have related RN complexity classes to Parallel Random Access Machine (PRAM) and Turing machine complexity classes. Further, a two-dimensional R-Mesh has been shown to be equal in power to multidimensional R-Meshes [32], and to be more powerful than a PRAM [20, 24, 35].

We identify two fundamental abilities of a reconfigurable bus-based model: (i) its ability to "segment" a bus into two or more bus segments, and (ii) its ability to "fuse" two or more buses or bus segments together. Segmenting and fusing are natural operations on multiple buses. They also capture distinct operations in the context of reconfigurable bus-based models. Intuitively, segmenting separates adjacent portions of a bus, thereby creating more buses. On the other hand, fusing joins two or more buses that are not necessarily proximate, thereby reducing the number of (disjoint) buses. These operations provide ways of manipulating the pattern of communication, the central feature of reconfigurable bus-based computation. Besides the ability of a model to segment and fuse buses, we also consider concurrent reading and writing capabilities of the model.

Previous results have assumed some fixed capability of the model (with respect to concurrent reads from and concurrent writes on buses). For instance, Ben-Asher et al. [3, 4] and Wang and Chen [35] assumed a Collision CRCW model, whereas Olariu et al. [20] assumed a CREW model. Based on the abilities to segment and/or fuse buses, and concurrent reading/writing capabilities, we derive a hierarchy of relative "powers" of the PRAM and reconfigurable bus-based models. The main results of this paper are the following:
• The ability of a model to fuse buses is more crucial to its power than the ability to segment buses. In fact, the ability to segment buses does not add to the power of a model that can fuse buses.
• If a model has the ability to segment and/or fuse buses, then concurrent writing capability adds no more power than concurrent reading capability. That is, for truly reconfigurable bus-based models that can segment and/or fuse buses, concurrent writing capability is not necessary.
• The RN (and hence all "graph-based" reconfigurable models) is only as powerful as an RMBM that is permitted only to fuse buses.
• The ability to segment buses is not entirely without use. In fact, we show that concurrent writing capability is both necessary and sufficient for a PRAM to simulate (in constant time) the ability of an RMBM to segment buses. We also relate the powers of an RMBM with only segmenting ability and an R-Mesh in which all buses are either horizontal or vertical lines (HV-RN [2]).
• A model that can neither segment nor fuse buses is equal in power to the PRAM.

The reconfigurable multiple bus machine (RMBM) [25–27] is a recent model that has some advantages over the RN and other graph-based reconfigurable models. The RMBM distinguishes between processors and switches, so it can be used to design algorithms that are more processor-efficient than their RN counterparts that use many processors only as switches. For example, an RMBM can rank a list of n elements in constant time with n^{1+ε} processors (for any constant ε > 0) [25, 28], while the best R-Mesh algorithm for the same problem uses Θ(n^{2+ε}) processors [19]. Also, a (2n − 2)-processor RMBM can compute an Euler tour of an n-vertex tree in constant time [27, 28], while the R-Mesh uses n^2 processors for the same problem [13]. As a third example, an RMBM with Θ(n^{2+ε}) processors (for any constant ε > 0) can find the connected components of an n-vertex graph in constant time [27]; the best R-Mesh solution for this problem uses Θ(n^4) processors [34]. In each of these cases, the RMBM uses switches to manipulate buses where the R-Mesh would have to use processors for the same purpose. This more efficient use of processors may also have practical significance, as a b-bit switch usually costs less than a b-bit processor. Like the RMBM, the DMBC [21, 22] also distinguishes between processors and switches, so it too has a processor utilization advantage over graph-based reconfigurable models.

In this paper, we use the RMBM (which admits a natural separation of the abilities to segment and/or fuse buses) to establish the hierarchy of relative powers of the PRAM and reconfigurable bus-based models. We obtain separations between models that can neither segment nor fuse buses, models with only the ability to segment buses, and those with the ability to fuse buses. In models such as the
RN, it is difficult to classify (in a natural way) a local action of a processor as segmenting and/or fusing buses. Therefore, unlike the RMBM, these models do not readily permit the study of the contribution of these abilities to the power of a reconfigurable bus-based model. The relationships between the PRAM, RMBM, and RN translate to other graph-based reconfigurable models such as the PARBS, PPA, and R-Mesh. Due to similarities between the RMBM and other "non-graph-based" models (such as the DMBC, RAP, and REBSIS), the results presented in this paper can easily be extended to include these non-graph-based models.

In Section 2, we describe the RMBM and RN; for completeness, we also include a brief description of the PRAM. In Section 3, we formalize the notion of "relative power" of two models of computation and, in Section 4, we relate the powers of the RMBM and RN. Section 5 is devoted to establishing a hierarchy of powers of the PRAM and reconfigurable bus-based models. Finally, in Section 6, we summarize our results and make some concluding remarks.

2. MODELS USED IN THIS PAPER
In this section, we describe three models of parallel computation (the PRAM, RMBM, and RN), whose relative powers are the focus of this paper. We assume that the processors of these models have the same instruction set, which includes operations normally deemed executable by a PRAM processor in constant time. All of the above models are synchronous.
2.1. The Parallel Random Access Machine

The Parallel Random Access Machine (PRAM) consists of a set of processors and a shared memory. Each processor is permitted to access any shared memory location in constant time. In addition to the shared memory, each processor has a local memory. Based on restrictions placed on shared memory accesses, a PRAM can be (i) Exclusive Read, Exclusive Write (EREW); (ii) Concurrent Read, Exclusive Write (CREW); or (iii) Concurrent Read, Concurrent Write (CRCW). For the CRCW PRAM, concurrent writes are usually resolved using the Priority, Arbitrary, Common, or Collision rules. A detailed discussion of the PRAM and PRAM algorithms appears in [7, 9]. For any X, Y ∈ {C, E} (for Concurrent and Exclusive),⁴ an XRYW PRAM with P processors and S locations of shared memory space is denoted by XRYW PRAM[P, S]. The space is the largest shared memory address used by the PRAM, rather than the number of shared memory locations accessed. The size of an XRYW PRAM[P, S] is defined to be Θ(P + S); the size of a model is indicative of the number of hardware components in it.
⁴ Throughout this paper, we assume that a model that is permitted to write concurrently also has the concurrent read capability; that is, ERCW models are not considered.
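The four write-resolution rules named above can be illustrated concretely. The sketch below is ours, not the paper's; the function name `resolve_write` and the collision marker are hypothetical choices made for illustration.

```python
# Illustrative sketch of CRCW write resolution (our own helper, not from the
# paper).  "writes" is the list of (processor index, value) pairs that arrive
# at one shared memory cell in a single step.

COLLISION_SYMBOL = "*"   # hypothetical marker used by the Collision rule

def resolve_write(writes, rule):
    if not writes:
        return None                        # nothing written this step
    if rule == "Priority":                 # lowest-indexed processor wins
        return min(writes)[1]
    if rule == "Arbitrary":                # any one writer may win;
        return writes[0][1]                # we fix one arbitrary choice
    if rule == "Common":                   # all writers must agree
        values = {v for _, v in writes}
        assert len(values) == 1, "Common rule requires equal values"
        return values.pop()
    if rule == "Collision":                # >1 writer leaves a collision marker
        return writes[0][1] if len(writes) == 1 else COLLISION_SYMBOL

writes = [(3, 7), (1, 7), (5, 9)]
print(resolve_write(writes, "Priority"))   # 7 (processor 1 wins)
print(resolve_write(writes, "Collision"))  # * (more than one writer)
```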
2.2. The Reconfigurable Multiple Bus Machine

The RMBM [25–27] consists of a set of processors that communicate using multiple buses. The processors can reconfigure their communication pattern by moving their read and write connections from one bus to another, breaking buses into separate bus segments, and connecting (not necessarily adjacent) buses by "fuse lines."

2.2.1. Model Definition

The RMBM comprises P processors (indexed 0, 1, ..., P − 1), B buses (indexed 0, 1, ..., B − 1), P fuse lines, and PB sets of switches, Q_{i,j} = {c_{i,j,0}, c_{i,j,1}, s_{i,j,0}, s_{i,j,1}, f_{i,j}}, where 0 ≤ i < P and 0 ≤ j < B. Each processor has its own local memory; there is no shared memory. Each processor has a fuse line associated with it that is placed perpendicular to the buses. The buses provide the communication paths between the processors, and the switches and fuse lines configure these communication paths. Each processor has a write port (port 0) and a read port (port 1). The switches in the set Q_{i,j} (for all 0 ≤ j < B) are controlled only by processor i. For all 0 ≤ i < P, 0 ≤ j < B, and 0 ≤ k < 2, switches in Q_{i,j} operate as follows.
• Setting switch c_{i,j,k} (called a connect switch) connects port k of processor i to bus j.
• Switches s_{i,j,0} and s_{i,j,1} (called segment switches) are located on bus j to the left of c_{i,j,0}, and between c_{i,j,0} and c_{i,j,1}, respectively (see Fig. 1(b)). When set, s_{i,j,k} segments bus j at the point where it is located.
• Fuse switch f_{i,j} is located at the intersection of the fuse line of processor i and bus j. When set, f_{i,j} connects the fuse line of processor i to bus j. The effect achieved by a processor i that has set fuse switches f_{i,j_1}, f_{i,j_2}, ..., f_{i,j_m} is to fuse buses j_1, j_2, ..., j_m.
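The combined effect of segment and fuse switches can be made concrete with a small simulation. The sketch below uses our own modeling assumptions, not code from the paper: it follows the left-to-right order s_{i,j,0}, c_{i,j,0}, s_{i,j,1}, c_{i,j,1} described above, and it assumes the fuse point of processor i lies just to the right of s_{i,j,1}; the fused bus segments are then connected components, computed with union-find.

```python
# Sketch (our modeling assumptions, not the paper's code): compute the fused
# bus segments induced by a switch setting.  Bus j is modeled as 2P+1 atomic
# pieces; cut point m = 2i+k (switch s_{i,j,k}) sits between pieces m and m+1,
# and processor i's fuse point is assumed to lie on piece 2i+2.

class DSU:                                   # minimal union-find
    def __init__(self):
        self.parent = {}
    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def fused_segments(P, B, seg, fuse):
    """seg: set of (i, j, k) segment switches that are set.
       fuse: set of (i, j) fuse switches that are set.
       Returns a DSU over (bus, piece); its classes are the fused segments."""
    dsu = DSU()
    for j in range(B):
        for m in range(2 * P):               # join neighbours across open cuts
            i, k = divmod(m, 2)
            if (i, j, k) not in seg:
                dsu.union((j, m), (j, m + 1))
    for i in range(P):                       # fuse lines join pieces across buses
        fused = [j for j in range(B) if (i, j) in fuse]
        for j in fused[1:]:
            dsu.union((fused[0], 2 * i + 2), (j, 2 * i + 2))
    return dsu

# Processor 0 sets s_{0,0,1} (cutting bus 0) and fuses buses 0 and 1:
dsu = fused_segments(P=2, B=2, seg={(0, 0, 1)}, fuse={(0, 0), (0, 1)})
print(dsu.find((0, 0)) == dsu.find((0, 4)))  # False: bus 0 is cut at s_{0,0,1}
print(dsu.find((0, 2)) == dsu.find((1, 0)))  # True: right part of bus 0 fused to bus 1
```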
Figure 1a illustrates the structure of the RMBM and Fig. 1b details the switches associated with processor i. Thus, by appropriately setting its segment and fuse switches, the RMBM can configure the bus structure into several "fused bus segments." Initially, all switches in the RMBM are assumed to be reset. The read port of a processor has a buffer that holds the value read from the bus to which it is connected. Processors can perform one of the following operations in constant time: set or reset a switch, access a local memory word or a read port buffer, write on a bus, or execute an instruction from its instruction set. If the read port of a processor is connected to a bus, then at each step during which a value is written on the bus, this value is copied into the read port buffer. When no value is written on the bus to which a read port is connected, the contents of the buffer of that read port remain unchanged. A processor may connect its read and write ports to different buses; however, each port can be connected to at most one bus at any point in time. Like a PRAM, an RMBM can be EREW, CREW, or CRCW, depending on whether simultaneous reads from and writes on fused bus segments are allowed. As in a PRAM, concurrent writes on a fused bus segment are resolved using the Priority, Arbitrary, Common, or Collision rules. The use of the Priority and Arbitrary rules for a spatially distributed resource such as a bus is perhaps questionable; we permit these rules for reconfigurable bus-based models only for making the results in this paper general. Lemma 1 (see Section 4) relates the Priority and Arbitrary rules to the Common and Collision rules for reconfigurable bus-based models. An assumption central to all reconfigurable bus-based models (including the RMBM and RN) is that there is negligible delay or attenuation of a signal transmitted on a fused bus segment. Several arguments in support of this
FIG. 1. The structure of the RMBM.
assumption have been put forth (for instance, see [11, 15, 16, 21, 24]).

2.2.2. Versions of the RMBM

As mentioned earlier, we consider two ways in which a reconfigurable bus-based model can manipulate buses, namely, segmenting and fusing. Based on the ability to segment and/or fuse buses, we have four versions of the RMBM, which we describe below.

The Extended RMBM (E-RMBM) is the version described so far in this section and is the most "powerful" of all versions of the RMBM.

The Fusing RMBM (F-RMBM) has only connect and fuse switches; that is, it is permitted only to fuse buses but not segment them. We show in Section 4.1 that an F-RMBM (with polynomially bounded resources) can simulate each step of an RN in constant time, and in Section 5.3 that the F-RMBM (again with polynomially bounded resources) can simulate each step of the E-RMBM in constant time. In other words, the ability to fuse buses is all that is required to simulate any graph-based reconfigurable model in constant time.

The Segmenting RMBM (S-RMBM) has only connect and segment switches; that is, it is permitted only to segment buses but not fuse them. The ability to segment buses adds only a limited amount of power to the model, as shown in Section 5.2. In other words, the ability to fuse buses is more crucial to a reconfigurable bus-based model (compared to the ability to segment buses). Although the S-RMBM is not as powerful as the F-RMBM or E-RMBM, its ability to segment buses permits it to perform operations such as finding the OR of n bits in constant time; it is well known that this cannot be done in constant time on a CREW PRAM [6]. In fact, we show that the concurrent write capability is both necessary and sufficient for a PRAM to simulate (in constant time) the ability of an RMBM to segment buses. In Section 4.2, we also relate the powers of the S-RMBM and the "HV-RN" [2], a special case of the RN.
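As an illustration of segmenting, the constant-time OR mentioned above can be realized along the following lines (one standard single-bus algorithm, reconstructed by us; the paper does not spell out the method): every processor holding a 1 segments the bus immediately to its right and writes 1 on its own segment, and processor 0 reads its segment. Each segment then has at most one writer, so the step needs no concurrent writes.

```python
# Simulation of the one-step OR sketched above (our reconstruction, not the
# paper's algorithm).  Processor i holds bit x[i] on a single segmentable bus.

def s_rmbm_or(x):
    n = len(x)
    segments, current = [], []
    for i in range(n):
        current.append(i)
        if x[i] == 1:              # processor i segments the bus on its right
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    result = 0
    for seg in segments:
        writers = [i for i in seg if x[i] == 1]
        assert len(writers) <= 1   # writes are exclusive by construction
        if 0 in seg and writers:   # processor 0 reads its own segment
            result = 1
    return result

print(s_rmbm_or([0, 0, 1, 0, 1]))  # 1
print(s_rmbm_or([0, 0, 0]))        # 0
```

Processor 0's segment extends to the nearest 1-holder on its right (if any), whose write it receives; if the input is all zeros, nothing is written and the bus stays idle.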
The Basic RMBM (B-RMBM) has only connect switches; that is, it is not permitted to segment or fuse buses. This version is not a "truly reconfigurable" model, as it cannot manipulate the communication medium. In fact, the B-RMBM just permits processors to communicate with each other in constant time, if they can identify a common bus to be used for the communication. This is analogous to a PRAM in which processors must identify a common memory location to be used for communication. Following this observation, we show in Section 5.1 that the B-RMBM and the PRAM are equally powerful.

The above versions of the RMBM can also be viewed as operation modes of the RMBM. For example, an F-RMBM can be viewed as an RMBM that operates without setting any segment switch. The term RMBM is used to collectively denote all versions. For any X, Y ∈ {C, E} (for Concurrent and Exclusive) and for any Z ∈ {B, S, F, E} (for Basic, Segmenting, Fusing, and Extended), an XRYW Z-RMBM with P processors and B buses is denoted by XRYW Z-RMBM[P, B]. The size of an XRYW Z-RMBM[P, B] is defined to be Θ(PB); the size of an RMBM is indicative of the number of switches in it.
2.3. The Reconfigurable Network

2.3.1. Model Definition

A Reconfigurable Network (RN) [4, 24] is a network of processors that can be represented as a connected graph whose vertices are processors and whose edges are fixed connections (called external connections) between processors. Each edge incident on a processor corresponds to a (bidirectional) port of that processor. A processor can internally partition its ports so that all ports in the same block of the partition are fused together (short-circuited) to form a bus. The connections established between ports of a processor are called its internal connections. By selecting different internal connections, the processors can create several buses that connect the ports of various processors. The representation of an RN must include (in addition to the specification of its underlying graph) the correspondence between each port of each processor and the external edge incident on it. The index x of a port of a processor is a number from {0, 1, ..., P − 2}. We will represent an RN with P processors by a P × P matrix, A, defined below. For any 0 ≤ i, j < P,
A(i, j) =
    x,  if there is an external edge between processors i and j that is incident on port x of processor i;
    ∅,  if there is no external edge between processors i and j.
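To make the definition concrete, the sketch below (a small hypothetical example of ours, not the RN of Fig. 2) computes the buses that a configuration induces: a bus is a connected component of the relation that joins the two endpoint ports of every external edge together with all ports in the same block of a processor's partition.

```python
# Sketch (hypothetical example, not Fig. 2): derive buses from external edges
# and per-processor port partitions.  A port is a (processor, port index) pair.

from collections import defaultdict

def rn_buses(edges, partitions):
    """edges: list of ((i, x), (j, y)) external connections between ports.
       partitions: {processor: list of blocks (lists of port indices)}.
       Returns the buses as lists of ports."""
    parent = {}
    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a
    def union(a, b):
        parent[find(a)] = find(b)
    for p, q in edges:                        # external connections
        union(p, q)
    for proc, blocks in partitions.items():   # internal connections
        for block in blocks:
            for port in block[1:]:
                union((proc, block[0]), (proc, port))
    buses = defaultdict(list)
    for port in list(parent):
        buses[find(port)].append(port)
    return list(buses.values())

# A 3-processor chain 0 - 1 - 2.  When processor 1 fuses its two ports into
# one block, a single bus spans all three processors; keeping them in
# separate blocks gives two buses.
edges = [((0, 0), (1, 0)), ((1, 1), (2, 0))]
print(len(rn_buses(edges, {0: [[0]], 1: [[0, 1]], 2: [[0]]})))    # 1
print(len(rn_buses(edges, {0: [[0]], 1: [[0], [1]], 2: [[0]]})))  # 2
```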
Figure 2 shows a four-processor RN and the matrix corresponding to it. In this figure, processor indices are italicized and enclosed in boxes, whereas port indices are shown in a smaller font. For processor 2, the partition {0, 1; 2} would fuse ports 0 and 1; this allows information arriving at port 0 to flow out of port 1 and vice versa. Each processor may partition its ports differently, and for any given processor, its partition may change at each step. In a step, the partition of the ports of a processor can only change incrementally in one of the following ways: (i) moving a port from one block to another existing block; (ii) moving a port from a block to a new single-element block. This assumption is necessary as the degree (and hence, the number of ports) of a given vertex of an RN need not be a constant. Initially, all ports are in separate single-element blocks. As in the RMBM, the RN assumes negligible delay and attenuation due to switches placed on buses. For any X, Y ∈ {C, E} (for Concurrent and Exclusive),
an RN whose underlying graph has P vertices (processors) and E edges is denoted by XRYW RN[P, E]. Like an RMBM, the size of an RN is the number of switches in it. Consider an RN[P, E]; clearly, its size is Ω(E). Let p_i (for 0 ≤ i < P) be a processor of the RN and let d_i be its degree. To permit all possible partitions of the ports, processor p_i need only use O(d_i^2) internal switches (that connect all possible port pairs). Therefore, the size of an RN[P, E] is O(Σ_{i=0}^{P−1} d_i^2) = O(E^2). Since we will only consider "polynomially bounded" models (defined in Section 3), these bounds on the size of an RN[P, E] suffice for our discussion.

In general, the RN represents a family of networks, each of which is based on an underlying connected graph. Therefore, in simulating a step of an RN on a model M, we will place no restriction on the topology of the underlying graph of the RN. On the other hand, in simulating a step of a model M by an RN, the existence of an RN (with some underlying graph) that can accomplish the simulation is sufficient.

2.3.2. Versions of the RN

The RN is a very general reconfigurable bus-based model; several other models such as the R-Mesh, PARBS, and PPA can be viewed as special cases of the RN. Ben-Asher et al. [2, 4] (see also [24]) proposed several versions of the RN that differ in terms of the allowable port partitions. In this paper, we use the term "RN" to denote the most general form, in which no restriction is placed on the underlying connected graph and the manner in which a processor can partition its ports.

The R-Mesh [16] is a special case of the RN in which the underlying graph is a mesh. In an R × C R-Mesh, the underlying mesh has R rows and C columns. Each processor has four ports, denoted by N, S, E, and W (for North, South, East, and West) in the obvious manner. Like the RN, each processor of an R-Mesh is permitted to partition its four ports in any manner.

The Horizontal–Vertical RN (HV-RN) [2] is a special case of the R-Mesh in which each processor is permitted to use only the following port partitions: {N; S; E; W}, {N, S; E; W}, {N; S; E, W}, and {N, S; E, W}. Therefore, each bus of the HV-RN can be represented as a horizontal or vertical line. In contrast, a processor in the (unrestricted) R-Mesh can partition its ports in many other ways, such as {N, S, E, W} or {N, W; S, E}. We remark here that the definition of the HV-RN can also be extended to one in which the underlying graph is a torus (mesh with wraparound connections in rows and columns). In fact, the PPA [15] is this extension of the HV-RN to a torus.

FIG. 2. Representation of an RN.

3. POLYNOMIALLY BOUNDED MODELS AND RELATIVE POWER
Let M be a model of computation of size S that solves a problem of size n. The model M is said to be polynomially bounded iff S = O(n^c), for some constant c. For a polynomially bounded model, the size of the local memory in each processor is also polynomially bounded in the problem size. In subsequent discussion, the term "model" refers to a PRAM, RMBM, or RN. This paper assumes all models to be polynomially bounded (in the problem size n). The processor word-size and bus-width of all models is Ω(log n) bits. We note here that all simulations in this paper only need word-sizes of Θ(log n) bits. A larger word-size is used only if the simulated model uses a larger word-size in its computation. Let T(P, M) denote the time needed to solve a computational problem P on a model M.

DEFINITIONS [30]. Let M_1 and M_2 be two (polynomially bounded) models.
(i) M_1 ≤ M_2 iff, for all problems P, T(P, M_2) = O(T(P, M_1)).
(ii) M_1 ⋖ M_2 iff there is at least one problem P such that T(P, M_2) = o(T(P, M_1)).
(iii) M_1 < M_2 iff M_1 ≤ M_2 and M_1 ⋖ M_2.
(iv) M_1 = M_2 iff M_1 ≤ M_2 and M_2 ≤ M_1.

Remarks. Definition (i) can be interpreted as M_1 is at most as powerful as M_2. That is, each step of M_1 can be simulated on a (polynomially bounded) M_2 in constant time. Definition (ii) indicates that there is at least one problem for which M_2 is faster than M_1. This does not preclude the existence of another problem for which M_1 is faster than M_2. Definition (ii) guarantees, however, that M_1 is not as fast as or faster than M_2 for all problems. In this sense, one may say that M_1 is not more powerful than M_2 (or M_2 is not less powerful than M_1). Definition (iii) can be interpreted as M_1 is strictly less powerful than M_2. In Definition (iv), M_1 and M_2 have the same power.

LEMMA 1. For any concurrent write resolution rules R_1, R_2 ∈ {Priority, Arbitrary, Common, Collision}, the following assertions hold:
(i) R_1 CRCW PRAM = R_2 CRCW PRAM.
(ii) R_1 CRCW RMBM = R_2 CRCW RMBM.
(iii) R_1 CRCW RN = R_2 CRCW RN.

Proof. Part (i) is a well-known fact (for instance, see [7, 9, 10]). Part (ii) follows from a result in [27, 28] that any step of a Priority CRCW RMBM[P, B] can be simulated in constant time on a Common or Collision CRCW RMBM[P^{1+ε}, max{P, BP^ε}], where 0 < ε ≤ 1 is any constant. Part
(iii) follows from part (ii) and Lemmas 2 (see Section 4.1) and 17 (see Section 5.3). ∎

Remark. By Lemma 1, there is no loss of generality in referring to a CRCW model without specifying the rule used to resolve concurrent writes.

4. RELATIVE POWERS OF THE RMBM AND RN
In this section, we show that an F-RMBM and an RN are equally powerful. Subsequently, in Section 5.3, we will show that the F-RMBM and E-RMBM are equal in power. This implies that the relationships established between PRAMs and RMBMs translate to results on all other graph-based reconfigurable models as well. We also relate the powers of the S-RMBM and the HV-RN.
4.1. Simulations between an F-RMBM and an RN

We now establish that polynomially bounded F-RMBMs and RNs have the same power, by constructing an F-RMBM algorithm to simulate each step of any RN in constant time and by constructing an RN that can simulate each step of an F-RMBM in constant time.

LEMMA 2. For any X, Y ∈ {C, E}, each step of an XRYW RN[P, E] can be simulated on an XRYW F-RMBM[P^2, (3P(P − 1)/2) + P^2] in O(1) time.

Proof. Let the processors of the RN be denoted by p_i (where 0 ≤ i < P) and let its ports have indices from {0, 1, ..., P − 2}. The RN is represented as a P × P matrix A (see Section 2.3). If A(i, j) ≠ ∅, then let A(i, j) = x_{i,j}; recall that x_{i,j} is the index of the port of processor p_i on which the external edge between p_i and p_j is incident. Divide the processors of the RMBM into P teams, T(i), each with P processors φ_{i,j} (for 0 ≤ i, j < P). The processors of T(i) collectively simulate RN processor p_i. RMBM processor φ_{i,j} holds A(i, j), and φ_{i,j} simulates port x_{i,j} of p_i. For any port l of an RN processor p_i, let φ_i(l) denote the index within team T(i) of the RMBM processor that simulates port l. Denote the first P(P − 1)/2 buses of the RMBM by a_{i,j}, where 0 ≤ i < j < P. Let the next P(P − 1) buses of the RMBM be b_{i,j}, where 0 ≤ i < P and 0 ≤ j < P − 1. Denote the last P^2 buses of the RMBM by c_{i,j}, where 0 ≤ i, j < P. The buses a_{i,j} (resp., b_{i,j} and c_{i,j}) are used for the external connections of the RN (resp., for the internal connections of processor p_i, and for communication between processors of team T(i)). To establish the external connections of the RN, each processor φ_{i,j} with A(i, j) ≠ ∅ performs all reads and writes (corresponding to port x_{i,j} of the RN processor p_i) on bus a_{min(i,j),max(i,j)}. We refer to the bus a_{min(i,j),max(i,j)} as the external bus of φ_{i,j} (or φ_{j,i}). Each processor p_i of the RN partitions its ports and assigns a block number #(i, l) to each port l; note that 0 ≤ #(i, l) < P − 1. To establish the internal connections of p_i, processor φ_{i,j} of the RMBM fuses its external bus to bus b_{i,#(i,x_{i,j})}. Clearly, two ports l_1 and l_2 of p_i are in the
same block iff the external buses of processors φ_{i,φ_i(l_1)} and φ_{i,φ_i(l_2)} are fused via bus b_{i,#(i,l_1)} = b_{i,#(i,l_2)}. In each team T(i), processor φ_{i,0} holds all information (regarding changes in the port partition, reads and/or writes, and internal computation to be performed) about simulated RN processor p_i. If p_i moves a port l from some block k_1 to an existing block k_2, then φ_{i,0} informs φ_{i,φ_i(l)} (via bus c_{i,l}) to fuse its external bus to bus b_{i,k_2} instead of b_{i,k_1}. If p_i moves a port l from some block k_1 to a new singleton block, then φ_{i,0} first selects a new block number k_2 and then informs φ_{i,φ_i(l)} to fuse its external bus to bus b_{i,k_2}. The bookkeeping necessary for selecting a new block number (from a pool of available block numbers) is easy to implement with a stack. Note that up to this point all reads and writes are exclusive. Let p_i read from some port l_1 and write on some port l_2. The RMBM processor φ_{i,0} informs processors φ_{i,φ_i(l_1)} and φ_{i,φ_i(l_2)} (via buses c_{i,l_1} and c_{i,l_2}) to read from and write on their external buses. Concurrent writes (if any) are resolved by the RMBM in the same manner as the simulated RN. Processor φ_{i,φ_i(l_1)} conveys the value read to φ_{i,0} (via bus c_{i,0}). Processor φ_{i,0} now has all the necessary information to perform any internal computation of p_i in the simulated step. ∎

Remarks. The above simulation of an RN by an RMBM can be made more efficient for regular graphs. For instance, a √P × √P R-Mesh can be simulated on an F-RMBM[O(P), O(P)]. If an E-RMBM is used, then the √P × √P R-Mesh can be simulated on an E-RMBM[O(P), O(√P)].

We now show that there is an RN that can simulate each step of an F-RMBM in constant time.

LEMMA 3. For any X, Y ∈ {C, E}, each step of an XRYW F-RMBM[P, B] can be simulated on an XRYW RN[O(PB), O(PB)] in O(1) time.

Proof. Let the processors and buses of the RMBM be p_i and b_j, respectively (0 ≤ i < P and 0 ≤ j < B).
We first consider the case where the F-RMBM (and hence the RN) can perform concurrent reads. Here the simulating RN is a (B + 1) × (3P) R-Mesh. Denote the processors of the R-Mesh by φ_{k,l}, where 0 ≤ k < B + 1 and 0 ≤ l < 3P. Rows 1, 2, ..., B of the R-Mesh simulate the B buses of the RMBM. For any 0 ≤ i < P, columns 3i, 3i + 1, and 3i + 2 simulate the write port, read port, and fuse line of processor p_i of the RMBM. Let processor p_i of the F-RMBM write on bus b_j in the simulated step. Configure the R-Mesh so that there is a vertical bus connecting all processors in each column. Processor φ_{0,3i} writes j + 1 on its vertical bus and all processors φ_{l,3i} (for 1 ≤ l < B + 1) read from this bus. Processor φ_{j+1,3i} establishes the required path for φ_{0,3i} to perform the write (see Fig. 3). Columns 3i + 1 and 3i + 2 handle reading and fusing similarly. The details of the simulation are
FIG. 3. An illustration of the simulation of an F-RMBM on an RN.
straightforward as illustrated in Fig. 3; the dashed lines in this figure are explained next. If the F-RMBM does not permit concurrent reads, then the simulating RN is an R-Mesh augmented with dedicated links (shown as dashed lines in Fig. 3) from processor φ_{0,k} to each processor φ_{l,k} (for all 1 ≤ l < B + 1); this allows the RN to set RMBM switches in constant time. ∎

An immediate consequence of Lemmas 2 and 3 is the following theorem.

THEOREM 4. For any X, Y ∈ {C, E}, XRYW RN = XRYW F-RMBM. ∎

Remark. The above theorem establishes that the F-RMBM has the same power as the RN. In Section 5.3, we show that an F-RMBM and E-RMBM are equally powerful.

4.2. Simulations between an S-RMBM and an HV-RN

We now show that a CRYW S-RMBM and a CRYW HV-RN are equally powerful.

LEMMA 5. For any X, Y ∈ {C, E}, each step of an R × C XRYW HV-RN can be simulated on an XRYW S-RMBM[4RC, R + C + 4] in O(1) time.

Proof. A team of four adjacent RMBM processors simulates a processor of the HV-RN (which has four ports). Arranging these teams in correspondence with a row-major enumeration of HV-RN processors preserves the relative ordering of ports in any row or column of the HV-RN. Each bus (in any bus configuration) of the HV-RN must be a segment of one of the R horizontal buses or C vertical buses that span entire rows and columns, respectively. Therefore, the buses that can be generated by the HV-RN are segments of R + C buses of the S-RMBM. The remaining four buses of the RMBM are used for communication between processors in a team. ∎

LEMMA 6. For any X, Y ∈ {C, E}, each step of an XRYW S-RMBM[P, B] can be simulated on a (B + 1) × (4P) CRYW HV-RN in O(1) time.

Proof. The simulation is similar to that in the proof of Lemma 3. To avoid the port partition {N, E, W; S} for a
read port connect switch, the corresponding HV-RN processor performs the read (rather than a processor in the topmost row) and then sends the value read to the topmost row. Writes are performed similarly. ∎

An immediate consequence of Lemmas 5 and 6 is the following theorem.

THEOREM 7. (i) CRYW HV-RN = CRYW S-RMBM, for any Y ∈ {C, E}. (ii) EREW HV-RN ≤ EREW S-RMBM. ∎

Remark. The PPA [15] is the same as the HV-RN with the underlying mesh extended to a torus. It is easy to show that the results in Theorem 7 extend to the PPA as well. That is, the PPA is only as powerful as an S-RMBM.

5. RELATIVE POWERS OF THE RMBM AND PRAM
In this section, we establish a hierarchy of relationships between the powers of various PRAMs (EREW, CREW, and CRCW) and the four versions of the RMBM. In particular, we show that the ability of a model to fuse buses is more powerful than its ability to segment buses. We also establish that for truly reconfigurable models (those that can segment and/or fuse buses), the concurrent write capability adds no more power than the concurrent read capability; that is, the CREW version is equal in power to the CRCW version.

LEMMA 8. For any X, Y ∈ {C, E} and any Z ∈ {B, S, F, E}, the following assertions hold.
(i) EREW PRAM < CREW PRAM < CRCW PRAM.
(ii) EREW Z-RMBM ≤ CREW Z-RMBM ≤ CRCW Z-RMBM.
(iii) XRYW B-RMBM ≤ XRYW S-RMBM ≤ XRYW E-RMBM.
(iv) XRYW B-RMBM ≤ XRYW F-RMBM ≤ XRYW E-RMBM.

Proof. Part (i) is a well-known fact (for instance, see [7, 9]). Part (ii) follows from the fact that a CRCW Z-RMBM is a generalization of a CREW Z-RMBM, which in turn is a generalization of an EREW Z-RMBM. Parts (iii) and (iv), similarly, follow from the fact that the
E-RMBM is a generalization of the S-RMBM and the F-RMBM, each of which in turn is a generalization of the B-RMBM. ∎

For clarity, we divide the remainder of this section into four subsections that discuss the relative powers of the four versions of the RMBM and the PRAM.

5.1. Relative Powers of the B-RMBM

THEOREM 9. For any X, Y ∈ {C, E}, XRYW PRAM = XRYW B-RMBM.

Proof. The key point here is that each bus b_i of the B-RMBM can be simulated by a shared memory location m(i) of the PRAM and vice versa. The only details that require elaboration arise because a memory location holds a value until it is overwritten, whereas a bus holds a value only during a read/write cycle. In simulating a memory location m(i) of a PRAM, the B-RMBM uses a bus b_i and a dedicated processor p_i, which reads from b_i during the write cycle of the PRAM and writes its contents on b_i during the read cycle of the PRAM. Therefore, a step of an XRYW PRAM[P, S] can be simulated in constant time on an XRYW B-RMBM[P + S, S]. In simulating a bus b_i of a B-RMBM, the PRAM uses a shared memory location m(i) and a dedicated processor p_i. Before a read/write cycle of the B-RMBM, p_i initializes m(i) to an illegal value; this serves to indicate whether m(i) (or the simulated bus b_i) has been written on in the current cycle. Therefore, each step of an XRYW B-RMBM[P, B] can be simulated in constant time on an XRYW PRAM[P + B, B]. ∎

An immediate consequence of Theorem 9 and Lemma 8(i) and (ii) is the following corollary.

COROLLARY 10. EREW B-RMBM < CREW B-RMBM < CRCW B-RMBM. ∎
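To make the bus/memory-cell correspondence concrete, the following is a small sequential sketch of ours (not part of the paper): a PRAM cell m(i) simulates bus b_i, with a sentinel playing the role of the "illegal value" that the dedicated processor writes before each cycle. The function name and calling convention are illustrative only.

```python
# Sequential sketch (ours) of the Theorem 9 correspondence. A bus holds a
# value only for one cycle, so the PRAM models bus b_i as a cell m[i] that
# a dedicated processor resets to an illegal sentinel before every cycle.

NIL = object()  # "illegal value" that no RMBM processor ever writes

def simulate_bus_cycle(m, i, writes, readers):
    """Simulate one read/write cycle of bus b_i using memory cell m[i].

    writes:  values written on b_i this cycle (empty or singleton for an
             exclusive-write model).
    readers: number of processors reading b_i; each sees the value written
             this cycle, or NIL if nothing was written.
    """
    m[i] = NIL                # dedicated processor p_i clears the cell
    for v in writes:          # write cycle of the B-RMBM
        m[i] = v
    return [m[i]] * readers   # read cycle: stale values cannot leak

m = {}
assert simulate_bus_cycle(m, 0, [42], 3) == [42, 42, 42]
assert simulate_bus_cycle(m, 0, [], 2) == [NIL, NIL]  # old 42 did not persist
```

The sentinel reset is exactly what distinguishes a bus from a memory cell: without it, a value written in an earlier cycle would wrongly survive into the next one.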
5.2. Relative Powers of the S-RMBM

In this section, we show that concurrent writing capability is both necessary and sufficient for a PRAM to simulate (in constant time) the ability of an RMBM to segment buses. Miller et al. [16] showed that an R-Mesh can find the OR of n bits in constant time. The algorithm uses only horizontal and vertical buses and exclusive reads and writes. Therefore, we have the following lemma.

LEMMA 11 [16]. An EREW √n × √n HV-RN can find the logical OR of n bits in O(1) time.

Remark. The above result has also been proved in the context of several other reconfigurable bus-based models (for instance, see [15, 26, 33]). A CREW PRAM (with an unbounded number of processors) requires Ω(log n) time to find the OR of n bits [6].
This, coupled with Theorem 7 and Lemma 11, gives the following result.

THEOREM 12. (i) CREW PRAM <? EREW HV-RN. (ii) CREW PRAM <? EREW S-RMBM. (iii) CREW PRAM < CREW S-RMBM. (iv) EREW PRAM < EREW S-RMBM.

Remarks. Even a one-dimensional R-Mesh (or HV-RN) can find the OR of n bits in constant time and is, therefore, not less powerful than the CREW PRAM. On the other hand, due to its small bisection width, the one-dimensional R-Mesh requires Ω(n) time to sort (for instance) n bits; this problem can be solved on an EREW PRAM in O(log n) time.

We now describe a technique called "neighbor localization" (see also [28, 29, 31]) that finds use in subsequent discussion. Consider a set of P processors. For some integer m ≥ 1, let each processor p_i hold an integer value v_i ∈ {0, 1, ..., m − 1}. The right (resp., left) neighbor of processor p_i is the least (resp., largest) indexed processor p_j to the right (resp., left) of p_i that holds the same value as p_i; that is, v_i = v_j. Neighbor localization involves obtaining pointers to the left and right neighbors of each processor p_i. If a left or right neighbor does not exist, then the corresponding pointer is set to NIL.

LEMMA 13 [28]. Left and right neighbor localization with parameters P and m can be performed on an EREW S-RMBM[P, m] in O(1) time.

Proof Outline. We first explain the algorithm for right neighbor localization. Each processor p_i connects its read and write ports to bus b_{v_i}, segments b_{v_i} between its read and write ports (by setting segment switch s_{i,v_i,1}), and initializes its read port buffer to NIL. Each p_i now writes its index i on bus b_{v_i}; note that all reads (writes) are exclusive, as bus b_{v_i} has been segmented to the left (right) of each point at which a read (write) is performed. It is easy to see that processor p_i reads the index j (written by p_j) iff p_j is the right neighbor of p_i.
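As a sequential reference point, the pointers that neighbor localization produces can be computed with two scans; this sketch is ours (the function name and two-pass structure are illustrative, not from the paper), and it computes exactly the left/right pointers defined above.

```python
# Sequential sketch (ours) of what neighbor localization computes: for each
# processor i holding value v[i], a pointer to the nearest processor with
# the same value on each side, or None (the paper's NIL).

def neighbor_localization(v):
    n = len(v)
    left, right = [None] * n, [None] * n
    last = {}                          # value -> index of its latest holder
    for i in range(n):                 # left-to-right pass: left neighbors
        left[i] = last.get(v[i])
        last[v[i]] = i
    last = {}
    for i in range(n - 1, -1, -1):     # right-to-left pass: right neighbors
        right[i] = last.get(v[i])
        last[v[i]] = i
    return left, right

left, right = neighbor_localization([0, 1, 0, 1, 1])
assert right == [2, 3, None, 4, None]
assert left == [None, None, 0, 1, 3]
```

On the S-RMBM, both passes collapse to a single constant-time step because each value class gets its own (segmented) bus.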
The read port of a processor that does not receive a value (that is, a processor with no right neighbor) holds NIL. Left neighbor localization can be performed in the same manner by requiring processor p_{i+1} to perform the write on behalf of processor p_i. This effectively places the write port of a processor to the right of its read port. The data movement required for this is quite straightforward; details appear in [28]. ∎

LEMMA 14. Each step of a Priority CRCW PRAM[P, S] can be simulated in O(1) time on a CREW S-RMBM[P + S, S].

Proof. Each step of the CRCW PRAM, except resolution of concurrent writes, can be simulated in O(1) time as in the proof of Theorem 9. As in Theorem 9, each shared memory location m(i) of the PRAM is simulated by a bus b_i of the RMBM. By left neighbor localization, the RMBM
forms (for each bus b_i) a list L_i of the processors wishing to write on b_i; these RMBM processors correspond to PRAM processors attempting to write on m(i). List L_i is in decreasing order of processor indices. Therefore, the last processor in L_i has the lowest index among those attempting to write on b_i, and it writes (exclusively) on b_i. ∎

Remark. A two-dimensional PARBS can also simulate a Priority CRCW PRAM [4, 35]. These simulations use a two-dimensional Collision CRCW PARBS with O(PS) processors to simulate a Priority CRCW PRAM[P, S]. Our simulation is achieved on an exclusive-write model. Also, our simulation requires fewer processors; it uses more switches, however, if P = o(S).

Recall that for any bus b_i of an S-RMBM[P, B], there are 2P segment switches on b_i (two per processor). For ease of explanation, denote these segment switches on b_i by s_{i,j} (where 0 ≤ j < 2P). For any 0 ≤ j < 2P, define a_{i,j}, the jth atomic segment of b_i, to be the portion of b_i between s_{i,j} and s_{i,j+1}. Here s_{i,2P} denotes the right end of b_i; in fact, we will view s_{i,2P} as a dummy segment switch that is always set. For any 0 ≤ j < j′ < 2P, two atomic segments a_{i,j} and a_{i,j′} are connected iff none of the segment switches s_{i,j+1}, s_{i,j+2}, ..., s_{i,j′} is set (that is, bus b_i is not segmented at any point between a_{i,j} and a_{i,j′}). The representative of an atomic segment a_{i,j} is a_{i,j′} iff j ≤ j′ and j′ is the largest index such that a_{i,j} and a_{i,j′} are connected.

LEMMA 15. Each step of a CRCW S-RMBM[P, B] can be simulated in O(1) time on a Priority CRCW PRAM[O(P²B), O(PB)].

Proof. Consider any bus b_i of the RMBM. For each atomic segment a_{i,j} of b_i on which there is a read (or write), if the PRAM can determine its representative r_{i,j}, then it can perform the read (or write) on a memory location m(r_{i,j}) associated with r_{i,j}. We now explain how the PRAM can determine r_{i,j} for each atomic segment a_{i,j}. Let s_{i,j′} (for j′ >
j) be the first segment switch to the right of a_{i,j} that is set. By definition, r_{i,j} is the atomic segment a_{i,j′−1}. That is, if min{l : j < l ≤ 2P and s_{i,l} is set} = j′, then r_{i,j} = a_{i,j′−1}. For any given atomic segment a_{i,j}, we use (2P − j) PRAM processors p_l (where j < l ≤ 2P). For each a_{i,j}, the Priority CRCW PRAM finds min{l : j < l ≤ 2P and s_{i,l} is set}, and hence r_{i,j}, in constant time simply by having each processor p_l for which s_{i,l} is set write l on a shared memory location M(i, j). It is straightforward to verify that O(P²B) processors and O(PB) shared memory space suffice. ∎

The following theorem is a direct consequence of Theorem 9 and Lemmas 14 and 15.

THEOREM 16. CRCW PRAM = CRCW S-RMBM = CREW S-RMBM. ∎
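Sequentially, the representative computation that Lemma 15 parallelizes amounts to a single right-to-left scan over the segment switches of one bus. The following sketch is ours (list-based indexing is illustrative); it mirrors the definition r_{i,j} = a_{i,j′−1}.

```python
# Sketch (ours, sequential) of the representative computation in Lemma 15.
# s[l] for 0 <= l <= 2P says whether segment switch s_{i,l} is set on one
# fixed bus b_i; s[2P] is the dummy switch that is always set. Atomic
# segment a_j lies between switches j and j+1; its representative is
# a_{j'-1}, where j' is the first set switch to the right of a_j.

def representatives(s):
    assert s[-1], "the dummy switch s_{2P} must always be set"
    n = len(s) - 1                  # number of atomic segments
    r = [0] * n
    nxt = n                         # index of the nearest set switch > j
    for j in range(n - 1, -1, -1):  # right-to-left scan
        r[j] = nxt - 1
        if s[j]:                    # switch j is now the nearest set switch
            nxt = j
    return r

# Switches 1 and 4 set: a_0 is alone; a_1, a_2, a_3 form one fused stretch.
assert representatives([False, True, False, False, True]) == [0, 3, 3, 3]
```

The Priority CRCW PRAM replaces this scan with one step: every processor guarding a set switch writes its index, and the priority rule keeps the minimum.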
5.3. Relative Powers of the F-RMBM

In this section, we show that an E-RMBM is only as powerful as the F-RMBM (and hence the RN). The power
of the E-RMBM (and hence the F-RMBM) relative to other models is established in Section 5.4.

LEMMA 17. For any X, Y ∈ {C, E}, each step of an XRYW E-RMBM[P, B] can be simulated in O(1) time on an XRYW F-RMBM[P + 2PB, 4PB].

Proof. Since the F-RMBM has sufficient processors and buses, it can simulate all actions of the E-RMBM except segmenting buses. We show below that the F-RMBM can simulate bus segmenting without using concurrent reads or writes.

Let the processors of the E-RMBM be denoted by p_i (where 0 ≤ i < P). For ease of explanation, we first assume that B = 1. Here the simulating F-RMBM has 3P processors and 4P buses. The first P (resp., last P) processors of the F-RMBM are called left (resp., right) segment processors and are denoted by s_i^L (resp., s_i^R), where 0 ≤ i < P. The middle P processors of the F-RMBM are called the primary processors and are denoted by f_i (for 0 ≤ i < P). Each primary processor f_i simulates all actions of processor p_i of the E-RMBM, except setting segment switches. The segment processors s_i^L and s_i^R simulate the two segment switches of p_i on the only bus of the E-RMBM. The last 2P buses of the F-RMBM are called the secondary buses and are denoted by b_{i,j}, where 0 ≤ i < P and j ∈ {L, R}. The secondary bus b_{i,j} is used by primary processor f_i to communicate with segment processor s_i^j. Since a segment processor s_i^j needs to communicate only with f_i, there are sufficient secondary buses to ensure that all reads and writes are exclusive. The first 2P buses are called the primary buses; they represent the atomic segments of the only bus of the E-RMBM. These atomic segments can be fused (if the segment switches between them are not set) by the segment processors, as shown in Fig. 4. If all the primary buses are fused, then the bus of the E-RMBM is represented as a fused bus that snakes across the primary processors. Clearly, each primary processor can access each atomic segment of the bus. For B >
1, we use B sets of primary buses (one set per bus of the E-RMBM), each set consisting of 2P primary buses (one per atomic segment of the simulated bus). That is, 2PB primary buses are used. The number of segment processors is also increased to 2PB so that they can fuse the appropriate primary buses (if necessary) in constant time. To allow the primary processors to communicate with the segment processors without concurrent reads and writes, the number of secondary buses is also increased to 2PB.

In summary, the simulation proceeds as follows. Each primary processor sets its connect and fuse switches on the appropriate atomic segments represented by primary buses. If a segment switch needs to be reset or set, then the primary processor instructs the appropriate segment processor (over the dedicated secondary bus) to fuse or "unfuse" the appropriate pair of primary buses. Once this has been done, the primary processors operate exactly as the processors of the E-RMBM. ∎
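The heart of Lemma 17 is that segmenting can be expressed as selective fusing of atomic segments: adjacent segments are fused exactly where the simulated segment switch is not set. The following sketch is ours, with a small union-find structure standing in for the fuse operations (the class and function names are illustrative).

```python
# Sketch (ours) of the idea in Lemma 17: an F-RMBM represents a bus of the
# E-RMBM as a row of atomic-segment buses and fuses adjacent ones wherever
# the corresponding segment switch is NOT set.

class Fuser:
    """Tiny union-find structure modeling bus fusing."""
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def fuse(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def segmented_bus(switches):
    """switches[k] True means the segment switch between atomic segments
    k and k+1 is set (the bus is cut there). Returns a Fuser whose classes
    are the resulting bus segments."""
    f = Fuser(len(switches) + 1)
    for k, cut in enumerate(switches):
        if not cut:
            f.fuse(k, k + 1)   # segment processors fuse adjacent buses
    return f

f = segmented_bus([False, True, False])   # 4 atomic segments, cut after #1
assert f.find(0) == f.find(1)             # segments 0, 1 on one fused bus
assert f.find(2) == f.find(3)
assert f.find(1) != f.find(2)             # the set switch separates them
```

On the F-RMBM each fuse is performed in parallel by a dedicated segment processor, so the whole configuration takes constant time rather than a sequential pass.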
FIG. 4. Simulation of the atomic segments of an E-RMBM by an F-RMBM.
From Theorem 4 and Lemma 17, we have the following result.

THEOREM 18. For any X, Y ∈ {C, E}, XRYW RN = XRYW F-RMBM = XRYW E-RMBM. ∎
5.4. Relative Powers of the E-RMBM

We first establish that an E-RMBM (and hence an F-RMBM) is more powerful than a CRCW PRAM (and hence an S-RMBM). This result shows that the ability of the model to fuse buses is more powerful than the ability to segment buses. Next, as in the case of the S-RMBM, we show that for the E-RMBM (and hence the F-RMBM), concurrent writing capability is not necessary if concurrent reads are permitted.

An EREW RN can find the exclusive OR of n bits in O(1) time [5, 16, 20, 36]. This, coupled with the fact that a polynomially bounded CRCW PRAM requires Ω(log n / log log n) time to find the exclusive OR of n bits [1], leads to the following theorem.

THEOREM 19. (i) CRCW PRAM <? EREW RN = EREW F-RMBM = EREW E-RMBM. (ii) CRCW PRAM < CREW RN = CREW F-RMBM = CREW E-RMBM. (iii) For any X, Y ∈ {C, E}, XRYW S-RMBM < XRYW F-RMBM. ∎

Remark. Part (iii) shows that the ability of a reconfigurable model to fuse buses is more powerful than its ability to segment buses.

We now show that for an F-RMBM (and hence an E-RMBM), concurrent writing capability adds no power beyond that given by concurrent reading capability. By Theorem 18, it is sufficient to show that a polynomially bounded CREW E-RMBM can simulate any step of a CRCW F-RMBM in constant time.

Consider any step of a CRCW F-RMBM[P, B]. Two buses b_i and b_j of this F-RMBM are said to be in the same component iff information can flow from b_i to b_j and vice versa; that is, b_i and b_j are in the same component iff they have been fused to each other directly or via other buses of the F-RMBM. Finding the components of the F-RMBM involves assigning to each bus b_i a component number c_i such that, for any two buses b_i and b_j, c_i = c_j iff b_i and b_j are in the same component. Also, the component number c_i of b_i must be polynomially bounded in the size of the F-RMBM.

If the components of the F-RMBM can be determined, then it can be simulated on a polynomially bounded CRCW B-RMBM. This B-RMBM has as many buses as the range of values of the component numbers of the simulated F-RMBM. Since the component numbers are polynomially bounded, so is the size of the B-RMBM. For each processor of the simulated F-RMBM that writes on (or reads from) a bus b_i, a corresponding processor of the B-RMBM writes on (or reads from) bus c_i. Clearly, this achieves the same effect as the step of the simulated F-RMBM.

By Theorems 9 and 16, CRCW B-RMBM = CREW S-RMBM. By Lemma 8, CREW S-RMBM ≤ CREW E-RMBM. Consequently, if the components of the buses of an F-RMBM are known, then a CREW E-RMBM can simulate a CRCW F-RMBM via a CRCW B-RMBM.

It remains to show that a polynomially bounded CREW E-RMBM can find the components of the CRCW F-RMBM in constant time. Converting the relationships between the buses of the simulated CRCW F-RMBM[P, B] into a B-vertex graph is straightforward. We now show
92
TRAHAN, VAIDYANATHAN, AND THIRUCHELVAN
that a polynomially bounded CREW E-RMBM can find the connected components of a graph in constant time.

LEMMA 20. A CREW E-RMBM[n³, n²] can find the connected components of an n-vertex graph in O(1) time. Moreover, the component numbers are from the set {0, 1, ..., n − 1}.

Proof. Let the processors and buses of the RMBM be p_{i,j,k} and b_{i,j}, respectively, where 0 ≤ i, j, k < n. The graph is assumed to be input as an adjacency matrix A. For any 0 ≤ i, j < n, the entry A(i, j) in row i and column j of A is 1 iff the graph has an edge between vertices i and j. Initially, processor p_{i,j,0} holds A(i, j). The first step is to make n copies of A as follows. Processor p_{i,j,0} writes A(i, j) on bus b_{i,j} and each processor p_{i,j,k} (for all 0 ≤ k < n) reads from bus b_{i,j}. We will now view the RMBM as n copies (denoted by C_k for 0 ≤ k < n) of a CREW E-RMBM[n², n], each of which holds the adjacency matrix of the input graph. In fact, the processors and buses of copy C_k are p_{i,j,k} (for all 0 ≤ i, j < n) and b_{i,k} (for all 0 ≤ i < n). Each copy C_k determines the vertices reachable from vertex k as follows. Each processor p_{i,j,k} with A(i, j) = 1 fuses buses b_{i,k} and b_{j,k}. As a result, two vertices u and v of the graph are in the same component iff buses b_{u,k} and b_{v,k} are fused (directly or via other buses). Next, p_{0,0,k} writes a signal on b_{k,k} and each p_{i,0,k} reads from bus b_{i,k}. Clearly, p_{i,0,k} receives the signal iff vertices i and k are in the same component. If i_k is the smallest index such that p_{i_k,0,k} receives the signal, then the component number of vertex k is i_k; this can easily be determined by C_k in constant time
using left neighbor localization (see Lemma 13). Also note that each component number is from the set {0, 1, ..., n − 1} and is, therefore, polynomially bounded in n. ∎

Remark. The above algorithm works even if the graph is input as an edge set, since an edge set can easily be converted into an adjacency matrix in constant time.

We now have the following theorem.

THEOREM 21. CRCW F-RMBM = CREW F-RMBM. ∎
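Sequentially, the component numbering of Lemma 20 amounts to: the component number of vertex k is the smallest vertex index reachable from k. The following sketch is ours, with the bus fusing of copy C_k replaced by a breadth-first search over the adjacency matrix.

```python
# Sketch (ours) of the component numbering in Lemma 20: copy C_k fuses
# buses b_{i,k} and b_{j,k} for every edge (i, j); every vertex on k's
# fused bus then receives k's signal, and the smallest such vertex index
# becomes k's component number. BFS stands in for the fused bus here.

from collections import deque

def component_numbers(A):
    n = len(A)
    comp = [0] * n
    for k in range(n):                  # copy C_k works on vertex k
        seen = {k}
        q = deque([k])
        while q:                        # everything on k's fused bus
            u = q.popleft()
            for v in range(n):
                if A[u][v] and v not in seen:
                    seen.add(v)
                    q.append(v)
        comp[k] = min(seen)             # smallest index receiving the signal
    return comp

# Two components: {0, 2} and {1}.
A = [[0, 0, 1],
     [0, 0, 0],
     [1, 0, 0]]
assert component_numbers(A) == [0, 1, 0]
```

The E-RMBM runs all n copies in parallel and replaces the min-finding with left neighbor localization, which is what brings the whole computation down to constant time.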
Figure 5 summarizes the relationships between the RMBM, the PRAM, and the RN.

6. CONCLUDING REMARKS
In this paper we have identified two basic abilities of reconfigurable bus-based models: (i) the ability to segment a bus into two or more bus segments and (ii) the ability to fuse two or more buses or bus segments. We have used the RMBM to study the contribution of these abilities, along with the capability of the model to perform concurrent reads and writes on a bus, and derived a hierarchy of powers of the RMBM, PRAM, and RN. The primary results obtained from this hierarchy are the following.

• The ability of a model to fuse buses is more powerful than the ability to segment buses.

• For all truly reconfigurable models (those that can segment and/or fuse buses), concurrent writing capability adds no more power than concurrent reading capability.
FIG. 5. The relative powers of the PRAM, RMBM, and RN.
• Concurrent writing capability is both necessary and sufficient for a PRAM to simulate (in constant time) the ability to segment buses. The PRAM cannot simulate (in constant time) the ability to fuse buses, however.

• A model that can neither segment nor fuse buses is equal in power to the PRAM.

Other reconfigurable models, such as the distributed memory bus computer (DMBC) [21, 22], the reconfigurable array processor (RAP) [8], and reconfigurable buses with shift switching (REBSIS) [14], do not have an underlying graph like the RN. It can be shown, by simulations like those in this paper, that the DMBC, RAP, and REBSIS are equal in power to the E-RMBM (or F-RMBM). Therefore, the results in this paper are of a general nature.

From the discussion for Theorem 21, it is clear that if a model can find the connected components of a graph in O(T) time, then that model can simulate any step of the F-RMBM in O(T) time. Since a PRAM can find the connected components of a graph in polylogarithmic time (for instance, see [23]), the set of all problems that can be solved in polylogarithmic time on a polynomially bounded RMBM (and hence on a polynomially bounded RN) is NC, the set of problems that can be solved on a PRAM in polylogarithmic time. Therefore, problems that are inherently difficult to parallelize on the PRAM are difficult to parallelize on reconfigurable bus-based models as well. This result was also obtained by Ben-Asher et al. [3, 4] for the RN.

Ben-Asher et al. [3] established that the class of languages accepted in constant time on a polynomially bounded Collision CRCW RN is equal to SLH, the symmetric logspace oracle hierarchy. Recently, Nisan and Ta-Shma [18] showed that SLH collapses to SL, the class of languages accepted on symmetric Turing machines (TMs) in logarithmic space.
The class SL is contained between L, the class of languages accepted on deterministic TMs in logarithmic space, and NL, the class of languages accepted on nondeterministic TMs in logarithmic space. Since CRCW RN = CRCW F-RMBM = CRCW E-RMBM (see Theorem 18), the class of languages accepted in constant time on a polynomially bounded CRCW F-RMBM or E-RMBM is also equal to SL. Also, following Ben-Asher et al., this class is contained in the class of languages accepted on a deterministic TM in O(log^1.5 n) space and in the class of languages accepted on a polynomially bounded EREW PRAM in O(log^1.5 n) time. Furthermore, Theorems 18 and 21 show that the above relationships hold for CREW RNs, F-RMBMs, and E-RMBMs as well. Since the RMBM, like the RN and R-Mesh, has been shown to solve a number of fundamental problems in constant time [25, 27, 28], this result serves to set limits on the complexity of constant time RMBM computation.

ACKNOWLEDGMENTS

The authors thank the anonymous referees for their useful suggestions.
REFERENCES

1. Beame, P., and Hastad, J. Optimal bounds for decision problems on the CRCW PRAM. J. Assoc. Comput. Mach. 36 (1989), 643–670.
2. Ben-Asher, Y., Gordon, D., and Schuster, A. Efficient self simulation algorithms for reconfigurable arrays. J. Parallel Distrib. Comput. 30 (1995), 1–22.
3. Ben-Asher, Y., Lange, K.-J., Peleg, D., and Schuster, A. The complexity of reconfiguring network models. Inform. and Comput. 121 (1995), 41–58.
4. Ben-Asher, Y., Peleg, D., Ramaswami, R., and Schuster, A. The power of reconfiguration. J. Parallel Distrib. Comput. 13 (1991), 139–153.
5. Chen, G.-H., Wang, B.-F., and Li, H. Deriving algorithms on reconfigurable networks based on function decomposition. Theoret. Comput. Sci. 120 (1993), 215–227.
6. Cook, S., Dwork, C., and Reischuk, R. Upper and lower time bounds for parallel random access machines without simultaneous writes. SIAM J. Comput. 15 (1986), 87–97.
7. JáJá, J. An Introduction to Parallel Algorithms. Addison–Wesley, Reading, MA, 1992.
8. Kao, T. W., Horng, S. J., and Tsai, H. R. Computing connected components and some related applications on a RAP. Proc. International Conference on Parallel Processing, 1993, Vol. III, pp. 57–64.
9. Karp, R. M., and Ramachandran, V. Parallel algorithms for shared-memory machines. In van Leeuwen, J. (Ed.). Handbook of Theoretical Computer Science, Vol. A: Algorithms and Complexity. Elsevier, 1990, pp. 869–941.
10. Kucera, L. Parallel computation and conflicts in memory access. Inform. Process. Lett. 14 (1982), 92–96.
11. Li, H., and Maresca, M. Polymorphic-torus architecture for computer vision. IEEE Trans. Pattern Anal. Mach. Intell. 11 (1989), 233–243.
12. Li, H., and Stout, Q. F. Reconfigurable SIMD massively parallel computers. Proc. IEEE 79 (1991), 429–443.
13. Lin, R. Fast algorithms for lowest common ancestors on a processor array with reconfigurable buses. Inform. Process. Lett. 40 (1991), 223–230.
14. Lin, R., and Olariu, S.
Reconfigurable buses with shift switching—Architectures and applications. Proc. International Phoenix Conference on Computers and Communication, 1993, pp. 23–29.
15. Maresca, M. Polymorphic processor arrays. IEEE Trans. Parallel Distrib. Systems 4 (1993), 490–506.
16. Miller, R., Prasanna-Kumar, V. K., Reisis, D., and Stout, Q. Parallel computations on reconfigurable meshes. IEEE Trans. Comput. 42 (1993), 678–692.
17. Moshell, J. M., and Rothstein, J. Bus automata and immediate languages. Inform. and Control 40 (1979), 88–121.
18. Nisan, N., and Ta-Shma, A. Symmetric logspace is closed under complement. Proc. 27th ACM Symposium on Theory of Computing, 1995, pp. 140–146.
19. Olariu, S., Schwing, J. L., and Zhang, J. Fundamental algorithms on reconfigurable meshes. Proc. Annual Allerton Conference on Communication, Control and Computing, 1991, pp. 811–820.
20. Olariu, S., Schwing, J. L., and Zhang, J. On the power of two-dimensional processor arrays with reconfigurable bus systems. Parallel Process. Lett. 1 (1991), 29–34.
21. Sahni, S. Computing on reconfigurable bus architectures. In Balakrishnan et al. (Eds.). Computer Systems & Education. Tata McGraw–Hill, New Delhi, 1994, pp. 386–398.
22. Sahni, S. Data manipulation on the distributed memory bus computer. Parallel Process. Lett. 5 (1995), 3–14.
23. Shiloach, Y., and Vishkin, U. An O(log n) parallel connectivity algorithm. J. Algorithms 3 (1982), 57–67.
24. Schuster, A. Dynamic reconfiguring networks for parallel computers: Algorithms and complexity bounds. Ph.D. Thesis, Dept. of Computer Science, Hebrew University, Israel, 1991.
25. Subbaraman, C. P., Trahan, J. L., and Vaidyanathan, R. List ranking and graph algorithms on the reconfigurable multiple bus machine. Proc. International Conference on Parallel Processing, 1993, Vol. III, pp. 244–247.
26. Thiruchelvan, R. K., Trahan, J. L., and Vaidyanathan, R. On the power of segmenting and fusing buses. Proc. 7th International Parallel Processing Symposium, 1993, pp. 79–83.
27. Trahan, J. L., Vaidyanathan, R., and Subbaraman, C. P. Constant time graph and poset algorithms on the reconfigurable multiple bus machine. Proc. International Conference on Parallel Processing, 1994, Vol. III, pp. 214–217.
28. Trahan, J. L., Vaidyanathan, R., and Subbaraman, C. P. Constant time graph algorithms on the reconfigurable multiple bus machine. Technical Report EE-TV-96-03, Dept. of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, 1996.
29. Vaidyanathan, R. Sorting on PRAMs with reconfigurable buses. Inform. Process. Lett. 42 (1992), 203–208.
30. Vaidyanathan, R., Hartmann, C. R. P., and Varshney, P. K. PRAMs with variable word-size. Inform. Process. Lett. 42 (1992), 217–222.
31. Vaidyanathan, R., Hartmann, C. R. P., and Varshney, P. K. Parallel integer sorting using small operations. Acta Inform. 32 (1995), 79–92.
32. Vaidyanathan, R., and Trahan, J. L. Optimal simulation of multidimensional reconfigurable meshes by two dimensional reconfigurable meshes. Inform. Process. Lett. 47 (1993), 267–273.
33. Wang, B. F., Chen, G. H., and Lin, F. Constant time sorting on a processor array with a reconfigurable bus system. Inform. Process. Lett. 34 (1990), 187–192.
34. Wang, B. F., Chen, G. H., and Lin, F. Constant time algorithms for the transitive closure and some related graph problems on processor arrays with reconfigurable bus systems. IEEE Trans. Parallel Distrib. Systems 1 (1990), 500–507.
35. Wang, B. F., and Chen, G. H. Two-dimensional processor array with a reconfigurable bus system is at least as powerful as CRCW model. Inform. Process. Lett. 36 (1990), 31–36.
36. Wang, B. F., Chen, G. H., and Li, H. Fast algorithms for some arithmetic and logic operations. Proc. National Computer Symposium, 1991, pp. 178–183.

Received September 14, 1994; revised April 19, 1995; accepted December 1, 1995
JERRY L. TRAHAN received the B.S. degree from Louisiana State University in 1983 and the M.S. and Ph.D. degrees from the University of Illinois at Urbana–Champaign in 1986 and 1988, respectively. Since 1988, he has been with the Department of Electrical and Computer Engineering at Louisiana State University, Baton Rouge, where he is currently an associate professor. His research interests include models of parallel computation, theory of computation, and reliability evaluation of multiprocessor networks.

RAMACHANDRAN VAIDYANATHAN received the B. Tech. (Hons.) and M. Tech. degrees from the Indian Institute of Technology, Kharagpur in 1983 and 1985, and the Ph.D. degree from Syracuse University in 1990. Since 1990, he has been an assistant professor in the Department of Electrical and Computer Engineering at Louisiana State University, Baton Rouge. His research interests include parallel algorithms, multiple bus networks, and reconfigurable bus-based architectures.

RATNAPURI K. THIRUCHELVAN received the B.S. degree from Bharathiar University, Coimbatore, India in 1990 and the M.S. degree from Louisiana State University, Baton Rouge in 1992. Since 1992, he has been working as a software engineer at Advanced Paradigms, Inc. in Alexandria, VA. His research interests include parallel algorithms, multiple bus networks, and reconfigurable bus-based architectures.