ELSEVIER
Information Processing Letters 50 (1994) 217-222
Multiprocessor interconnection network using pairwise balanced combinatorial designs
Simon Y. Berkovich *
Department of Electrical Engineering and Computer Science, The George Washington University, Washington, DC 20052, USA
Communicated by D. Gries; received 14 December 1993
Abstract
This paper presents a multiprocessor interconnection structure, similar to bus interconnection networks, that uses a combinatorial arrangement with the pairwise balanced design property. By reversing the roles of processing and connecting elements in such an arrangement, the suggested structure directly supports object-oriented constructs in distributed fault-tolerant systems.
Key words: Distributed systems; Fault tolerance; Combinatorial designs; Interconnection networks
* On sabbatical leave with AlliedSignal Microelectronics and Technology Center, Columbia, MD 21405, USA. Email: [email protected].
0020-0190/94/$07.00 © 1994 Elsevier Science B.V. All rights reserved
SSDI 0020-0190(94)00028-W

1. Introduction
The interconnection mechanisms in multiprocessor systems, whether they use message passing or shared memory communication, are based on the same operational principle: shipping a data item from the node where it is stored to the node where it is needed for processing. In this paper, a different approach to the organization of interconnections in multiprocessor computer systems is suggested: information exchange is performed directly through local interactions of replicated data, while the updates are incorporated in follow-up broadcasts. This approach specifically supports the object-oriented view of the world, which assumes that the evolution of a system can be described by a sequence of information exchanges between various pairs of objects. In the implementation of the object-oriented paradigm there is no semantic difference between invoking an operation on a remote object and invoking it locally; however, there is a huge difference in overhead, so the performance may differ by a factor of a thousand. This complication can be overcome if the objects are brought together by different schemes of replication and migration [1].
The suggested interconnection network can ensure exclusively local interactions of objects through appropriate handling of their replicated copies in a combinatorial arrangement having the pairwise balanced design (PBD) property. The technique of combinatorial designs is very useful for the development of algorithmic and architectural constructions in different areas of computer science [2]. A combinatorial design for a given set of elements is a collection of subsets created according to certain specifications; it can be depicted as an incidence matrix, as shown, for example, in Fig. 1 (see, e.g. [5]). The PBD property characterizes a combinatorial design in which for any pair of elements, x_i and x_j, there exists a common subset, a block, to which both elements belong.

[Incidence matrix: elements x1-x9 versus blocks B1-B12]
Fig. 1. An example of an incidence matrix for a BIB design: (12,9,4,3,1).

The PBD property can be provided by a uniform combinatorial arrangement called a balanced incomplete block (BIB) design. A BIB design of a set of v elements is a collection of b k-subsets (blocks) such that each element appears in exactly r blocks and every two elements appear together in exactly λ blocks. The BIB design is usually represented by its parameters in standard notation: (b, v, r, k, λ). Thus, Fig. 1 presents a BIB design: (12,9,4,3,1). According to the PBD property, for any pair of elements, say x3 and x8, there is exactly one block where they are placed together.
The combinatorial designs with the PBD property can be applied to the construction of bus interconnection networks [2,6]. In such a network, a set of processing elements is interconnected by a collection of buses corresponding to the subsets in the combinatorial design. By virtue of the PBD property this network can provide a direct link between any pair of processing elements.
The suggested multiprocessor interconnection network uses a similar PBD combinatorial arrangement but in a reverse sense: the elements of the original set represent interconnection links (like buses) while the subsets of these elements represent the processing nodes. An object stored in the system is associated with a certain link and is replicated in each of the nodes which this link connects. Because of the PBD property, for any pair of objects there exists a processing node where copies of these objects reside together. An interaction of two objects occurs locally between their copies within this processing node. After the interaction the updates can be sent immediately to all copies of the objects through separate interconnecting links. In the suggested structure all the copies of the interacting objects are updated simultaneously. Conventional computer systems using replicated data normally hold these data in different versions; to maintain coherence these systems employ a lengthy procedure of keeping track of the time stamps of the copies (see, e.g. [4]).
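
The reversed reading of a design described above can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: it builds one well-known (12,9,4,3,1) design, the affine plane over the integers modulo 3, whose labelling of elements and blocks need not coincide with the one in Fig. 1. The function and variable names (affine_plane_blocks, has_pbd_property, links, meeting_nodes) are hypothetical and do not come from the paper.

```python
from itertools import combinations, product


def affine_plane_blocks(q=3):
    """Points and lines of the affine plane over Z_q (q prime).

    For q = 3 this gives 9 elements and 12 blocks of size 3, i.e. a
    (12,9,4,3,1) design. The labelling is an assumption of this sketch.
    """
    points = list(product(range(q), repeat=2))
    blocks = set()
    # Non-vertical lines y = m*x + c for every slope m and intercept c.
    for m, c in product(range(q), repeat=2):
        blocks.add(frozenset((x, (m * x + c) % q) for x in range(q)))
    # Vertical lines x = c.
    for c in range(q):
        blocks.add(frozenset((c, y) for y in range(q)))
    return points, sorted(blocks, key=sorted)


def has_pbd_property(points, blocks):
    """True if every pair of elements lies together in at least one block."""
    return all(
        any(p in blk and s in blk for blk in blocks)
        for p, s in combinations(points, 2)
    )


points, blocks = affine_plane_blocks(3)

# Reversed roles: each block is a processing node, each element is a link
# (and an object); links[p] lists the nodes that hold a copy of object p.
links = {p: [i for i, blk in enumerate(blocks) if p in blk] for p in points}


def meeting_nodes(p, s):
    """Nodes where copies of objects p and s reside together."""
    return [i for i in links[p] if i in links[s]]
```

With this construction every object is replicated in four nodes and every pair of objects meets in exactly one node, which is the situation the text describes for a design with λ = 1.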
2. Organization of information exchange
Let us illustrate the suggested concept of organization of multiprocessor interconnection networks with a simple BIB design, (7,7,3,3,1), which represents the so-called Steiner's triples (Fig. 2). Conventional usage of this BIB structure can directly provide full interconnection facilities. For example, messages from node 2 can be sent to nodes 6 and 7 through the left communication link, to nodes 1 and 4 through the middle communication link, and to nodes 3 and 5 through the right communication link. As soon as messages from different nodes are sent simultaneously, this structure encounters the concurrency control problems typical of communication networks. Problems of this kind can be avoided using the suggested modification, where the roles of processing elements and communication links are reversed. With this novel organization of operations PBD interconnection networks acquire improved performance characteristics in terms of communication latency and fault tolerance.

  Processing nodes:     1  2  3  4  5  6  7
  Replicated objects:   1  2  3  4  5  6  7
                        7  1  2  3  4  5  6
                        5  6  7  1  2  3  4
  Communication links:  one link per object, connecting the nodes that hold its copies
Fig. 2. A PBD network corresponding to Steiner's triplet: (7,7,3,3,1).

Demonstrating the distinctive features of the suggested construction by the example in Fig. 2 is somewhat confusing because the presented BIB design is symmetric with respect to its elements and their subsets. However, it is beneficial to stay with this example to avoid clutter in the pictorial representation of more profuse interconnections. If we depicted a more complicated network according to the non-symmetric BIB design presented in Fig. 1, it would have 12 processing nodes connected by 9 communication links; each communication link would include 4 nodes and each node would belong to 3 links. With this pattern in mind, it is easier to follow the proposed mechanism using the simpler example in Fig. 2.
In this example, the seven processing elements are distributed in seven 3-element subsets and every node is included in three subsets. Thus, every object is stored in three copies. Copies of object #1 are placed in processing nodes 1, 2, and 4, copies of object #2 are placed in processing nodes 2, 3, and 5, and so on. Assume that there should be an interaction between two objects, say #2 and #7. The structure of the system allows this operation to be executed in the node where the copies of these objects are located together, without moving the data across the network. In the given example this will occur in processing node 3. After this operation, the copies of the altered objects can be updated through two separate links corresponding to objects #2 and #7. The required updates can be sent simultaneously through the link corresponding to object #2 to nodes 2 and 5, and through the link corresponding to object #7 to nodes 1 and 7. Interactions of pairs of objects that have no component in common can progress concurrently.
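
The interaction step described above can be traced with a short Python sketch. It assumes the object placement implied by the text (object #1 in nodes 1, 2 and 4; object #2 in nodes 2, 3 and 5; and so on cyclically), so that node j holds objects j, j-1 and j-3 taken modulo 7. The helper names objects_in_node, nodes_of_object, meeting_node and interact are hypothetical and serve only this illustration.

```python
NODES = range(1, 8)


def objects_in_node(node):
    """Objects whose copies reside in a node: j, j-1 and j-3 (mod 7, in 1..7)."""
    return {(node - d - 1) % 7 + 1 for d in (0, 1, 3)}


def nodes_of_object(obj):
    """Nodes connected by the link of an object, i.e. the nodes holding its copies."""
    return sorted(n for n in NODES if obj in objects_in_node(n))


def meeting_node(a, b):
    """The single node where copies of objects a and b reside together."""
    (node,) = [n for n in NODES if {a, b} <= objects_in_node(n)]
    return node


def interact(a, b):
    """Return the node that executes the interaction and, per object,
    the remaining nodes that must receive the update over that object's link."""
    here = meeting_node(a, b)
    updates = {o: [n for n in nodes_of_object(o) if n != here] for o in (a, b)}
    return here, updates


print(interact(2, 7))  # prints (3, {2: [2, 5], 7: [1, 7]})
```

The result reproduces the example in the text: objects #2 and #7 interact in node 3, and the updates travel to nodes 2 and 5 over one link and to nodes 1 and 7 over the other.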
3. Evaluation of the parameters
The standard notation of BIB designs (b, v, r, k, λ) is applied to the suggested interconnection network with the following meaning:
b - the number of processing elements, nodes;
v - the number of communication links, buses, objects;
r - the number of processing elements connected by a communication link;
k - the number of node connections and the number of objects within a node;
λ - the number of times the pairs are replicated.
The five parameters in a BIB design are not independent since they have to satisfy the following relationships (see, e.g. [5]):
$$ b \cdot k = v \cdot r, \tag{1} $$
$$ r \cdot (k - 1) = \lambda \cdot (v - 1), \tag{2} $$
$$ b \geq v. \tag{3} $$
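
As a quick sanity check, the three relationships can be evaluated for the two designs used in this paper. The Python sketch below is illustrative only; the function name satisfies_bib_relations is hypothetical. Note that the relationships are necessary conditions: passing the check does not by itself guarantee that a design with those parameters exists.

```python
def satisfies_bib_relations(b, v, r, k, lam):
    """Check b*k = v*r, r*(k-1) = lam*(v-1), and Fisher's inequality b >= v."""
    return b * k == v * r and r * (k - 1) == lam * (v - 1) and b >= v


# The design of Fig. 1 and the Steiner triple example of Fig. 2.
fig1_ok = satisfies_bib_relations(12, 9, 4, 3, 1)
fig2_ok = satisfies_bib_relations(7, 7, 3, 3, 1)

# An inconsistent parameter set: doubling lam without changing r breaks (2).
bad_ok = satisfies_bib_relations(7, 7, 3, 3, 2)
```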
In a realization of a BIB design we can choose three parameters; the remaining two are then determined from (1) and (2), subject to the constraint of Fisher's inequality (3). Consider the following parameters as given: v - the number of objects, which is determined by the problem; k - the number of connections in a processing node, which is a fixed number determined by the hardware implementation; and λ - a replication parameter related primarily to fault-tolerance issues. The parameters which can be adjusted are: b - the total number of processing nodes, and r - the number of nodes attached to a bus. The choice of these parameters is determined by the following formulas:
$$ r = \frac{\lambda \cdot (v - 1)}{k - 1}, \tag{4} $$
$$ b = \frac{\lambda \cdot v \cdot (v - 1)}{k \cdot (k - 1)}. \tag{5} $$
Practical realization of the proposed multiprocessor interconnection network does not necessarily require a perfectly uniform construction of a BIB design as long as the PBD property is provided. While BIB designs may exist only for certain well-determined combinations of the parameters, the construction of not exactly balanced combinatorial designs with the PBD property is less restrictive [3]. Deviation from an exact balance does not impair the functioning of an interconnection network but only affects the effectiveness of hardware utilization. Relationships (4) and (5) are also valid, in an approximate sense, for such combinatorial designs with the PBD property.
As can be seen from (4) and (5), the hardware complexity of the construction as a whole is directly proportional to λ. The choice of the parameter λ determines the fault-tolerance capabilities of the system. For λ > 1 the system has substantial redundancy, implying rich facilities for fault-tolerant operation. Because of the replication of objects, some of these facilities remain for λ = 1 as well. The number of connections to a processing node, k, is a fixed parameter established by the construction of the node. To simplify the construction of a processing node this parameter should be kept relatively small. So, let us choose the minimal values λ = 1 and k = 3. Then, for the construction of the PBD network we get estimates for the number of processing nodes, b, and the size of the subsets interconnected by a bus, r, as functions of the number of objects, v:
$$ b \approx \tfrac{1}{6} \cdot v^2, \tag{6} $$
$$ r \approx \tfrac{1}{2} \cdot v. \tag{7} $$
Thus, a system of 20 objects would require a PBD network with somewhat more than 60 processing nodes. In this network, the number of nodes attached to each interconnecting bus should be about 10.

4. Application issues
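
As a rough numerical illustration of the sizing formulas of Section 3, r = λ(v - 1)/(k - 1) and b = λv(v - 1)/(k(k - 1)), the following Python sketch evaluates them exactly with rational arithmetic. The function name pbd_size and its default arguments (k = 3, λ = 1) are assumptions of this sketch, chosen to match the minimal values discussed in the text.

```python
from fractions import Fraction


def pbd_size(v, k=3, lam=1):
    """Return (b, r) for v objects, k connections per node, replication lam.

    b = lam*v*(v-1) / (k*(k-1)) is the number of processing nodes,
    r = lam*(v-1) / (k-1) is the number of nodes attached to one bus.
    Values are kept as exact fractions; a non-integer result means that
    no exactly balanced design exists for these parameters.
    """
    r = Fraction(lam * (v - 1), k - 1)
    b = Fraction(lam * v * (v - 1), k * (k - 1))
    return b, r


b20, r20 = pbd_size(20)   # b = 190/3 (about 63.3), r = 19/2 (9.5)
b7, r7 = pbd_size(7)      # b = 7, r = 3, the Steiner triple example
b9, r9 = pbd_size(9)      # b = 12, r = 4, the design of Fig. 1
```

For v = 20 the values are not integers, so only an approximately balanced arrangement with the PBD property could be built; the figures of roughly 63 nodes and 10 nodes per bus agree with the estimates quoted for 20 objects, and the quadratic growth of b with v is visible directly.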
The presented analysis shows that the realization of large PBD networks is not economical. This result is not surprising since the interconnection facilities of the PBD network are similar to those of a fully connected network and a full crossbar switch. Therefore, the PBD network, like a fully connected network and a full crossbar switch, can be directly applied only to the construction of relatively small systems. The specific feature of the PBD network is the redundancy in data representation. This gives the PBD network advantages which are determined basically by two factors: (1) intrinsic reliability with automatic fault tolerance and increased availability; (2) convenient control of concurrently interacting objects.
The practical implementation of PBD networks raises a number of questions. By the way the system functions, it must have at least as many interconnected subsets (or buses) as there are objects. Since creating large PBD networks is undesirable, the situation in which the number of objects exceeds the number of buses should be resolved by other methods. When the graph of object interrelations is sparse, there is no need for a PBD network that provides for interactions of all pairs. Saving hardware in this case would require, however, some extra effort to map the object interaction graph onto the configuration of available pairwise connections.
A natural way to match the number of objects with the number of interconnection links is to use the aggregation of objects. In this case, objects can be combined into the required number of aggregates fitting the available configuration of a PBD network. Interactions of objects in different aggregates can be handled in exactly the same manner as described for single objects. Interactions of objects inside an aggregate can be performed within any of the processing nodes containing a copy of this aggregate; the other copies can be updated immediately since all the nodes containing these copies are connected.
The PBD network can speed up operations in a multiprocessor system through simultaneous execution of interactions of object pairs with distinct components. In the case of k = 3 each processing node contains three different objects and, hence, at any given moment it can be assigned to process no more than one pair with distinct components. A collection of v objects may have ⌊v/2⌋ pairs with all different components. Therefore, if the structure of an algorithm permits, a multiprocessor system using the PBD network would be able to handle any assignment of such pairs, and the maximum speedup which can be achieved in this system will be ⌊v/2⌋. Thus, in the example presented in Fig. 2 the maximum speedup will be 3. The application of the suggested approach to computational problems of greater size requires expanding the interconnection structure while retaining its PBD property.
Because a straightforward expansion of this system is complicated, it might be beneficial to develop a large PBD network as a hierarchical structure from relatively small arrangements, like Steiner's triplet of 7 nodes. This combinatorial arrangement is ideally balanced and, in accordance with (3), contains a minimal number of processing nodes.
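
The speedup argument of this section, that pairs of objects with distinct components can be processed concurrently in different nodes, can be checked on the 7-node example with a small Python sketch. It assumes the same object placement as in Fig. 2 (node j holds objects j, j-1 and j-3 modulo 7); the names objects_in_node, meeting_node and schedule are hypothetical.

```python
def objects_in_node(node):
    """Objects held by a node in the 7-node example: j, j-1, j-3 (mod 7, in 1..7)."""
    return {(node - d - 1) % 7 + 1 for d in (0, 1, 3)}


def meeting_node(a, b):
    """The single node where copies of objects a and b reside together."""
    (node,) = [n for n in range(1, 8) if {a, b} <= objects_in_node(n)]
    return node


def schedule(pairs):
    """Assign each pair of objects to its meeting node.

    The pairs must not share any object; since a node holds only three
    objects, two such pairs can never be assigned to the same node.
    """
    used = [obj for pair in pairs for obj in pair]
    if len(used) != len(set(used)):
        raise ValueError("pairs must have all different components")
    return {pair: meeting_node(*pair) for pair in pairs}


plan = schedule([(1, 2), (3, 4), (5, 6)])
print(plan)  # prints {(1, 2): 2, (3, 4): 4, (5, 6): 6}
```

Three disjoint pairs, the maximum for seven objects, are mapped to three different nodes, which corresponds to the maximum speedup of 3 stated for the example of Fig. 2.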
5. Conclusion
The paper presents an organization of the interaction of processing elements in multiprocessor systems that modifies the approach of bus interconnection networks based on combinatorial designs. The suggested structure organizes the interaction of different objects through their replicated copies, providing their direct contact by an interconnection network assembled in accordance with a combinatorial arrangement having the pairwise balanced design (PBD) property. The coupling facilities and the communication performance of this PBD network are comparable with those of a fully connected network and a full crossbar switch. For application in large multiprocessor systems the suggested PBD network can be developed as a hierarchical structure.
Besides its communication effectiveness, the presented construction offers a convenient solution to certain complications arising in the organization of computations in a distributed environment. The PBD network can facilitate the realization of a fault-tolerant mode of operation and the resolution of conflicts under decentralized control. An important quality of the suggested multiprocessor interconnection network is that it can support software constructs compatible with object-oriented programming.
Acknowledgement
The author would like to acknowledge the valuable discussions with D.T. Peng and J. Zhou regarding the application of the presented concept to the development of distributed fault-tolerant systems.
References
[1] B. Achauer, The DOWL distributed object-oriented language, Comm. ACM 36 (1993) 48-55.