Comparison between optical interconnection processors: folded perfect-shuffle versus three-dimensional butterfly
K. M. IFTEKHARUDDIN, K. JEMELI, M. A. KARIM
The differences between folded perfect-shuffle and three-dimensional (3-D) multi-channel butterfly optical interconnection architectures are quantified. The three-dimensional butterfly network, in particular, is used to implement the Hartley transform operation.
KEYWORDS: optical interconnects, folded perfect-shuffle interconnects, three-dimensional butterfly interconnects
The authors are in the Department of Electrical Engineering and Center for Electro-Optics, University of Dayton, Dayton, Ohio 45469-0227, USA
0030-3992/94/04/0265-06 Optics & Laser Technology Vol 26 No 4 1994
Introduction
Signal and image processing, neural network, and machine vision systems often require massive parallelism and high throughput [1]. Present-day semiconductor technology, however, is often unable to meet the corresponding processing speed requirements. This suggests the use of optical technology for realizing parallel interconnections between the memory and the processing elements and among different computing modules [2]. The built-in non-interference characteristics of optical interconnects enable the system to provide direct connections to the interior of a chip [3]. Optical interconnects are classified as either index-guided (using waveguides) or free-space [4]. Free-space interconnection, in turn, is classified as either focused (obtained by means of lenses, beam-splitters, etc.) or unfocused (obtained by imaging light through holographic elements). The former may be either space-variant or space-invariant. A number of efforts have already used free-space space-invariant optical interconnection processors for realizing logic devices and functions [5-9]. However, to date, little work has been done to quantify the inherent advantage of using three-dimensional (3-D) optical interconnects [10]. It is evident that 3-D optical interconnects may be used to input the entire two-dimensional (2-D) data in parallel, whereas the third dimension may be used for data propagation. However, how much better off we are using a 3-D interconnection architecture instead of its 1-D counterpart has not yet been explored. Recent efforts have addressed the implementation [11] and classification [10] of 3-D interconnects, but have not necessarily quantified the extent of their usefulness.
In our current work, we compare two topologies, namely the folded perfect shuffle (PS) [7] and the butterfly [8], as examples of 1-D and 3-D interconnection architectures respectively. It may be noted that, although both the perfect shuffle and the butterfly are regular free-space networks, the latter offers a certain inherent advantage over the former since it needs no input magnification. Further, to emphasize the advantage of using a 3-D architecture, we implement here the Hartley transform operation using a 3-D butterfly. This may not be implemented using a 1-D folded PS.
Optical interconnection processors
One-dimensional folded PS processor
Because of its versatility [12], the perfect shuffle (PS) is one of the most widely used optical interconnection networks. A PS interconnection is a space-invariant regular free-space network which splits a linear array of N = 2^n points into two halves and then interleaves the two. The 1-D PS architecture [12], for example, is capable of shuffling either the rows or the columns of a matrix, whereas a 2-D optical PS can shuffle both the rows and columns of an image of a serial data-stream [13]. A folded PS network, as shown in Fig. 1, has already been proposed and implemented optically [7]. Each element of the input matrix is shuffled and rearranged as shown in the output matrix. A one-step optical implementation of the folded PS requires a mask (which records the image at the input for magnification and shifting) and four imaging lenses (which achieve magnification and shifting by overlapping the quadrants at the output). The overlapped quadrants produce the folded PS output which, in turn, is a function of the recorded input pattern. A folded PS lends itself to the realization of a 1-D PS algorithm. Even though the folded PS has the advantage of providing global interconnections to some extent, it does not fully exploit the inherent 3-D nature of optics. Further, the input magnification requirement of all PS networks is itself a problem for the existing diffraction-limited devices [8].
Fig. 1 One-step implementation of a folded PS (after Ref. 7)
© 1994 Butterworth-Heinemann Ltd
Three-dimensional butterfly processor
A butterfly is a class of the log_2 N family of interconnection networks, where N is the number of inputs. Such an interconnection on a string of length N = 2^r acts as an exchange of the least significant bit (LSB) and the most significant bit (MSB) (i.e. between bit 0 and bit (r - 1)) in the binary address a_i of each element b_i of the string. In general, a butterfly comprises three phases: a copy operation (for all elements of the string having a binary address with LSB = MSB), a shift to the right by (N/2) - 1 bits (for all elements of the string having a binary address with LSB = 1 and MSB = 0) and a shift to the left by (N/2) - 1 bits (for the rest of the elements) [8,9]. Butterflies can be used for obtaining Boolean functions by generating and then combining the unminimized minterms. The optical implementation of a butterfly interconnection processor is somewhat similar to that of the crossover network [9]. As an example of the 3-D topology, the butterfly processor is shown in Fig. 2, where A(m, n) is the input matrix, B(m, n) is the intermediate output matrix, and C(m, n) corresponds to the final output matrix.
Fig. 2 A three-dimensional interconnection (N = 64)
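The two permutations described in this section can be sketched in a few lines of Python. This is only an illustration of the address arithmetic, not the optical implementation: the function names `perfect_shuffle`, `butterfly_address` and `butterfly` are hypothetical, and elements are assumed to be addressed from 0 to N - 1.

```python
def perfect_shuffle(seq):
    """Split a sequence of length N = 2**n into two halves and interleave them."""
    n = len(seq)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two (at least 2)")
    half = n // 2
    out = []
    for first, second in zip(seq[:half], seq[half:]):
        out.extend((first, second))
    return out


def butterfly_address(a, r):
    """Exchange bit 0 (LSB) and bit r-1 (MSB) of an r-bit address."""
    lsb = a & 1
    msb = (a >> (r - 1)) & 1
    if lsb == msb:
        return a                          # copy operation
    return a ^ (1 | (1 << (r - 1)))       # flips both bits: shift by +/-(N/2 - 1)


def butterfly(seq):
    """Move each element of a length-2**r string to its butterfly address."""
    n = len(seq)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two (at least 2)")
    r = n.bit_length() - 1
    out = [None] * n
    for a, item in enumerate(seq):
        out[butterfly_address(a, r)] = item
    return out


print(perfect_shuffle(list(range(8))))   # [0, 4, 1, 5, 2, 6, 3, 7]
print(butterfly(list(range(8))))         # [0, 4, 2, 6, 1, 5, 3, 7]
```

For N = 8 every element either stays in place or moves by (N/2) - 1 = 3 positions, which matches the copy, shift-right and shift-left phases described above.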
Processor comparison
In the following subsections, we provide a comparative analysis of the 1-D folded PS processor and the 3-D butterfly in terms of two parameters, namely optimal connectivity and length of communication link. The final subsection discusses some of the features of the 3-D butterfly processor, which are compared with those of a 1-D folded PS processor.
Connectivity strength
In graph theory, a directed graph (or digraph) is described as a finite non-empty set S of vertices together with a set A of ordered pairs k = (x, y), known as directed edges or arcs, where x and y are distinct vertices [14,15]. The line joining x and y is an arc of the digraph, while x and y are said to be adjacent to each other. Figure 3 shows the digraph of a folded PS with vertex set S = {s_1, s_2, ..., s_16} and arc set {(s_1), (s_2, s_9), (s_3, s_2), (s_4, s_10), (s_5, s_3), (s_6, s_11), (s_7, s_4), (s_8, s_12), (s_9, s_5), (s_10, s_13), (s_11, s_6), (s_12, s_14), (s_13, s_7), (s_14, s_15), (s_15, s_8) and (s_16)}.
Fig. 3 Directed graph for a folded PS processor
For distinct vertices, an alternating sequence of vertices and arcs, beginning and ending with a vertex, is referred to as a path. The path-length is indicated by n, the number of occurrences of arcs. If there exists a distinct path or walk (for the case of non-distinct vertices) between two vertices, the vertices are said to be reachable from one another. In a communication system denoted by a digraph, the processing elements serve as the vertices while the communication channels (interconnections) act as the arcs of the digraph. In a strong digraph, every pair of vertices needs to be mutually reachable. A strong component of a digraph is a maximal strongly connected subdigraph. In a broad sense, the connectivity strength q may be defined as the inverse of the number of strong components in a digraph. For the digraph of Fig. 3, for example, there are six strong components, which provide a q of 16.67% (= 1/6).
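The strong-component count of the Fig. 3 digraph can be checked with a short script. This is a minimal sketch under stated assumptions: the arc rule in `folded_ps_arcs` (s_i goes to s_(i+1)/2 for odd i and to s_(8+i/2) for even i, so that s_1 and s_16 map to themselves) is our reading of the arc list of Fig. 3, and all function names are hypothetical. Strong components are found by brute-force mutual reachability, which is adequate for 16 vertices.

```python
def folded_ps_arcs(n=16):
    """Adjacency lists of the assumed Fig. 3 digraph, vertices numbered 1..n."""
    adj = {}
    for i in range(1, n + 1):
        if i % 2:                       # odd index: s_i -> s_((i+1)/2)
            adj[i] = [(i + 1) // 2]
        else:                           # even index: s_i -> s_(n/2 + i/2)
            adj[i] = [n // 2 + i // 2]
    return adj


def reachable(adj, start):
    """Set of vertices reachable from start (including start itself)."""
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def strong_components(adj):
    """Group vertices that are mutually reachable."""
    reach = {v: reachable(adj, v) for v in adj}
    components, assigned = [], set()
    for v in adj:
        if v in assigned:
            continue
        comp = {w for w in reach[v] if v in reach[w]}
        components.append(comp)
        assigned |= comp
    return components


def connectivity_strength(adj):
    """q = 1 / (number of strong components)."""
    return 1.0 / len(strong_components(adj))


adj = folded_ps_arcs()
print(len(strong_components(adj)))               # 6
print(round(100 * connectivity_strength(adj), 2))  # 16.67
```

Applied to a digraph in which every vertex has an arc to every other vertex, the same function returns q = 1, i.e. 100% connectivity.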
According to Rent's rule [16], however, q must satisfy the condition 50% ≤ q ≤ 100%. In practice, Rent's rule may be ignored, but often at the expense of an undesirable system architecture [17]. Accordingly, this particular folded PS processor is not only undesirable from a systems point of view but is also a weakly connected architecture from the communication point of view. This suggests the possibility of having more intense interconnections (arcs) between the nodes of the input and the output planes of the folded PS, yielding a full 3-D processor. For any 3-D architecture, in general, each point (node) of the input plane is connected to those of the output plane, yielding a q value as high as 100%.
Communication link-length
Fast intra- and inter-processor communication depends largely on the design efficiency of the interconnection topology. The design efficiency in turn is often characterized by the linear extent (i.e. the length of the longest interconnection) of the system. In a guided communication system, the linear extent of the folded and 3-D architectures is given by 1 = k’/“N4’2JT) 1
(1)
and I, = kNqFi
(2)
respectively [17]. Here, k is a constant, N equals the number of interconnections, q is the connectivity strength (= 1, for example, for neural network applications), λ is the optical wavelength (= 633 nm for a standard HeNe laser, for example) and F is a dimensionless quantity which may assume a value no less than unity. The two growth-rate equations may also be valid in the case of free-space interconnection networks provided F is small (of the order of unity) (see Ref. 18). Figures 4 and 5 show the family of curves for linear extent versus q for different values of N. Note that for N = 64 and q = 1, the 3-D architecture yields a linear extent value about 3 times better than the folded network.
Fig. 5 Linear extent versus connectivity for three-dimensional interconnection networks for (a) □, when N = 2; (b) +, when N = 4; (c) ◇, when N = 8; (d) △, when N = 16; and (e) ×, when N = 64
Input-output relationships
Consider the particular 3-D butterfly processor of Fig. 6. The interconnection enclosed within the dotted box in this figure includes a butterfly interconnection such as that shown in Fig. 2. Next, a 2-D version of the 3-D butterfly processor is shown in Fig. 7. This shows the corresponding interactions between the elements of the input matrix [A] and those of the intermediate and final matrices [B] (represented by stage 0) and [O] (represented by stage 1) respectively. The input-output relationships corresponding only to stage 0 of Fig. 2 (using the matrix notation of Fig. 8) are given by
$B(0, 0) \leftarrow A(0, 0),\ A(0, 2),\ A(2, 0),\ A(2, 2)$
$B(1, 0) \leftarrow A(1, 0),\ A(1, 2),\ A(3, 0),\ A(3, 2)$
$B(0, 1) \leftarrow A(0, 1),\ A(0, 3),\ A(2, 1),\ A(2, 3)$
$B(1, 1) \leftarrow A(1, 1),\ A(1, 3),\ A(3, 1),\ A(3, 3)$    (3)
and so on. Note that the elements of the output matrix [B] form only one channel [O_11], suggesting the simultaneous existence of three other channels, namely [O_12], [O_21] and [O_22]. Accordingly, the end-detector plane represented by matrix [O] (i.e. the final output plane) is formed of four segments, each resulting from a set of parallel channels corresponding to four sets of similar outputs from a single set of inputs. An optical implementation of the multi-channel arrangement (from the output plane represented by matrix [B] to the end-detector plane represented by matrix [O]) is shown in Fig. 8. Figure 8 incorporates a butterfly such as that shown in Fig. 6, and illustrates a 3-D butterfly optical processor having four parallel channels. Note that the butterfly interconnection included in this configuration includes only stage 0 of Fig. 2; that is, up to the plane represented by matrix [B]. The split-shift operation for the butterfly interconnections as well as for the aforementioned multi-channel processor may be realized optically using simple optical components such as plane mirrors, lenses, quarter-wave plates, polarized beam-splitters, patterned mirrors and prismatic mirror arrays [9].
Fig. 4 Linear extent versus connectivity for two-dimensional interconnection networks for (a) □, when N = 2; (b) +, when N = 4; (c) ◇, when N = 8; (d) △, when N = 16; and (e) ×, when N = 64
Fig. 6 A three-dimensional butterfly processor
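The stage-0 relationships of (3) follow a simple index pattern, which the sketch below makes explicit. Two points are assumptions rather than statements from the text: the "and so on" is taken to mean that B(i, j) collects every input element whose row and column indices agree with i and j modulo 2, and the collected elements are combined by plain addition purely for illustration. The function names are hypothetical.

```python
def stage0_sources(i, j, size=4):
    """Input coordinates feeding B(i, j) in stage 0 (assumed modulo pattern)."""
    half = size // 2
    return [(i % half + a * half, j % half + b * half)
            for a in range(2) for b in range(2)]


def stage0_output(A):
    """Combine the four sources of every B(i, j); addition is an assumption."""
    size = len(A)
    return [[sum(A[r][c] for r, c in stage0_sources(i, j, size))
             for j in range(size)]
            for i in range(size)]


# A 4 x 4 test input whose entries encode their own position (4*row + column)
A = [[4 * r + c for c in range(4)] for r in range(4)]
print(stage0_sources(1, 0))       # [(1, 0), (1, 2), (3, 0), (3, 2)]
print(stage0_output(A)[0][0])     # 0 + 2 + 8 + 10 = 20
```

The four source lists for B(0, 0), B(1, 0), B(0, 1) and B(1, 1) reproduce the four lines of (3) in the same order.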
Consider the output of any one channel of the multi-channel 3-D butterfly processor. With [A] as the input matrix, the corresponding elements of a particular output matrix [O_11] (as obtained in (3)) are shown in Fig. 9. This output is nearly similar to that of the folded PS shown in Fig. 1. Note that for each quadrant of matrix [O_11], two diagonal elements have switched their places. In order to overcome this drawback, the overall butterfly processor (which includes stage 1; that is, up to plane C of the butterfly network of Fig. 2) may be considered. It is obvious from the 2-D version of this 3-D processor, as shown in Fig. 7, that the combinations of the input matrix elements yield 16 different output channels. The unwanted matrix elements may be masked out using an appropriate mask, producing the same output as the folded PS network. However, this latter processor (which includes up to stage 1 of Fig. 2) involves more intense interconnections than the previous one (which includes up to stage 0 of Fig. 2).
The next section introduces a particular application of the 3-D butterfly processor, namely the discrete Hartley transform (DHT). Note that the architecture of the 1-D folded PS processor is not particularly suitable for realizing the DHT. This is because, as with the discrete Fourier transform (DFT), the signal-flow graph of the DHT is readily implementable using a butterfly network architecture. On the other hand, the implementation of the DHT using a PS necessitates the isomorphic conversion of the PS (of any dimension) into a butterfly (of any dimension) first. This isomorphically converted butterfly may then be used to realize the DHT. However, because of inherent architectural differences between the PS (of any dimension) and the folded PS network, the above procedure of obtaining the DHT using a 1-D folded PS is not applicable.
Fig. 7 The two-dimensional equivalent of a three-dimensional butterfly processor
Fig. 8 Multi-channel three-dimensional butterfly processor
Fig. 9 Input/output matrices of a three-dimensional butterfly processor
Hartley transform implementation
The DHT was introduced by Bracewell [19] as a better alternative to both the discrete Fourier transform (DFT) and the discrete cosine transform (DCT). The DHT has a real kernel {cos(2πkn/M) + sin(2πkn/M)} whereas the DFT uses a complex kernel given by {exp(i2πkn/M)}. This makes the DHT both simpler and faster than the DFT, since a multiplication of complex variables (in the DFT) involves four real products. The regular structure of the DHT signal-flow graph offers simplicity for VLSI implementation. Further, because of isomorphism, the DFT and hence also the DCT may be transformed into the DHT easily. In the 1-D case, for a data sequence {x_n; n = 0, 1, 2, ..., M - 1}, the corresponding DHT data sequence {y_k; k = 0, 1, 2, ..., M - 1} is defined by
$y_k(M, x) = \sum_{n=0}^{M-1} x_n \left[\cos(2\pi kn/M) + \sin(2\pi kn/M)\right]$    (4)
for k = 0, 1, 2, ..., M - 1. Note that M is the order of the DHT. To implement the DHT as a butterfly, however, the 2-D DHT can be derived as
$y_{k,l}(M, x) = \sum_{n=0}^{M-1} \sum_{m=0}^{M-1} x_{n,m} \left[\cos(2\pi(kn + lm)/M) + \sin(2\pi(kn + lm)/M)\right]$    (5)
for k = 0, 1, 2, ..., M - 1 and l = 0, 1, 2, ..., M - 1. To illustrate the 2-D DHT butterfly, let us assume that M is a power of 2 and evaluate (5) for M = 2 as follows
$y_{0,0} = x_{0,0} + x_{0,1} + x_{1,0} + x_{1,1}$
$y_{0,1} = x_{0,0} - x_{0,1} + x_{1,0} - x_{1,1}$
$y_{1,0} = x_{0,0} + x_{0,1} - x_{1,0} - x_{1,1}$
$y_{1,1} = x_{0,0} - x_{0,1} - x_{1,0} + x_{1,1}$    (6)
As far as the 1-D case is concerned, this transformation can be performed by means of a butterfly using an intermediate plane G as shown in Fig. 10. This allows for better modularity in computing the higher-order DHTs. Accordingly, a 2-D DHT may be implemented using a 1-D DHT matrix of order M^2.
Fig. 10 The two-dimensional DHT butterfly when M = 2
In the matrix form of the DHT formulation, the decimation-in-time algorithm (in which the overall DHT computation is decomposed into smaller and smaller subsequences of the input sequence) is given by [20]
$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} T(M/2) & K\,T(M/2) \\ T(M/2) & -K\,T(M/2) \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$    (7)
where y_1 is the preceding half while y_2 is the rear half of the output data, respectively, and x_1 and x_2 are the even and odd inputs in the bit-reversed order [21]. T(M/2) is the rearranged DHT matrix which arises from the inputs of the bit-reversed order; and the matrix K is given as
$K(M/2) = \mathrm{diag}(\cos\varphi_k) + \mathrm{diag}(\sin\varphi_k)\,P$    (8)
where φ_k = 2πk/M, 'diag' stands for diagonal matrix and
$P = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 0 & 1 & \cdots & 0 \end{bmatrix}$    (9)
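The DHT relations of this section can be checked numerically. The sketch below is illustrative only and makes three assumptions: the transforms carry no normalization factor (as in the M = 2 example of (6)), the inputs are taken in natural rather than bit-reversed order, and the sine term of the decimation-in-time step uses the index-reversed odd-half transform, i.e. entry (M/2 - k) mod (M/2), which follows from the identity cas(a + b) = cos(b) cas(a) + sin(b) cas(-a) with cas(t) = cos(t) + sin(t). The names `cas`, `dht`, `dht2` and `dht_dit_step` are hypothetical.

```python
import math


def cas(theta):
    """Hartley kernel: cos(theta) + sin(theta)."""
    return math.cos(theta) + math.sin(theta)


def dht(x):
    """Direct 1-D DHT of order M = len(x), without normalization."""
    M = len(x)
    return [sum(x[n] * cas(2 * math.pi * k * n / M) for n in range(M))
            for k in range(M)]


def dht2(x):
    """Direct 2-D DHT of an M x M array with the non-separable kernel."""
    M = len(x)
    return [[sum(x[n][m] * cas(2 * math.pi * (k * n + l * m) / M)
                 for n in range(M) for m in range(M))
             for l in range(M)]
            for k in range(M)]


def dht_dit_step(x):
    """One decimation-in-time step: two DHTs of order M/2 give one of order M."""
    M = len(x)
    half = M // 2
    even = dht(x[0::2])      # transform of the even-indexed inputs
    odd = dht(x[1::2])       # transform of the odd-indexed inputs
    first, second = [], []
    for k in range(half):
        phi = 2 * math.pi * k / M
        # cosine term uses entry k, sine term uses the index-reversed entry
        kt = math.cos(phi) * odd[k] + math.sin(phi) * odd[(half - k) % half]
        first.append(even[k] + kt)
        second.append(even[k] - kt)
    return first + second


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(max(abs(a - b) for a, b in zip(dht(x), dht_dit_step(x))) < 1e-9)  # True
print([[round(v) for v in row] for row in dht2([[1, 2], [3, 5]])])
# [[11, -3], [-5, 1]]
```

The second printed result agrees with the four sums of the M = 2 case: for example, the (1, 1) output is 1 - 2 - 3 + 5 = 1.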
Equations (7)-(9) provide a mechanism for generating higher-order 1-D DHTs from lower-order 1-D DHTs. On the other hand, (5)-(6) yield the 2-D DHT from the 1-D DHTs. For the 2-D DHT, the input is simply written as 1-D DHTs by writing, for example, z_{n,m} = z_p, with p = n + m, where z can be either x or y. Then the precomputations of the necessary connections may be performed using the 1-D DHT algorithm of (7). For a given M, this allows for the determination of all the required connections between the three planes of the 3-D butterfly processor, such that the third dimension may be used for data propagation. To illustrate a 3-D butterfly of a higher order, consider, for example, M = 4, which necessitates the 1-D DHT of order 16 (= 2^4). This 3-D butterfly (with the third dimension being used for data propagation) is shown in Fig. 11. Note that this particular 3-D butterfly uses a total of four 16th-order 1-D DHTs. However, due to the non-separability of the kernels in the DHT, the 2-D module (that is, the 3-D butterfly of Fig. 11) may not be used to generate the higher-order transforms.
Fig. 11 The two-dimensional DHT butterfly when M = 4. ● represents addition
Conclusions
The 3-D butterfly interconnection processor provides 100% connectivity between the input and output planes. For N = 64 and q = 1, the 3-D architecture yields a linear extent about three times better than that of the 1-D folded PS network. Accordingly, the 3-D interconnection processor is preferable. The 2-D implementation of the DHT (which is computationally more efficient than the DFT and DCT) is just one example of the many applications of a 3-D butterfly processor. The architecture of a 1-D folded PS is not particularly suitable for the implementation of the DHT. Thus, depending on the application area, a 3-D architecture may turn out to be the natural choice in photonics.
References
1  Louri, A. Throughput enhancement for optical symbolic substitution computing systems, Appl Opt, 29 (1990) 2979
2  Karim, M.A., Awwal, A.A.S. Optical Computing: An Introduction, Wiley, New York (1992)
3  Caulfield, H.J., Neff, J.A., Rhodes, W.T. Optical computing: the coming revolution in optical processing, Laser Focus, 19 (1983) 100
4  Goodman, J.W., Leonberger, F.J., Kung, S.Y., Athale, R.A. Optical interconnections for VLSI systems, Proc IEEE, 72 (1984) 850
5  Murdocca, M.J., Huang, A., Jahns, J., Streibl, N. Optical design of programmable logic arrays, Appl Opt, 27 (1988) 1651
6  Sawchuk, A.A., Jenkins, B.K., Raghavendra, C.S., Varma, A. Optical crossbar networks, IEEE Trans Comput, 20 (1987) 50
7  Stirk, C.W., Athale, R.A., Haney, M.W. Folded perfect shuffle processor, Appl Opt, 27 (1988) 202
8  Iftekharuddin, K.M., Karim, M.A. Butterfly interconnection network: design of multiplier, flip-flop and shift register, Appl Opt, 33 (1994) 1457
9  Cloonan, T.J., McCormick, F.B. Photonic switching applications of 2-D and 3-D crossover networks based on 2-input, 2-output switching nodes, Appl Opt, 30 (1991) 2309
10  Giglmayr, J. Classification scheme for 3-D shuffle interconnection patterns, Appl Opt, 28 (1989) 3120
11  Cloonan, T.J., Herron, M.J., Tooley, F.A.P., Richards, G.W., McCormick, F.B., Kerbis, E., Brubaker, J.L., Lentine, A.L. An all-optical implementation of a 3-D crossover network, IEEE Photon Technol Lett, 2 (1990) 438
12  Lohmann, A., Stork, W., Stucke, G. Optical implementation of perfect shuffle. In Technical Digest, Topical Meeting on Optical Computing, Optical Society of America, Washington, DC (1985) Paper WA3
13  Lin, S.H., Krile, T.F., Walkup, J.F. 2-D optical multistage interconnection networks, Proc SPIE, 752 (1987) 209
14  Harary, F. Graph Theory, Addison-Wesley, Reading, MA (1969)
15  Gibson, P.M., Caulfield, H.J. Applications of optical Boolean matrix operations to graph theory, Appl Opt, 30 (1991)
16  Feuer, M. Connectivity of random logic, IEEE Trans Comput, 31 (1982) 29
17  Ozaktas, H.M. Paradigms of connectivity for computer circuits and networks, Opt Eng, 31 (1992) 1563
18  Ozaktas, H.M., Amitai, Y., Goodman, J.W. Comparison of system size for some optical architectures and the folded multi-facet architecture, Opt Commun, 82 (1991) 115
19  Bracewell, R.N. The fast Hartley transform, Proc IEEE, 72 (1984) 1010
20  Hou, H.S. The fast Hartley algorithm, IEEE Trans Comput, 36 (1987) 147
21  Oppenheim, A.V., Schafer, R.W. Discrete-Time Signal Processing, Prentice-Hall, New Jersey (1980)