North-Holland Microprocessing and Microprogramming 16 (1985) 101-106
Reconfiguration of VLSI Arrays: A Technique for Increased Flexibility and Reliability

V.N. Doniants*, V.G. Lazarev**, M.G. Sami**, R. Stefanelli**

* Institute for Information Transmission Problems, USSR Academy of Sciences, Moscow, USSR
** Dept. of Electronics, Politecnico di Milano, Italy

The problem addressed is the dynamic reconfiguration of regular arrays of processing elements implemented as VLSI devices. The aim is to achieve high production yield and reliability while keeping on-chip redundancy as low as possible and preserving such figures of merit as design regularity and interconnection locality. The basic approach considered is a set of redundant "busses" controlled by a regular pattern of switches. Such a structure has already been proposed by other authors to achieve reconfiguration for error patterns belonging to a fairly large class, but previous proposals required considerable redundancy in terms of spare processing elements (for an n*n rectangular array, redundancy increased with n^2). We prove that the use of global algorithms, rather than purely local ones, again allows survival to large classes of fault patterns, with much lower redundancy (increasing with the order of n). A structure with interconnection grids formed by sets of three busses along each direction is presented, and it is shown that two alternative algorithms, of increasing complexity and efficiency, can use it. A simple preliminary solution for the switch is suggested.
1. INTRODUCTION
The ever increasing potential of VLSI, or even of WSI, techniques has made it reasonable to consider complex multiprocessor architectures integrated onto one single device. This, in turn, has created wide interest in the "flexibility" of basic multiprocessor structures, seen as the possibility of rearranging them with the aim either of obtaining different interconnection architectures or of reaching higher device reliability through reconfiguration after faults. In this light, particular attention has been given to processing array structures, which, thanks to their extreme regularity and to the locality of the interconnection network, are well suited to VLSI or WSI implementation. Relevant examples of this class are "systolic arrays" [1, 2] and the "CHiP" architecture [3]. Both have been widely discussed in the literature; it has been proved that a large class of interconnection networks can be mapped onto these regular arrays, making them suited to a vast class of applications. Fault-tolerance through the use of spares and reconfiguration after faults has also been widely discussed for these same architectures [4, 5, 6, 7, 8, 9]; in general, the problem has been approached with the aim of achieving survival to a high number of faults while keeping the locality of interconnections as high
as possible and limiting the added silicon requirements for redundant interconnections.

This work was partially supported by the European Economic Community under the CVT Project.

Fig. 1 CHiP architecture: basic form
It is obviously interesting to consider whether a basic, simple interconnection structure can provide both types of flexibility. Here, we consider a rectangular array of identical processing cells such as the CHiP; in its basic form, extensive use of switching allows a variety of architectures to be mapped onto the array (fig. 1). Hedlund [10] suggested a redundant structure in which each bus is replaced by a pair of properly switched busses, providing a measure of fault-tolerance. His approach subdivides the array into subarrays (or "modules") of small dimensions, e.g. 2*3 or 3*3, and provides reconfiguration for any pattern of up to x faults inside each module with x spares (see fig. 2). While this gives high locality, the added interconnection length grows with x, a condition that may be unsatisfactory for a system with very stringent speed requirements. Moreover, the set of fault
V.N. Doniants et al. / Reconfiguration of VLSI Arrays
Fig. 3 Processing array with augmented interconnection redundancy
Fig. 2 CHiP architecture: redundant form
patterns allowing reconfiguration is limited by the very locality of the approach: any pattern of x+1 faults inside a module is, by definition, a cause of "fatal failure" (i.e. impossibility of survival). Finally, it can be noticed that the redundancy required is actually of the order of n^2; several authors (e.g. [11]) have argued that too high a redundancy can ultimately limit the production yield too severely.
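The difference between the two growth rates can be made concrete with a short count of spares. The sketch below is only illustrative: the function names are hypothetical, the modular count assumes that the n*n array is tiled exactly by modules carrying x spares each, and the edge count assumes one spare row plus one spare column with the corner cell included.

```python
def modular_spares(n: int, module_rows: int, module_cols: int, x: int) -> int:
    """Spares for an n*n array tiled by modules, each carrying x spares."""
    if n % module_rows or n % module_cols:
        raise ValueError("n must be a multiple of both module dimensions")
    modules = (n // module_rows) * (n // module_cols)
    return x * modules  # grows with n^2


def edge_spares(n: int) -> int:
    """Spares for one extra row plus one extra column (corner included)."""
    return (n + 1) ** 2 - n ** 2  # = 2n + 1, grows with n


for n in (30, 60, 120):
    print(n, modular_spares(n, 3, 3, x=1), edge_spares(n))
```

Doubling n quadruples the modular count, while the edge count only roughly doubles; for small arrays the modular count can still be the lower of the two.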
In the present paper, we refer to a rectangular array of combinatorial P.E.'s in which information flows along one direction only of each axis. We take Hedlund's redundant interconnection structure as the basis for another, augmented interconnection network (making use of three properly switched busses), and we prove that quite satisfactory results can be achieved with redundancy of the order of n by adopting a totally different reconfiguration philosophy. Rather than resorting to direct substitution of faulty cells by spare ones, all the algorithms considered are based upon a "global reconfiguration" philosophy, by which "logical indices" (denoting the functions performed in the array by the working cells) are mapped onto the "physical indices" of the physical array proper. This gives good spare utilization and survival to faults while keeping very high locality of interconnections. In the next sections the redundant interconnection structure is presented, and reconfiguration algorithms based upon very simple patterns of spares (a column and a row of processing elements added at the extreme edges of the array) are discussed. General 2*2 bi-directional switches interposed on the augmented set of interconnection busses allow reconfiguration even in the presence of complex fault patterns.

2. AUGMENTED INTERCONNECTION REDUNDANCY
Consider the structure of fig. 3, in which three busses are introduced between any two rows or columns of processing elements; at the intersection between the direct link connecting any two P.E.'s and a bus, a 2*2 switch is inserted (the features of the switch are detailed later). The basic ("nominal") array is augmented by one column and one row of spare P.E.'s; two reconfiguration algorithms will now be discussed that, even with such limited redundancy, give good survival to faults and good spare utilization. The algorithms were presented in [12] with reference to a direct-interconnection type of structure and grouped under the "fault-stealing" definition; the resulting interconnection structure, although very fast in terms of propagation delay, required a fairly large amount of silicon area because of the considerable number of added links. As in [12], we discuss here only the interconnection structure proper and its involvement in the reconfiguration actions; circuits controlling the reconfiguration itself can be derived from the algorithms (or, alternatively, the algorithms can be implemented by firmware external to the array).

Briefly, the "fault-stealing" concept may be defined as follows: basic reconfiguration due to the presence of a faulty P.E. is performed along the rows of the array, one column of spares being added at the extreme right. While reconfiguration is then very easy whenever at most one fault is present in any given row, the presence of two or more faults requires "stealing" the possibility of reconfiguration from adjacent rows. The alternatives foreseen for "stolen" P.E.'s distinguish the different algorithms; as can easily be understood, the more complex algorithms, i.e. the ones that offer a wider set of possibilities, are also the ones that guarantee a higher probability of survival.

Any P.E. is identified by its physical position in the array ("physical indices" (i,j)) and by the function it performs in the working array at operation time, indicated by the "logical indices" (i',j'). An unused P.E. has logical indices set to 0; if all P.E.'s are working correctly, logical indices are set to 0 only for the spares, while they are identical to the physical indices for all other P.E.'s; in the presence of faults, faulty P.E.'s are associated with logical indices set to 0, while for the working ones the logical indices will in general differ from the physical ones. A reconfiguration algorithm operates a transform
that, on the basis of the fault distribution, generates the logical indices.

In the simplest fault-stealing algorithm, the following technique is adopted:

1. any P.E. (i,j) may be in one of three states:
   - correctly operating, with i'=i
   - faulty (i'=j'=0)
   - "stolen" by an adjacent (upper) row, in which case i'≠i

2. rows are scanned for increasing values of the index; assume that in row i P.E.'s (i,k_1),...,(i,k_s) are faulty [...]; then:
   - P.E.'s (i,j) with j>k_s get logical indices i'=i, j'=j-1;
   - P.E.'s (i,k_1),...,(i,k_{s-1}) "steal" the position of P.E.'s (i+1,k_1),...,(i+1,k_{s-1}), with which we now associate indices i'=i, j'=k_1,..., i'=i, j'=k_{s-1};

3. when row i+1 is examined, P.E.'s (i+1,k_1),...,(i+1,k_{s-1}) are considered "stolen" and they act as if they were faulty;

4. a "fatal failure" condition is reached if at least one P.E. (i+1,h), with h ∈ {k_1...k_{s-1}}, is faulty.

Fig. 5 Simple Fault-Stealing algorithm, reconfiguration for multiple faults

In [12], an interconnection structure making use of direct alternative links and of multiplexers was introduced; the basic layout is given in fig. 4, while a reconfiguration example is given in fig. 5. In Table I, we give all possible alternative inputs to P.E. (i,j); obviously, they depend on the algorithm, not on its implementation, so that it must be possible to create them without any conflict even when switched busses are substituted for the direct interconnections. We prove now that, if the corresponding paths are created in the switched structure as shown in fig. 6, no conflict arises for use of a bus or of an interconnection link.

Table I
horizontal input: (i-1,j-1), (i,j-1), (i,j-2), (i+1,j-1), (i+1,j-2)
vertical input:   (i,j-1), (i-1,j-1), (i-1,j), (i-1,j+1), (i-2,j), (i-2,j+1)

Fig. 4 Layout for Simple Fault-Stealing, fixed choice
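The fixed-choice rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `fixed_choice` is hypothetical, indices are 0-based with the spare row at the bottom (row n) and the spare column at the right (column n), faulty or unused P.E.'s are left out of the map instead of being given logical index 0, and working P.E.'s to the left of k_s are assumed to keep j'=j.

```python
from typing import Dict, Optional, Set, Tuple

Cell = Tuple[int, int]


def fixed_choice(faulty: Set[Cell], n: int) -> Optional[Dict[Cell, Cell]]:
    """Map physical (i, j) -> logical (i', j'); None means fatal failure."""
    logical: Dict[Cell, Cell] = {}
    stolen: Set[int] = set()            # columns of row i stolen by row i-1
    for i in range(n):                  # nominal rows, scanned top to bottom
        blocked = sorted({j for j in range(n + 1) if (i, j) in faulty} | stolen)
        # Fixed choice: the rightmost blocked column k_s is absorbed by the
        # shift towards the spare column; with no blocked cell the spare idles.
        pivot = blocked[-1] if blocked else n
        for j in range(n + 1):
            if j == pivot or j in blocked:
                continue                # faulty, stolen, or idle spare
            logical[(i, j)] = (i, j) if j < pivot else (i, j - 1)
        stolen = set(blocked[:-1])      # k_1 .. k_(s-1) steal from row i+1
        for k in stolen:
            if (i + 1, k) in faulty:
                return None             # requested P.E. is faulty: fatal
            logical[(i + 1, k)] = (i, k)
    return logical


mapping = fixed_choice({(0, 0), (0, 2)}, n=3)
print(mapping[(1, 0)])   # P.E. below the first fault, stolen by row 0 → (0, 0)
```

In this example the fault at (0,2) is absorbed by the shift towards the spare column, while the fault at (0,0) is covered by the stolen P.E. (1,0); row 1 then treats column 0 as if it were faulty.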
Fig. 6 Interconnection rules for fixed choice Fault-Stealing algorithm

To this end, we prove that all possible locations of conflict for access to a bus or an interconnection link are actually excluded by the reconfiguration rules.

Possible locations of conflict on horizontal interconnections to the input of (i,j):

1. interconnections a ((i-1,j-1) to (i,j)) and d ((i,j-1) to (i-1,j+1)): a implies that in row i-1 there is no downward shift involving column j-1; interconnection d implies a downward shift in row i-1, column j-1. The two conditions are obviously mutually exclusive;

2. a ((i-1,j-1) to (i,j)) and e ((i,j-1) to (i-1,j)): a implies a downward shift of i-1 with respect to i in column j, not in column j-1; e implies the opposite. Again, the conditions are mutually exclusive;

3. c ((i,j-2) to (i,j)) and d ((i+1,j-1) to (i,j+1)): c arises because in row i, column j-1 there is only a rightward reconfiguration; d implies a reconfiguration downward only in the same position. Again, the two conditions are mutually exclusive.
Fig. 8 Functions of interconnection switches
Possible conflicts on vertical interconnections: the interconnection pairs to be considered are b-d, b-e, b-f, c-d, c-e, d-e, d-f; by considerations of the same type as above, all are seen to be mutually exclusive. Therefore, reconfiguration can actually be implemented by adopting the interconnection paths shown in fig. 6. In fig. 7, the same example seen in fig. 5 is solved by means of the switched-bus interconnection structure.

Fig. 7 Array reconfigured with fixed choice Fault-Stealing algorithm

The scheme in fig. 6 makes it easy to derive the control functions for the switches, as related to the distribution of faulty and stolen P.E.'s. If the only aim of reconfiguration is fault-tolerance, the set of switches can also be considerably simplified; their functions (as shown in fig. 8, where the simple "crossing" function is implied at all intersections) are often simpler than the "complete" ones. The full set of complete switches may be useful if functional reconfiguration is also required.

The probability of survival, for a 20*20 array with one spare column and one spare row, is given by curve A in fig. 9. The most severe requirement in this algorithm is, in fact, that a P.E. requested for stealing must be correctly working while a fixed choice is made for the P.E. that may invoke reconfiguration along the row; this limits the set of allowable fault patterns.

Fig. 9 Probability of survival to faults vs. number of faults for a 20*20 array. Curves obtained by direct simulation.

Consider now the second algorithm proposed in [12], which allows a "variable choice" for the P.E. invoking reconfiguration along the row, namely:

1. rows are scanned for increasing values of the index; assume that in row i P.E.'s (i,k_1),...,(i,k_s) are faulty [...], (i,k_h) being the P.E. chosen to invoke reconfiguration along the row; then:
   - P.E.'s (i,j) with j>k_h get logical indices i'=i, j'=j-1;
   - P.E.'s (i,k_1),...,(i,k_{h-1}) "steal" the positions of P.E.'s (i+1,k_1),...,(i+1,k_{h-1}), with which we now associate indices i'=i, j'=k_1,..., i'=i, j'=k_{h-1};
   - P.E.'s (i,k_{h+1}),...,(i,k_s) "steal" the positions of P.E.'s (i+1,k_{h+1}),...,(i+1,k_s), with which we now associate indices i'=i, j'=k_{h+1}-1,..., i'=i, j'=k_s-1;

2. a "fatal failure" condition is reached whenever at least two cells (i+1,h_1) and (i+1,h_2), with h_1,h_2 ∈ {k_1...k_s}, are faulty.
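The variable-choice rules can be sketched in the same way. The code below is an illustration under assumptions, not the procedure of [12]: the name `variable_choice` is hypothetical, indices are 0-based with the spare row at index n and the spare column at index n, and P.E.'s without a function are omitted from the map. The text does not say which k_h to pick when every P.E. below the blocked ones is working; this sketch then falls back on the rightmost one, as in the fixed-choice case.

```python
from typing import Dict, Optional, Set, Tuple

Cell = Tuple[int, int]


def variable_choice(faulty: Set[Cell], n: int) -> Optional[Dict[Cell, Cell]]:
    """Map physical (i, j) -> logical (i', j'); None means fatal failure."""
    logical: Dict[Cell, Cell] = {}
    stolen: Set[int] = set()            # columns of row i stolen by row i-1
    for i in range(n):
        blocked = sorted({j for j in range(n + 1) if (i, j) in faulty} | stolen)
        bad_below = [k for k in blocked if (i + 1, k) in faulty]
        if len(bad_below) >= 2:
            return None                 # two requested P.E.'s are faulty
        if bad_below:
            pivot = bad_below[0]        # k_h: the column that cannot steal
        elif blocked:
            pivot = blocked[-1]         # assumption: free choice -> rightmost
        else:
            pivot = n                   # no fault: the spare column idles
        for j in range(n + 1):
            if j == pivot or j in blocked:
                continue
            logical[(i, j)] = (i, j) if j < pivot else (i, j - 1)
        stolen = set(blocked) - {pivot}
        for k in stolen:                # stolen P.E.'s follow the same shift
            logical[(i + 1, k)] = (i, k) if k < pivot else (i, k - 1)
    return logical


mapping = variable_choice({(0, 0), (0, 2), (1, 0)}, n=3)
print(mapping[(1, 2)])   # → (0, 1)
```

The pattern used here is fatal under the fixed-choice rules, because the P.E. below the first fault is itself faulty; with the variable choice, column 0 invokes the row shift and only (1,2) is stolen.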
Reconfiguration is now possible for a far larger number of fault patterns; if we consider the curves representing the probability of survival (curve B in fig. 9), we see immediately that performance is far more satisfactory than in the previous case. Now, the set of possible "input neighbors" to any given P.E. (i,j) is larger than in the previous case, and it can be derived from Table II.

Table II
horizontal input: (i-1,j-1), (i-1,j-2), (i,j-1), (i,j-2), (i+1,j-1), (i+1,j-2)
vertical input:   (i,j-1), (i,j+1), (i-1,j-1), (i-1,j), (i-1,j+1), (i-2,j-1), (i-2,j), (i-2,j+1)
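The entries of Table II can be cross-checked by enumeration: reconfigure many random fault patterns and collect, for every working P.E., the offset to the physical cell that holds its logical left or upper neighbor. The sketch below does this under the same assumptions as before (hypothetical names, 0-based indices, spare row and column at index n, rightmost choice when the choice is free); the array size, fault count and number of trials are arbitrary.

```python
import random

# Table II written as (row offset, column offset) relative to P.E. (i, j).
TABLE_II_H = {(-1, -1), (-1, -2), (0, -1), (0, -2), (1, -1), (1, -2)}
TABLE_II_V = {(0, -1), (0, 1), (-1, -1), (-1, 0), (-1, 1),
              (-2, -1), (-2, 0), (-2, 1)}


def variable_choice(faulty, n):
    """Physical -> logical map under the variable-choice rules, or None."""
    logical, stolen = {}, set()
    for i in range(n):
        blocked = sorted({j for j in range(n + 1) if (i, j) in faulty} | stolen)
        bad_below = [k for k in blocked if (i + 1, k) in faulty]
        if len(bad_below) >= 2:
            return None
        pivot = bad_below[0] if bad_below else (blocked[-1] if blocked else n)
        for j in range(n + 1):
            if j != pivot and j not in blocked:
                logical[(i, j)] = (i, j) if j < pivot else (i, j - 1)
        stolen = set(blocked) - {pivot}
        for k in stolen:
            logical[(i + 1, k)] = (i, k) if k < pivot else (i, k - 1)
    return logical


def input_offsets(mapping):
    """Offsets from each P.E. to its horizontal and vertical input source."""
    where = {log: phys for phys, log in mapping.items()}  # logical -> physical
    horiz, vert = set(), set()
    for (a, b), (i, j) in where.items():
        if (a, b - 1) in where:
            si, sj = where[(a, b - 1)]
            horiz.add((si - i, sj - j))
        if (a - 1, b) in where:
            si, sj = where[(a - 1, b)]
            vert.add((si - i, sj - j))
    return horiz, vert


rng = random.Random(0)
n = 8
cells = [(i, j) for i in range(n + 1) for j in range(n + 1)]
seen_h, seen_v = set(), set()
for _ in range(2000):
    mapping = variable_choice(set(rng.sample(cells, 6)), n)
    if mapping is not None:
        h, v = input_offsets(mapping)
        seen_h |= h
        seen_v |= v
print(seen_h <= TABLE_II_H, seen_v <= TABLE_II_V)
```

Every offset observed this way should fall inside the corresponding row of Table II, since a logical cell can only sit in its own physical row or the one below, and in its own physical column or the one to its right.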
Fig. 10 Interconnection rules for variable choice Fault-Stealing algorithm

Fig. 11 Array reconfigured with variable choice fault-stealing

Interconnections implementing all the above set of alternatives are shown in fig. 10. As before, it can be proved that no conflict for access to a bus or interconnection link can arise; the additional alternatives to be considered are:
- for the horizontal connections: conflicts between pairs b-c, or b-d, or b-e;
- for the vertical connections: conflicts inside pairs a-c, a-e, a-f, a-g, b-g, c-g, d-g, e-g.

As before, all conflicts are solved by the mutual exclusion of the interconnections involved. An example of reconfiguration is given in fig. 11.
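Survival curves of the kind plotted in fig. 9 can be estimated by direct simulation. The sketch below is a hedged illustration only: the fault model (faults drawn uniformly without repetition over all P.E.'s, spares included), the function names, the number of trials and the rightmost fallback for a free choice are all assumptions, so the numbers it prints are not expected to reproduce curves A and B.

```python
import random


def survives(faulty, n, variable):
    """True if the fault pattern is survived by the chosen stealing rule."""
    stolen = set()
    for i in range(n):                       # nominal rows, top to bottom
        blocked = sorted({j for j in range(n + 1) if (i, j) in faulty} | stolen)
        if not blocked:
            stolen = set()
            continue
        bad_below = [k for k in blocked if (i + 1, k) in faulty]
        if variable:
            if len(bad_below) > 1:
                return False                 # two requested P.E.'s faulty
            pivot = bad_below[0] if bad_below else blocked[-1]
        else:
            pivot = blocked[-1]              # fixed choice: rightmost column
            if any(k != pivot for k in bad_below):
                return False                 # a requested P.E. is faulty
        stolen = set(blocked) - {pivot}
    return True


def survival_curve(n, fault_counts, trials, seed=0):
    """List of (faults, P_fixed, P_variable) estimated on shared patterns."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(n + 1) for j in range(n + 1)]
    curve = []
    for f in fault_counts:
        fixed = varied = 0
        for _ in range(trials):
            faulty = set(rng.sample(cells, f))
            fixed += survives(faulty, n, variable=False)
            varied += survives(faulty, n, variable=True)
        curve.append((f, fixed / trials, varied / trials))
    return curve


for f, p_fixed, p_variable in survival_curve(20, range(0, 45, 5), trials=300):
    print(f"{f:2d} faults: fixed {p_fixed:.2f}  variable {p_variable:.2f}")
```

Both rules are evaluated on the same random patterns. With the fallback used here, any pattern survived by the fixed choice is also survived by the variable choice, so the second estimate is never below the first.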
3. IMPLEMENTATION ASPECTS AND CONCLUDING REMARKS

Design of the switch is, obviously, a basic step towards design of the reconfigurable architecture described here. Design is at present being carried out with regard to CMOS, two-metal-layer technology; this approach greatly simplifies the layout wherever simple crossings are present (no explicit switching device is then necessary). A preliminary electrical schematic for the complete switch (relative to one bit), making use of transmission gates, is given in fig. 12. A first-approximation evaluation indicates a requirement of about 800 [...] for an 8-bit switch; reduction to 8 switches for every single P.E. allows their distribution around the P.E. to be compacted in a satisfactory way.

Fig. 12 Scheme of a switch; each square in a) is a transmission gate as shown in b)

Further study is at present going on along two directions: the first concerns optimization of a compact yet robust switch; the second concerns analysis of array reconfigurability as regards functional reconfiguration (e.g. the possibility of mapping various tree structures upon the array even in the presence of faults).

4. REFERENCES
[1] H.T. Kung, "Why systolic architectures?", Computer, vol. 15, n. 1, pp. 37-46 (Jan. 1982)
[2] K.S. Hedlund, L. Snyder, "Systolic arrays: a Wafer-Scale approach", Proc. ICCD 84, pp. 604-610, IEEE, New York (Oct. 1984)
[3] L. Snyder, "Introduction to the configurable, highly parallel computer", IEEE Computer Magazine, vol. 15, n. 1, pp. 47-56 (Jan. 1982)
[4] F.T. Leighton, C.E. Leiserson, "Wafer-scale integration of systolic arrays", 23rd Symposium on Foundations of Computer Science, IEEE (Oct. 1982)
[5] A.L. Rosenberg, "The Diogenes approach to testable fault-tolerant VLSI processor arrays", IEEE-TC, vol. C-32, n. 10, pp. 902-910 (Oct. 1983)
[6] H.T. Kung, M.S. Lam, "Fault-tolerance and two-level pipelining in VLSI systolic arrays", MIT Conf. on Advanced Research in VLSI (Jan. 1984)
[7] W.R. Moore, "A review of fault-tolerant techniques for the enhancement of integrated circuit yield", GEC Journal of Research, vol. 2, n. 1, pp. 1-15 (Jan. 1984)
[8] V.N. Doniants, S. Iori, M. Pellegrino, E.I. Pi'il, R. Stefanelli, "Fault-Tolerant Reconfigurable Processing Arrays Using Bi-directional Switches", Microprocessing and Microprogramming, vol. 14, n. 3,4, pp. 109-116, North-Holland (Oct.-Nov. 1984)
[9] R. Negrini, M.G. Sami, R. Stefanelli, "Fault-Tolerance Approaches for VLSI/WSI Arrays", Proceedings of Conference on Computers and Communication, IEEE, Phoenix (1985)
[10] K.S. Hedlund, L. Snyder, "Wafer-scale integration of configurable highly parallel (CHiP) processor", Proc. International Conference Parallel Processing, pp. 262-264, IEEE (1982)
[11] R.M. Mangir, A. Avizienis, "Fault-tolerant design for VLSI: effect of interconnection requirements on yield improvement of VLSI design", IEEE Trans. Comp., vol. C-31, n. 7, pp. 609-615 (July 1982)
[12] M.G. Sami, R. Stefanelli, "Fault-stealing: an approach to fault-tolerance of VLSI array structures", Proc. ICCAS 85, IEEE, Beijing (June 1985)