Fault-tolerant hexagonal arithmetic array processors


North-Holland Microprocessing and Microprogramming 24 (1988) 629-636

FAULT-TOLERANT HEXAGONAL ARITHMETIC ARRAY PROCESSORS

Vincenzo PIURI
Department of Electronics, Politecnico di Milano
Piazza L. da Vinci 32, I-20133 Milano, Italy

ABSTRACT

On-line error detection in hexagonal arithmetic array processors is discussed. Two approaches are presented: the first one is based upon residue arithmetic, the second one upon A*N+B codes. New methodologies to design hexagonal array processors with high error detection capability, low silicon area consumption and low computational overhead are proposed and evaluated. Both bit-serial and bit-parallel implementations are considered.

1. INTRODUCTION

Massive computations in digital signal/image processing and in matrix operations often need special-purpose architectures, like array processors. They consist of a number (usually very high) of identical processing elements connected by a regular interconnection grid [1] [2]. When critical applications are concerned, their reliability and availability become basic requirements. Fault-tolerance techniques allow for obtaining these features. Error detection and, possibly, error correction must be followed by fault localization and by array reconfiguration to bypass the faulty processing element. In this paper on-line error detection and concurrent fault localization are considered in hexagonal arithmetic array processors: these are in fact the most critical steps to achieve high system availability by minimizing the recovery time. Two main methodologies may be found in the literature to detect errors: the first approach is redundancy (either physical or time redundancy) [3] [4] [5] [6], the second one is coding [6] [7] [8] [9] [10]. In physical redundancy the same computation is performed by different processing elements and their outputs are compared, while in time redundancy the same computation is performed at different times in the same processing element or in different ones. In coding, input data are substituted by their coded representation; computation is performed on such data and outputs are checked by custom circuits that verify the congruence with a proper output code.

In this paper two approaches are proposed and evaluated: the first one is a pure coding technique based upon A*N+B codes, the second one is a mixed approach based upon residue arithmetic and modular redundancy. The class of array architectures and the structure of the processing elements are presented. Both bit-serial and bit-parallel implementations are considered to give an overall view of the possibilities of each approach. The fault and error models of the circuits proposed here are discussed. The basic concepts of the error-detecting techniques adopted here are introduced: A*N+B codes and residue arithmetic are briefly presented. Their features are evaluated and some criteria are given to identify the best codes to protect a given number of bits while minimizing the silicon area and the computation overhead. Architectural details are then presented: computational delay, silicon area and error latency are estimated. Whenever an error is detected, the computation of the array processor is unreliable until the faulty processing element is identified and excluded from the computation by reconfiguring the interconnections [5]. Some architectural supports are proposed to achieve a fast on-line localization.

2. HEXAGONAL ARITHMETIC ARRAY PROCESSORS

An array is composed of a set of simple processing elements (PEs) connected by a proper network [1] [2]: the characteristics of these architectures and the algorithms executed depend on the structure of the PEs and on the kind of interconnection network. In this paper we consider a regular interconnection mesh which allows the design of hexagonal arithmetic array processors for a wide class of applications in digital signal/image processing and matrix operations (fig. 1).


Fig. 1 - Hexagonal array processor


Details of the interconnection network are not drawn, because its characteristics and topology depend on the algorithm implemented and on the reconfiguration capabilities that the designer introduces into the structure. Anyway, the error-detection techniques discussed here are not affected by the adopted solution: only the reconfiguration changes according to the interconnection network. From the literature it is possible to extract a simple common structure of the PEs for the class of applications considered here (fig. 2) [1] [2]. Usually three inputs are supplied to the PE by the surrounding environment: two are data coming from digital devices connected to the array borders (a_in and b_in), the third one is the result of the computation performed by the adjacent PE (c_in). Three outputs are produced by each PE: two (a_out and b_out) are equal to the corresponding inputs, while the third one (c_out) is the linear combination of the product of the inputs a_in and b_in and the result of the computation executed in the adjacent PE. The basic computation performed in the array is therefore a linear weighted combination of data coming from digital devices connected to the array inputs. Often along the data path some shift registers are introduced to properly synchronize the data flow according to the target algorithm. On the other hand, the direction in which the data flows may be changed for some classes of algorithms. In both these cases the results presented here still hold.

a_out <- a_in
b_out <- b_in
c_out <- c_in + a_in * b_in

Fig. 2 - Processing element
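As a concrete illustration, the PE behavior of fig. 2 can be modeled in a few lines (a software sketch with a hypothetical function name; chaining PEs along one direction gives the inner-product special case of the hexagonal data flow):

```python
# Software model of the basic PE of fig. 2 (hypothetical name pe_step):
# a and b pass through unchanged, c accumulates the product.

def pe_step(a_in, b_in, c_in):
    """One computation step of the basic processing element."""
    a_out = a_in                  # data inputs are forwarded unchanged
    b_out = b_in
    c_out = c_in + a_in * b_in    # linear weighted combination
    return a_out, b_out, c_out

# Chaining PEs along one direction computes an inner product:
c = 0
for a, b in [(1, 4), (2, 5), (3, 6)]:
    _, _, c = pe_step(a, b, c)
print(c)  # 1*4 + 2*5 + 3*6 = 32
```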

This simple structure, or other structures derived from it, may be used with a suitable interconnection network to execute a wide set of algorithms for different applications characterized by regularity (i.e. repetitive computations on large sets of data) and high computing requirements [1] [2]. For example, in digital signal and image processing it is possible to perform FIR and IIR filtering, 1-D or 2-D convolution, signal correlation, discrete Fourier transform, and other more complex operations, while in matrix arithmetic it is possible to compute matrix-vector multiplication, matrix-matrix multiplication, matrix triangularization and the solution of triangular linear systems. The arithmetic units used in this basic PE may be bit-parallel or bit-serial, according to the implementation criteria adopted by the designer. Bit-parallel units may achieve the best performance in each processing unit, when a careful design is adopted. Bit-serial units must be preferred to minimize the silicon area required by the PE and, thus, to implement large array processors on a single chip or on a wafer. Besides, it was shown [11] that a higher throughput of the whole system can sometimes be achieved by integrating

bit-serial units on the same chip or wafer instead of bit-parallel units. In fact, by laying a great number of bit-serial PEs on the same silicon support, it is possible to decrease the number of interconnection wires between PEs on different supports and to reduce the delays due to the interconnections. In this research we have adopted the classic fault models at gate level [10], both for permanent, transient and intermittent faults; in particular, we consider stuck-at, stuck-on, stuck-open and bridging faults. Related to such fault model, we consider a classic error model at gate level [10] [12]: the physical defects due to faults appear, if they are not masked by the actual input data, as some wrong bits in the result of the considered unit. The actual result is the sum of the correct result plus the error.

3. RESIDUE ARITHMETIC

In this paper an extension of the method proposed in [13] is presented for hexagonal arithmetic arrays. The computation is executed by a binary array processor, while a set of additional array processors (called residue arrays) perform the same computation in a suited Residue Number System (RNS) [14], concurrently with the binary array. The binary array and its PEs are those presented in the previous section. The architecture of each residue array is the same as that of the binary array, while residue PEs are similar to binary ones, but they execute all operations in the adopted RNS. Different techniques may be adopted to design the residue arithmetic units and the whole residue PE, as proposed in [13]. The computation time for unchecked results (i.e. the output of the binary array) is not increased by the redundancy, even if correctness is certified only after a given time. To detect errors, the results of the binary and residue arrays are compared in the RNS, thus avoiding the high-computing residue-to-binary conversion.
If an error E arises in the binary or residue arrays, the actual result P' is P' = P + E = P + Σ_{i=0}^{n−1} k_i·2^i, where k_i ∈ {−1, 0, 1}. The occurrence of an error can be detected by examining E = P' − P. When a set of bases B = {b_k | k = 1, ..., K} is considered, the K-tuple of the residues of such difference is called the syndrome s of the error: s = {s_k | s_k = R[P' − P]_{b_k}, k = 1, ..., K}. If the syndrome is null, i.e. each residue is null, no error is detected in the computation of the arrays. Nevertheless, this does not always imply that no fault has occurred: in fact faults may be masked by data that generate a null binary result. The choice of the set B of bases of the RNS constitutes an important step in the design of a fault-tolerant array processor, in order to minimize the computation delay introduced in the basic architecture and the additional silicon area required by the physical redundancy. Some values of the bases b_k of the set B are forbidden: they cannot be powers of two, otherwise many errors cannot be detected, since their syndromes may be null [14].

If only one base b is used, two opposite trends must be balanced: the possibility of detecting errors and the minimization of the silicon area. This last goal can be achieved by reducing the number of bits used to represent the residue of the data. Nevertheless, such a choice implies the reduction of the range covered by the RNS, since the range is equal to b. This greatly reduces the detection probability, because more binary numbers have null syndrome. Better results may be achieved by using two bases. In this case the silicon area could be greater than the area used by only one base, but a careful choice of the RNS allows the design of small circuits with a high detection probability. To achieve high detection probability, the whole data range must be covered by the range M, which is given by [14] M = l.c.m.(B) = l.c.m.{b_k | b_k ∈ B} (l.c.m. is the least common multiple). Therefore, to fully use the redundant silicon area, the bases must not waste their capability of enlarging the range M. This goal is achieved when the bases are mutually prime [15], i.e. when G.C.D.(B) = G.C.D.{b_k | b_k ∈ B} = 1 (G.C.D. is the greatest common divisor). At the same redundancy, i.e. the same number of bits for coding each base, the best results are obtained by choosing the largest values for the bases which satisfy the previous constraint. From extensive experiments, it is possible to show that, generally, a third base is not useful to detect errors in array processors. A further extension of the range by means of a third base might in fact be adopted only for a very large (> 250) number of bits in the binary units.
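The two-base scheme can be sketched as follows (a minimal software model, not the paper's circuit; the bases 7 and 9 are illustrative: mutually prime and not powers of two, so that M = 63):

```python
# Minimal model of residue checking with two mutually prime bases.
# A non-null syndrome flags an error; errors that are multiples of
# M = l.c.m.(B) escape detection, as discussed in the text.

from math import gcd

BASES = (7, 9)              # coprime, neither is a power of two
assert gcd(*BASES) == 1     # G.C.D.(B) = 1, so M = 7 * 9 = 63

def syndrome(binary_result, residue_results):
    """Residue, in each base, of (binary result - residue result)."""
    return tuple((binary_result - r) % b
                 for b, r in zip(BASES, residue_results))

correct = 1234
residues = tuple(correct % b for b in BASES)   # residue arrays' outputs
print(syndrome(correct, residues))             # (0, 0): no error detected
print(syndrome(correct + 2**5, residues))      # non-null: error detected
```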

4. A*N+B CODES

A*N+B codes constitute an important class of non-separate codes [8] [9], i.e. codes where information bits are not separated from check bits. The arithmetic unit does not manipulate the input numbers directly. Each input N is modified via the linear transformation A*N+B and then delivered to the arithmetic unit, which executes the nominal operations in the transformed data space. The output R of such operations may be moved back to the target data space via the inverse linear transformation. The functions computed by arithmetic units on coded data and the structure of such circuits are the same as those of the units working on uncoded data. The only difference consists in the number of treated bits: ⌈log2 N*⌉ bits are used for uncoded data (where N* is the maximum input number), while ⌈log2(A*N* + B)⌉ bits are required by coded data. Therefore, computation time is increased by the additional data bits and by the data conversions. Error detection consists in verifying that the results belong to the proper code, i.e. that the result minus the second constant B of the output code is divisible by the first constant A of the output code.
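The encoding and the divisibility check can be sketched as follows (illustrative coefficients; tab. 1 later gives pairs optimized for given word lengths):

```python
# Sketch of A*N+B encoding, decoding and checking with illustrative
# coefficients A = 7, B = 3 (not an optimized pair from tab. 1).

A, B = 7, 3

def encode(n):
    """Map an input N into its A*N+B codeword."""
    return A * n + B

def decode(r):
    """Inverse linear transformation back to the nominal data space."""
    return (r - B) // A

def check(result):
    """A result is a valid codeword iff (result - B) is divisible by A."""
    return (result - B) % A == 0

r = encode(10)
print(r, decode(r), check(r))   # 73 10 True
print(check(r + 1))             # False: the erroneous result left the code
```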


Some problems arise in balancing costs (silicon area and computation time) and benefits (detection probability) for a generic pair of coefficients (A,B), but not for some classes. Large silicon area may be required by the circuits which modify data to connect PEs. For example, the two-input adder in each PE must be substituted by a larger adder whose inputs are the nominal ones and the constant −B. Suitable circuits must be connected to the multiplier of the PE to subtract the first-order term AB(a + b) from the multiplier output: this allows for producing data in a C*N+D code and, thus, for connecting the multiplier to the PE's adder and to the adjacent PE. On the other hand, extensive experiments have shown that acceptable detection probabilities are obtained only by coding all PE inputs. For example, the additional data conversions and manipulations previously described are not required when coding only one input stream, but the detection capability is about 20-30% lower. To fully exploit the error detection capabilities of A*N+B codes and to minimize the requirements of such technique (computation time and silicon area), a careful choice of the coefficients A and B is mandatory. Let's first consider A*N codes, i.e. with the coefficient B assumed null: this is the simplest class, which allows saving some silicon area and computation time with respect to the more general class of A*N+B codes. In A*N codes different error detection capabilities may be achieved by choosing the coefficient A. A class of such codes considers A = 2^k; it has been shown [8] that it does not achieve an interesting error detection. In fact the coded data may be obtained simply by shifting the input N by k bits to the left: the correctness check degenerates into verifying that the k least significant bits of the encoded number are null. Such condition does not ensure the proper coverage for all errors that affect any bits but the first k bits, which do not contain useful information.
The following criterion may be adopted to perform an optimal choice of an A*N code with respect to error detection and silicon area. If A and N are represented by means of k and n bits respectively, the coded representation A*N of the input N requires n + k bits at most. The additional silicon area and computation time are highly used, and conversion algorithms can be easily and effectively implemented, for A = 2^k − 1, i.e. the maximum value that can be represented in k bits [15]. Let's note that silicon area and computation time are completely used for A = ⌊(2^{n+k} − 1)/(2^n − 1)⌋, where ⌊.⌋ is the integer part. Nevertheless, no general conversion algorithm and small circuit were found. If the input data are represented by means of a number of different bit sequences which is less than the number of


possible bit configurations allowed in n bits, the optimization process might be performed in different ways. By considering the actual value of the maximum input number it is possible to reduce the number of bits required by the coded representation. Nevertheless, no simple and general relationship was found between the maximum input number and the coefficient A to maximize the error detection capabilities and to minimize the silicon area and the computation time. Let's consider now the more general A*N+B codes. This class of codes can be used to minimize the fault latency in arithmetic units. In fact, if many inputs of a PE are null, some errors may not be detected and, thus, fault latency may increase dangerously. Besides, if faults are not detected as early as possible and other faults arise, the detection capabilities might be decreased by the joint interaction of the effects of the faults on the computation. To reduce the fault latency it is necessary to avoid the presence of null codewords. In the A*N+B code a non-null coefficient B allows for this goal. Nevertheless, as already said, some problems about connection capability arise in the implementation of arithmetic units, because operations can produce outputs in a code different from the input one. Addition can be easily managed by modifying the adder: it must be substituted by a circuit that adds the nominal inputs and the constant −B. Multiplication is more difficult; in fact, the product P = (Ax + B)(Ay + B) = A²(xy) + AB(x + y) + B² cannot be transformed into a coded representation of the product xy, like C*xy + D, without subtracting the first-order term AB(x + y). Nevertheless, it is possible to identify some classes of A*N+B codes in which the code representation of the multiplier output behaves as a C*N+D code.
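The existence of such classes can be verified numerically. The sketch below assumes the condition derived in the following (A*B a multiple of the modulus b* = 2^c − s) and uses an illustrative pair (A, B) = (6, 5), not taken from the paper's table:

```python
# Numerical check of the closure property: if A*B is a multiple of the
# modulus b_star = 2**c - s, the product of two A*N+B codewords is
# itself a C*N+D codeword with C = A**2 and D = B**2 (all modulo b_star).

c, s = 5, 2
b_star = 2**c - s            # 30
A, B = 6, 5                  # A*B = 30 = b_star, and A is not a power of two
assert A * B % b_star == 0

C, D = A * A % b_star, B * B % b_star
for x in range(b_star):
    for y in range(b_star):
        lhs = (A * x + B) * (A * y + B) % b_star
        rhs = (C * x * y + D) % b_star
        assert lhs == rhs    # multiplier output stays in a C*N+D code
print("closure holds modulo", b_star)
```

The cross term AB(x + y) vanishes modulo b* precisely because AB ≡ 0 (mod b*), which is why no subtraction circuit is needed for these classes.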
Let's consider the finite field in which operations on n-bit numbers are performed: if we assume that no overflow occurs during the computation, the usual arithmetic operations performed in such field may be viewed as operations in a base b* of a residue number system [14]. In particular, the output of the multiplication is in a C*N+D code when the inputs are in an A*N+B code if the residue of AB in the base b* is null, i.e. if R_{b*}[AB] = 0.

For a base b* = 2^c − s with s > 0, a pair (A,B) can be chosen so that A is not a power of two. The error detection capabilities may be exploited with minimum silicon area and computation time when the value of b* is maximum, i.e. when b* = 2^c − 1. In tab. 1 some pairs of values for A and B are given for different numbers of bits.

B

c s

A

B

c

s

A

B

413

5

73

5

25

9

2

3

170

4 1 5

3

7 7 11

11

9

5

13

39

5 2 3 10

8 1 3

85

9

6

11

46

81

5

51

10

1

3

341

6 1 3 21

525

6

8 1 15

17

10

1

11

93

6 1 7

9

8!3

11

23

10

2

7

146

619

7

84

7

36

10

4

5

2O4

6 4 5 12

8 7 13

19

10

4

1 7 60

7 2 3 42

9 1 7

73

10

4

15

68

10

7

9

113

10 10 13

78

7

9 14

9 2 15

34

Tab. 1 - ( A , B ) pairs for b° = T - s

When different values of the pair (A,B) allow protecting the same number of bits, some criteria should be defined to perform the best choice. But a simple rule cannot be identified, since conflicting cost functions are usually considered. A high value of the coefficient A allows achieving high error detection capabilities, since the codewords are far from one another, but only a reduced range of input data can be coded with the given number of bits. On the other hand, a greater number of bits requires greater silicon area and higher computation time. Therefore, the choice of the code must consider the target application of the array processor and, thus, the available silicon area and the actual timing constraints for the computation. For the optimum choice of the code according to the previous criteria, some small fast encoding and decoding circuits may be designed, for example as shown in [15].

R_{b*}[A²(xy) + AB(x + y) + B²] = R_{b*}[C(xy) + D], that is, if R_{b*}[AB] = 0. Such condition is satisfied, for A and B different from zero, if AB = b*. From this property some A*N+B codes may be identified by choosing the value of the base b*. For example, if c bits are used for the codeword, the greatest base b* whose residues are represented in c bits is 2^c. In this case the coefficient A is equal to a power of two, since it must be different from 1: but, as already said, this is not a value suited for error detection. A second class of A*N+B codes may be defined if we accept to discard some codewords, i.e. to reduce the range of the input data. Let's consider a base b* = 2^c − s.

5. CONCURRENT ERROR DETECTION

After the presentation of the overall coding techniques, in this section architectures for their implementation are proposed and evaluated. Two approaches may be adopted in designing array architectures with on-line error detection capabilities:
- in local error detection, computation is checked locally in each PE and only tested results are delivered from a PE to the others;
- in error detection at array level, results are tested only at the array borders to decrease the silicon area required by the checking circuits.

5.1 Local error detection


Local error detection allows for the finest and most immediate error detection and fault localization: checkers are in fact connected directly at the output of each PE (fig. 3).

Fig. 4 - Area increase

Fig. 3 - Local error detection

As far as computation time is concerned, the checkers inside the PE can be carefully designed to avoid delays. In fact output data from a PE can be concurrently delivered to the adjacent circuits and to the checker, so that verification may be performed while the PE itself computes the subsequent output data. In this case the check signal related to an output result is available only after a given delay, but PEs are not slowed down by data checking. When A*N+B coding is used, some sources of delay are intrinsic in the arithmetic units and in the encoders and decoders at the array borders. The delay introduced by the arithmetic units cannot be avoided, since each unit must work on more bits than the corresponding unit for uncoded data. On the other hand, input data must be coded before computation begins: in fact coding cannot be performed concurrently with the computation as in residue arithmetic. Finally, output results are available at the array borders only after decoding. The area increase of the PE is due to the additional area required by each arithmetic unit and to the checker. Moreover, the area of the whole array is increased in the A*N+B coding approach by the input encoders at the array borders. For both approaches the relative silicon area overhead is practically constant, except when there are very few PEs in A*N+B arrays: in such case the incidence of the encoders cannot be neglected. Let's note that residue arithmetic units in bit-serial implementation have an area equal to the usual binary arithmetic units: therefore, in this case residue arithmetic must be discarded a priori, since it is too expensive and impractical for an integrated implementation. Anyway, residue arithmetic in bit-parallel architectures and A*N+B codes may also require a redundant silicon area too large with respect to their benefits, in particular when large VLSI or WSI array processors must be implemented.
Some results of our research about the performance of residue arithmetic and A*N+B codes adopting local error detection are shown in figs. 4 and 5 for bit-serial and bit-parallel units respectively. There we have drawn the percentage increase of the number Δ of gates in a PE and the percentage increase of the computation time τ versus the number n of bits of the uncoded words, the number k of redundant bits and the number of PEs in the array (M x M arrays are considered). The number of gates has been adopted as an estimator of the silicon area, since it is independent of the integration technology and process.

Fig. 5 - Increase of computation time

5.2 Error detection at array level

Very attractive results as far as silicon area is concerned may be achieved by adopting error detection at array level. In such approach concurrent error detection via residue arithmetic or A*N+B codes is achieved by checking the outputs of the PEs only at the array borders: PEs inside the array do not contain any checker. The computation delay introduced in these architectures is the same as in local error detection, but a great amount of silicon area is saved, allowing the implementation of large VLSI and WSI processing arrays. Nevertheless, this important advantage implies an increase of the error latency: in the worst case it is equal to the computation time of a row of PEs, because errors can be detected only at the array borders. On-line error detection at array level consists in checking the output of the PE at the end of each diagonal of the array itself. First, we consider the residue arithmetic approach: the architecture is shown in fig. 6. The binary array executes its computation as in the array without error-detecting capabilities. Its inputs are also converted into the selected residue number system by a set of binary-to-residue converters and, then, delivered to the residue arrays (one for each base). Residue arrays perform the same computation as the binary array, but all operations are in modulo. If no fault occurs, the output of each residue array is the residue of the output of the binary array in the corresponding base. The output of the binary array is converted into its residue representation by another set of binary-to-residue converters. The syndrome is finally generated by subtracting the outputs of the residue arrays from the residues of the binary array; if all these differences are zero, no error is detected.



In figs. 8 and 9 the percentage increase of the number Δ of gates (strictly related to the silicon area) and the percentage increase of the computation time τ are shown for bit-parallel and bit-serial arrays.

Fig. 6 - Error detection by using residue arithmetic

A careful design of the binary-to-residue converters allows achieving a low increase of silicon area in bit-parallel architectures: in fact the residue arithmetic units are much smaller than the binary ones, because they work on a reduced number of bits. As in local error detection, bit-serial arrays do not present the same advantages, since the residue and binary units have about the same dimensions: in this case a better approach consists in doubling the binary array and comparing the outputs. The architecture implementing on-line error detection via A*N+B coding is shown in fig. 7. The input data are coded in the adopted A*N+B code. PEs execute their computation as in the basic array without error detection capabilities: the only differences are the number of bits that must be treated, the computation time and the dimensions of the arithmetic units. Decoding/checking circuits are connected at the output of the last PE of each diagonal of the array. When the output does not belong to the selected code, an error has been detected in the array.
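The array-level residue check along one diagonal can be modeled in software as follows (a simplified sketch under stated assumptions: a single base, one diagonal of PEs, and a fault injected as one wrong product inside a binary PE):

```python
# Software model of array-level residue checking along one diagonal:
# the binary chain and the residue chain run concurrently; their
# results are compared only at the array border.

BASE = 13

def diagonal(pairs, fault_at=None):
    """Run c_out = c_in + a*b along a diagonal, in binary and modulo BASE."""
    c_bin, c_res = 0, 0
    for i, (a, b) in enumerate(pairs):
        prod = a * b + (1 if i == fault_at else 0)        # injected error
        c_bin = c_bin + prod                              # binary-array PE
        c_res = (c_res + (a % BASE) * (b % BASE)) % BASE  # residue-array PE
    return c_bin, c_res

pairs = [(3, 5), (2, 7), (4, 1)]
c_bin, c_res = diagonal(pairs)
print(c_bin % BASE == c_res)    # True: null syndrome, no error
c_bin, c_res = diagonal(pairs, fault_at=1)
print(c_bin % BASE == c_res)    # False: error detected at the border
```

Note that the fault surfaces only at the border comparison, which is exactly the error-latency trade-off discussed above.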

Fig. 7 - Error detection by using A*N+B codes

This solution allows a low increase of silicon area with respect to the array for uncoded data, because only one checker has been introduced. A constant area is used by the encoders at the array borders, while other constant additional area is required in each PE to treat input data having a higher number of bits than uncoded data. The area increase due to the checker decreases and becomes negligible as the number of PEs grows.

Fig. 8 - Area increase

Fig. 9 - Increase of computation time

6. FAULT LOCALIZATION

The architectures for on-line error detection at array level are not able to completely identify the position of the faulty PE. Only the diagonal in which the faulty PE lies can be found, by looking for the active error signal. Off-line testing of the PEs must be executed to identify the faulty element and, then, to reconfigure the array. A fast PE identification and, thus, a high availability of the system may be achieved by means of additional circuits which concurrently test the intermediate results of the PEs and produce additional error signals. In particular, it is necessary to find the row of the faulty PE: such PE is at the intersection between the diagonal and the row whose error signals are active. An interesting architecture with fault localization capabilities is shown in fig. 10, both for residue arithmetic and A*N+B coding. Instead of checking the output of all PEs by placing a checker in each of them, the PEs' outputs are added together to generate a redundant output d_out in the horizontal direction.


This solution requires a larger area with respect to the previous architecture, because of the further adder which computes the output d_out and the additional checkers. In fig. 12 the percentage increase of the number of gates is given for bit-serial and bit-parallel implementations.


Fig. 12 - Area for concurrent fault localization

Let's note that the additional adder inserted in the PE of fig. 11 protects the multiplier and the first adder of the basic structure, but it is not able to detect errors due to faults in itself. If such circuit fails, only the diagonal containing the faulty PE might be identified, not its row, since the check on rows could be passed by all PEs. To overcome this drawback a third set of error signals can be computed along the diagonals.

Fig. 10 - Fault localization at array level: a) residue arithmetic; b) A*N+B coding

To compute these sums the basic PEs must be modified by inserting another adder, as shown in fig. 11. The inputs of this additional adder are the output c_out of the PE considered until now and the partial sum computed by the corresponding adder of the PE in the above diagonal of the same row.

To such purpose a third adder can be inserted in the PE structure, as shown in fig. 13.a. The output e_out of such adder is obtained by adding the output d_out of the PE itself and the output e_out of the previous element in the diagonal. If a fault occurs in the adder which generates d_out, it may be detected by examining the outputs e_out at the array borders. Therefore, the row of the faulty PE may be identified by examining the outputs d_out at the leftmost border, while its diagonal may be obtained from c_out and, if needed, from e_out.

Fig. 11 - PE for fault localization

The fault and error models discussed in section 2 are still valid and, therefore, the additional outputs d_out give information about the state of the PEs belonging to the corresponding row. Each of these outputs is tested in the same way as the diagonals. In residue arithmetic, a syndrome for each row is computed as for the diagonals: non-null diagonal syndromes detect an error in the array and identify the diagonal containing the faulty PE, while non-null row syndromes identify its row. In A*N+B coding, the checkers at the end of the diagonals verify whether the results belong to the proper code and identify the diagonal of the faulty PE, while the checkers at the leftmost border identify the row in which it lies. These two pieces of information localize the faulty PE.
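The localization rule can be sketched as the intersection of the active error signals (hypothetical boolean vectors standing in for the border checkers of a small array):

```python
# Sketch of the localization rule: the faulty PE lies at the intersection
# of the diagonal and the row whose error signals are active.

def locate(diagonal_errors, row_errors):
    """Return (row, diagonal) indices of the candidate faulty PEs."""
    return [(r, d)
            for r, row_active in enumerate(row_errors) if row_active
            for d, diag_active in enumerate(diagonal_errors) if diag_active]

# One active diagonal checker and one active row checker single out one PE:
print(locate([False, True, False], [False, False, True]))  # [(2, 1)]
```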


Fig. 13 - Modified PEs for fault localization

A second solution is shown in fig. 13.b: the two additional adders introduced in fig. 13.a are substituted by a unique adder with three inputs. It computes a checksum for each PE by adding the nominal output c_out, the checksum generated by the PE in the above diagonal on the same row and the checksum produced by the PE in the row below on the same diagonal. This structure detects errors like the previous one, but a careful hardware design may allow saving silicon area.


Both these architectures require a larger silicon area, since another adder has been inserted in each PE and further checkers are placed along the borders of the array. In fig. 14 the percentage increase in the number of gates is presented for the residue arithmetic approach and for A*N+B coding, in bit-serial and in bit-parallel units.


Fig. 14 - Area for modified PEs

7. CONCLUDING REMARKS

On-line error detection constitutes one of the most important problems in designing fault-tolerant arithmetic array processors. In this paper we have proposed and discussed the use of residue arithmetic and of A*N+B data coding in a class of array architectures for a wide spectrum of high-computing applications in digital signal and image processing and in matrix calculus. Different solutions have been presented and evaluated by considering error detection both at local level and at array level: in the first case the output of each PE is locally tested to detect errors; in the second case the check on the results is performed only at the borders of the whole array. For both approaches the silicon area, the computational delay and the error latency have been estimated. The survival of the system after faults may easily be achieved by means of the reconfiguration techniques known in the literature: some of the architectures presented here are able to supply on-line information about the position of the fault in the array, which allows for increasing the overall system availability; the other architectures require an off-line test procedure to localize the faulty PE. By comparing the performance and the area requirements of the different solutions presented for on-line error detection, we note that our residue arithmetic approach is not well suited to bit-serial architectures: better results may be achieved by means of A*N+B coding. On the other hand, when high throughput is required in bit-parallel units, residue arithmetic must be preferred, since unlike A*N+B data coding it does not introduce delay in the computational flow.

REFERENCES

[1] H.T. Kung, "Why systolic architectures?", IEEE Computer Magazine, Jan. 1982
[2] S.Y. Kung, S.C. Lo, S.N. Jean, J.N. Kwang, "Wavefront array processors - Concept to implementation", IEEE Computer Magazine, July 1987
[3] R. Negrini, M. Sami, R. Stefanelli, "Fault tolerance techniques for array structures used in supercomputing", IEEE Computer Magazine, Feb. 1986
[4] D. Siewiorek, R. Swartz, The theory and practice of reliable system design, Digital Press, 1982
[5] M.G. Sami, R. Stefanelli, "Reconfigurable architectures for VLSI processing arrays", Proc. IEEE, May 1986

[6] J.A. Abraham, P. Banerjee, C. Chen, W.K. Fuchs, S. Kuo, A.L.N. Reddy, "Fault-tolerance techniques for systolic arrays", IEEE Computer Magazine, July 1987
[7] A. Avizienis, "Arithmetic error codes: cost and effectiveness studies for application in digital system design", IEEE Trans. on Comp., Nov. 1971
[8] W.W. Peterson, E.J. Weldon, Error-correcting codes, MIT Press, 1972
[9] J.F. Wakerly, Error detecting codes, self-checking circuits and applications, Elsevier-North Holland, New York, 1978
[10] J-Y. Jou, J.A. Abraham, "Fault-tolerant matrix arithmetic and signal processing on highly concurrent computing", Proc. IEEE, May 1986
[11] R. Negrini, M. Sami, "Array architectures for signal processing", Proc. Journées d'Electronique 1985, Lausanne, Oct. 1985
[12] R. Wadsack, "Fault coverage in digital integrated circuits", The Bell Syst. Tech. Jour., vol. 57, June 1978
[13] V. Piuri, "Fault-tolerant systolic arrays: an approach based upon residue arithmetic", Proc. ARITH-8, Como, Italy, May 1987
[14] N.S. Szabo, R.I. Tanaka, Residue Arithmetic and Its Application to Computer Technology, McGraw-Hill, New York, 1967
[15] V. Piuri, R. Stefanelli, G. Traverso, "Error detection in serial multipliers and in systolic arrays: an approach based upon A*N codes", Microprocessing and Microprogramming - the EUROMICRO Journal, April 1988