MR-12014; No of Pages 11 Microelectronics Reliability xxx (2016) xxx–xxx
Microelectronics Reliability journal homepage: www.elsevier.com/locate/mr
Self-repairing radix-2 signed-digit adder with multiple error detection, correction, and fault localization☆
Hossein Moradian a, Jeong-A Lee a,⁎, Adnan Hashmi b
a Computer System Lab, Department of Computer Engineering, Chosun University, South Korea
b SoC System Lab, Department of Information and Communication Engineering, Chosun University, South Korea
Article info
Article history: Received 22 December 2015; Received in revised form 8 May 2016; Accepted 13 June 2016; Available online xxxx
Keywords: Reliability; Fault tolerance; Fault detection correction; Fault location; Signed-digit adder
Abstract
The advent of advanced microelectronic technologies and scaling down to nanometer dimensions has made current digital systems more susceptible to faults and increases the demand for reliable and high-performance computing. Current solutions have so far used the parity prediction scheme to increase reliability and detect faults in adder modules, but they add perceptible area overhead to the circuit. In this paper, we present two new efficient methods for fault detection and localization, in addition to full error correction, targeting stuck-at and multi-cycle transient (MCT) faults in radix-2 signed-digit adders through a combination of time and hardware redundancy. In this study, we use the self-checking full adder, which can identify a fault based on internal functionality, to detect any fault in the adder modules. The detection of a fault is followed by input inversion, recomputation, and appropriate output inversion to correct the error and localize the fault. The error-correction method employs fault masking by utilizing the self-dual concept, which is based on the fact that in the presence of a fault, the designed technique results in a fault-free complement of the expected output when fed by the complement of its input operands. In addition, the existence of any fault in the input lines of the adder modules can be identified by a low-cost parity-checking error-detection approach, and a faulty module can be localized by comparing the faulty output from the first computation with the fault-free output from the recomputation. Based on the experimental results, the area occupied by our designs is approximately 50% of the area used by previous designs that employ the parity prediction scheme. In addition to the area reduction, our design approaches result in higher reliability with less power consumption and low time delay. © 2016 Elsevier Ltd. All rights reserved.
☆ This research was supported by National Research Foundation of Korea funded by the Ministry of Science, ICT and Future Planning (NRF-2013R1A1A3012335). This work was also supported by the Korea Institute of Energy Technology Evaluation and Planning (KETEP) and the Ministry of Trade, Industry & Energy (MOTIE) of the Republic of Korea (No. 20164010201020).
⁎ Corresponding author. E-mail addresses: [email protected] (H. Moradian), [email protected] (J.-A. Lee), [email protected] (A. Hashmi).

1. Introduction
The increasing complexity of circuits, reduced clock cycles, and reduction in transistor size, along with the presence of radiation and other environmental conditions, have made current hardware systems more prone to faults [1]. Thus, fault tolerance is of great importance not only for mission-critical and safety systems but also for general computing systems. With continuous technology scaling and increasing device density, especially in nanometer technologies, the need to tolerate both permanent stuck-at and transient faults grows. In the case of permanent stuck-at faults, circuit manufacturers apply different error-detection patterns to detect faulty modules after fabrication, but it is not guaranteed that all permanent faults can be detected in the testing phase. In addition, stuck-at faults can appear in circuits afterward for different reasons, such as accidental overvoltage, aging, temperature, or other environmental conditions. Therefore, detecting and correcting stuck-at faults while the module is online, without interrupting the system, is important. Transient faults appear only for a short time and then disappear. Therefore, detecting transient faults is more important than correcting them because after the fault disappears, the module functions correctly. With the increasing operating frequencies of recent nanometer-technology-based circuits, transient faults that previously lasted much less than one clock cycle can now be considered long-duration transient (LDT) or multi-cycle transient (MCT) faults [2,3]. In fact, the effects of MCT faults clearly have a much higher probability of not being masked, and so they also stand a greater chance of producing system failures [4]. On the other hand, ordinary transient fault-tolerance techniques are not efficient for MCT faults, and proper error-correction methods should be applied for this kind of fault [2]. Because MCT faults remain for several clock cycles, this kind of fault can be treated as a stuck-at fault. Arithmetic operators, as the main components of the processing elements, are also susceptible to different environmental effects. Among
http://dx.doi.org/10.1016/j.microrel.2016.06.010 0026-2714/© 2016 Elsevier Ltd. All rights reserved.
Please cite this article as: H. Moradian, et al., Self-repairing radix-2 signed-digit adder with multiple error detection, correction, and fault localization, Microelectronics Reliability (2016), http://dx.doi.org/10.1016/j.microrel.2016.06.010
the arithmetic operators, adders are some of the essential elements present in almost all digital devices. Hence, designing a reliable adder that can detect faults and correct errors is an important challenge and a prominent goal. To facilitate fault detection, the concept of self-checking was introduced, and a self-repairing concept was defined for error correction in [5,6], respectively. Many approaches have already been proposed for self-checking and self-repairing adder designs [7]. However, the solutions presented so far still suffer from significant overhead for self-repairing. In addition, relatively few research works have addressed reliable adders that reduce the occupied area, delay time, and power consumption overhead all together. The first self-checking adder was developed in [8,9] using arithmetic residue codes. Requiring complex checker circuits is one of the disadvantages of arithmetic residue codes, as discussed in [10]. In [11], the Berger code was presented, and it has been used in error-detection schemes; however, the Berger code can detect only systematic unidirectional errors. Double modular redundancy (DMR), or duplication with comparison, has been introduced and widely used to design self-checking systems. However, DMR is only able to detect a single or an odd number of errors. Designing a self-checking adder using DMR will increase the hardware overhead by more than 100%. Another scheme for self-checking systems is the parity prediction-based design that was introduced in [10,12]. In the parity prediction technique, the parity of an output is predicted from the input parities. Then, the parity of the output is computed after the result is produced and compared with the predicted parity. The hardware overhead in the parity-based design is lower than that in the previously mentioned techniques.
The basic drawback of the parity prediction schemes is that they may not be secure against a single fault because a single fault may propagate to multiple output errors that are undetectable by the parity code. For example, a single fault in an adder can produce an error on a carry signal, which can propagate to several outputs in the adder. To avoid this problem, combined double-rail checking of the carries and parity prediction for the outputs is applied to achieve a fault-secure property [10]. However, this scheme adds extra hardware overhead to the circuit. Different error-masking techniques can also be used for arithmetic operations. Triple modular redundancy (TMR) is a general method that employs three isolated, parallel modules and provides a correct output. If any one of the three modules fails, the other two modules can mask the fault using a majority voting system. The triplication of the circuit and the use of a majority voter require significantly more hardware resources [13–15].
Table 1. Parity of an ordinary number and the corresponding signed-digit representation based on the explained encoding method.
a     Signed-digit representation (a+ a−)   P(a)   P(a+ a−)
1     01                                    1      1
−1    10                                    1      1
0     00/11                                 0      0
In [16], the authors presented another error-correction method. This method first detects an error using parity prediction and then obtains the inverse of the correct output by inverting the input operands. Compared with TMR, this method has a higher time overhead but a lower hardware cost. Another fault-tolerant adder design that uses parity prediction alongside TMR was presented in [17]. In this design, an adapted parity prediction scheme used with partial TMR is distributed over all of the bits to detect and correct multiple errors in a carry look-ahead adder. In [18], the authors presented an area-efficient self-checking, self-repairing adder with fault localization in which the faulty adder modules can be identified and the faults can be localized and repaired. In this method, fault detection and error correction were realized using self-checking full adders (FAs) based on the internal functionality of an FA. In [19], a fault-tolerant parity prediction-based technique was used to detect stuck-at faults in a signed-digit adder scheme along with fault-localization and error-correction capability. Using shifted operands (left- and right-shifted operands), this algorithm recomputes the operation and locates and corrects the fault. However, in some cases, the correction is only partial. Recently, in [20], a new method to detect, localize, and correct a single stuck-at fault in a binary signed-digit number (BSDN) adder was designed. This method uses the parity prediction scheme for fault detection and recomputation with inverse inputs for fault localization and error correction. In the present paper, we focus on exploring a fault-tolerant BSDN adder, which is an extended version of the work published in [20]. We use the same concept as [20] in terms of error correction, but the error-detection part uses a different method. In addition, the BSDN adder design in this paper is different from that of [20].
The objective of this paper is to present a new efficient procedure that achieves fault detection and localization, in addition to full error correction, in BSDN adders, in which the faulty module can be identified with lower area, time, and power overhead.
Fig. 1. Binary signed-digit adder using a self-checking FA.
The rest of this paper is organized as follows. Section 2 discusses related work on fault localization, detection, and error-correction methods in BSDN adders and FAs. The proposed methods are described in Section 3. An evaluation, comparison, and discussion are presented in Section 4. Finally, concluding remarks are given in Section 5.

2. Background and related works
Although many research activities have been conducted in the area of self-checking adders, only a few of them have focused on self-checking BSDN adders. In this section, we first introduce radix-2 signed-digit addition, and then we report some approaches that focus on self-checking radix-2 signed-digit addition. Finally, we briefly introduce the self-checking ordinary FA that is used in our proposed designs.

2.1. Radix-2 signed-digit addition
In conventional ripple-carry adders, carry propagation significantly reduces the operation performance, especially when the size of the operands increases. Using a redundant number system eliminates or reduces the carry propagation in the addition operation [21]. The advantage of using a redundant system is to obtain an addition or subtraction operation with a complexity of O(1), which is independent of the digit length of the operands and results in a fast adder circuit design [22]. An integer number a can be represented in the radix-2 signed-digit system as:

$$a = \sum_{i=0}^{n-1} x_i 2^i$$

where $x_i \in \{-1, 0, 1\}$ and n is the number of ternary digits. When two radix-2 signed-digit numbers, a and b, are added, only the addition of digits 1 with 1 or −1 with −1 will cause carry propagation to the next higher-order digit. This problem can be avoided by transforming the addition operands into transfer (intermediate carry) and weight (partial sum) digits in such a way that the ith weight digit and the (i − 1)th transfer digit generate no carry. This addition is performed in two steps. In the first step, a transfer digit (intermediate carry) $c_i$ and a weight digit (partial sum) $w_i$ are generated such that [23]:

$$a_i + b_i = 2c_{i+1} + w_i$$
In the next step, the final sum value, z, is computed as:

$$z_i = c_i + w_i$$

In [23], the authors presented two approaches to perform radix-2 signed-digit number addition: the double-recoding method and the method that uses information from the previous digit position. Both approaches can be used for obtaining the sum result. To denote a radix-2 signed-digit number, each digit can be represented by a minimum of two bits using different encoding methods. Parhami [24] proposed two encoding methods. One is the “sign” and “value” (s, v) encoding, and the other is the “negative” and “positive” (n, p) encoding. In the (s, v) encoding, −1, 0, and 1 are represented by (1, 1), (0, 0), and (0, 1), respectively, and in the (n, p) encoding, −1, 0, and 1 are represented by (1, 0), (0, 0), and (0, 1), respectively. In the present study, to encode the BSDN, we use the (n, p) representation in which digit 0 is represented by either (0, 0) or (1, 1). Different designs have been presented for radix-2 signed-digit adder implementation [25–29]. In the present study, we use the method presented in [23]. This implementation uses the double-recoding method and comprises two levels of FAs.

2.2. Self-checking radix-2 signed-digit adder
In [19], Cardarilli, Ottavi, Pontarelli, Re, and Salsano presented a self-checking binary signed-digit adder design that uses a parity checker targeting a single stuck-at fault. The parity properties enable us to evaluate the correctness of the outcome in the binary signed-digit representation. As mentioned earlier, based on the BSDN encoding, three different values (−1, 0, 1) can be represented by digit $x_i$, and two bits are needed for the binary representation. We define $P(x_i)$ as the parity of $x_i$, which is the XOR of the two bits that form digit $x_i$. We also define P(X) as the parity of signed-digit number X, which is the XOR of the parities $P(x_i)$ of all digits $x_i$.
We use these definitions to define P(A) and P(B) as the parities of the addend and augend, respectively, P(C) and P(W) as the parities of the intermediate carry and partial sum, respectively, and P(Z) as the parity of the result. For this design, the following properties are demonstrated [19]:

Property 1. $P(Z) = P(W) \oplus P(C)$
Fig. 2. SBSA-PP.
Fig. 3. SBSA-IP.
Property 2. $P(W) = P(A) \oplus P(B)$
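Properties 1 and 2 can be checked numerically on the two-step addition of Section 2.1. The sketch below is only an illustration under stated assumptions: it selects the transfer and weight digits with the textbook rule that looks at the previous digit position (one of the two approaches attributed to [23]), not with the double-recoding FA network used in this paper, and it computes parity at the digit level (a digit has parity 1 if and only if it is nonzero, as in Table 1). All names are hypothetical.

```python
import random


def sd_add(a, b):
    """Carry-free addition of two LSB-first digit lists over {-1, 0, 1}.

    Returns (w, c, z): weight digits, transfer digits, and sum digits,
    with a[i] + b[i] == 2*c[i+1] + w[i] and z[i] == c[i] + w[i].
    """
    n = len(a)
    w = [0] * (n + 1)
    c = [0] * (n + 1)  # c[0] = 0: no transfer enters digit 0
    for i in range(n):
        p = a[i] + b[i]
        # True when the lower digit pair cannot emit a negative transfer.
        lower_nonneg = i == 0 or (a[i - 1] >= 0 and b[i - 1] >= 0)
        if p == 2:
            c[i + 1], w[i] = 1, 0
        elif p == -2:
            c[i + 1], w[i] = -1, 0
        elif p == 1:
            c[i + 1], w[i] = (1, -1) if lower_nonneg else (0, 1)
        elif p == -1:
            c[i + 1], w[i] = (0, -1) if lower_nonneg else (-1, 1)
        # p == 0 leaves c[i + 1] = w[i] = 0
    z = [c[i] + w[i] for i in range(n + 1)]
    return w, c, z


def parity(digits):
    """Digit-level P(X): XOR of the nonzero indicators of all digits."""
    return sum(d != 0 for d in digits) % 2


def to_int(digits):
    """Integer value of an LSB-first signed-digit list."""
    return sum(d * 2**i for i, d in enumerate(digits))


def check(trials=1000, n=8, seed=0):
    """Randomly test digit range, sum value, and Properties 1 and 2."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.choice((-1, 0, 1)) for _ in range(n)]
        b = [rng.choice((-1, 0, 1)) for _ in range(n)]
        w, c, z = sd_add(a, b)
        if not all(d in (-1, 0, 1) for d in z):
            return False
        if to_int(z) != to_int(a) + to_int(b):
            return False
        if parity(z) != parity(w) ^ parity(c):  # Property 1
            return False
        if parity(w) != parity(a) ^ parity(b):  # Property 2
            return False
    return True


print(check())
```

Both properties hold digit by digit: $w_i$ is nonzero exactly when $a_i + b_i$ is odd, and $z_i$ is nonzero exactly when one of $c_i$ and $w_i$ is nonzero, because the selection rule never lets them share a sign.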
Reference [19] also introduced an algorithm to localize the fault and correct the error for binary signed-digit addition. This algorithm recomputes the operation using shifted operands (left- and right-shifted operands) and locates the fault source. However, in some cases, the correction is only partial. The fault-localization and error-correction algorithm is started once a parity error is noticed. Whenever a fault is detected, the addition is performed again using the left-shifted operands (LSI). The two results from the first (original inputs) and second (LSI) computations are compared, and the number of unequal digits and the position of inequality are calculated. Depending on the position and number of unequal digits, fault localization and correction can be done in this step, or the operation is performed again using right-shifted operands (RSI). In this method, faults are considered in three categories to appropriately localize any fault.

• Type-1 fault: a stuck-at fault in one of the ADD1 or ADD2 outputs (ci, wi, or si) causes an undesirable change of up to one bit.
• Type-2 fault: a stuck-at fault in the least significant bit (LSB) of an input digit causes a change of up to two bits.
• Type-3 fault: a stuck-at fault in the most significant bit (MSB) of an input digit causes a change of up to three bits.

This algorithm specifies that all Type-1 faults can be corrected and localized. Type-2 and Type-3 faults are correctable only if, during the recomputation, the fault is covered and none of the error indicators issues an error signal. If not, only a partial correction is feasible. Only half of the Type-2 and Type-3 faults can be totally corrected and localized because in one bit, the chance of covering a fault is 50%. Later, Alavi and Faez [30] improved the correction capability to obtain full correction using the “Recomputation with Triple-Shifted Operands” method. However, this method requires up to four recomputations using shifted inputs to be able to localize a fault and correct the error. Further, it incurs extra time and area overhead and makes the originally presented algorithm more complex. Finally, in [20], based on the self-checking BSDN adder design in [19], we presented a new algorithm to correct the error and localize the fault based on the self-dual concept. This method is based on the principle that the presented system, when fed by the inverse of its functional operands in the presence of a stuck-at fault, produces a fault-free inverse of the expected correct output. In this method, fault detection is followed by operand inversion, recomputation, and the required output inversion. Comparison of the faulty and fault-free inverse outputs localizes the faulty module. This design approach results in higher reliability with less computational time and reduced hardware-area overhead compared with the previous approaches.

2.3. Self-checking ordinary full adder
In an ordinary FA, self-checking is accomplished using the observed relationship among the inputs, sum, and carry out. As mentioned in [18], when all three inputs of an FA, namely, addend (A), augend (B), and input carry (Cin), are equal, the sum (Sum) and output carry (Cout) bits will be equal. When one of the three inputs is different, the sum and output carry bits will be complements of each other. These relationships can be used to design a self-checking adder at the cost of an equivalence tester (Eqt). The equivalence tester (Eqt) checks the equivalence of the inputs.
G1 tests the equivalence of the outputs (Sum and Cout), and G2 (Ef) tests the equivalence of Eqt and G1.

Equivalence tester: $$Eqt = \overline{\bar{A}\,\bar{B}\,\bar{C}_{in} + (A\,B\,C_{in})}$$

Error: $$E_f = Sum \odot C_{out} \odot Eqt = G1 \odot Eqt$$
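The checking logic can be illustrated with a short behavioral model. This is a sketch under assumptions rather than the circuit of [18]: Eqt is taken to be 0 when all three inputs are equal and 1 otherwise, G1 and G2 are modeled as XNOR gates, and the `flip_sum` / `flip_cout` arguments are hypothetical fault-injection switches added only for the demonstration.

```python
from itertools import product


def xnor(x, y):
    """Equivalence gate: 1 when both bits are equal."""
    return 1 - (x ^ y)


def self_checking_fa(a, b, cin, flip_sum=False, flip_cout=False):
    """Return (Sum, Cout, Ef) of a full adder with an equivalence-based check.

    flip_sum / flip_cout invert the corresponding output to emulate a fault
    in the (separate) sum or carry logic.
    """
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (a & cin)
    s ^= int(flip_sum)
    cout ^= int(flip_cout)
    eqt = 0 if a == b == cin else 1  # assumed polarity: 0 means all inputs equal
    g1 = xnor(s, cout)               # 1 when Sum and Cout are equal
    ef = xnor(g1, eqt)               # G2: 1 signals an error
    return s, cout, ef


for a, b, cin in product((0, 1), repeat=3):
    ok = self_checking_fa(a, b, cin)[2]
    bad_sum = self_checking_fa(a, b, cin, flip_sum=True)[2]
    bad_cout = self_checking_fa(a, b, cin, flip_cout=True)[2]
    print(a, b, cin, ok, bad_sum, bad_cout)
```

In this model an error on only Sum or only Cout always raises Ef, whereas an error that flips both outputs at once leaves G1 unchanged and goes unnoticed, which is why the sum and carry logic of the FA must not share gates.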
Fig. 4. Masking of a fault using complemented input.
In an error-free calculation, when Eqt is logic 0 (all inputs are equal), the output of G1 and G2 must be logic 1 and 0, respectively; if Eqt is logic 1, then both G1 and G2 gates must be equal to logic 0. In any other case, a fault will be highlighted [18]. To prevent the non-detection of faults, Sum and Cout of the FA will not be sharing any logic; thus, any fault
Table 2. Complement of a signed-digit number.
a_i        ā_i
01 (+1)    10 (−1)
10 (−1)    01 (+1)
00 (0)     11 (0)
11 (0)     00 (0)
occurring in the internal logic associated with an individual component will only make that single component faulty and, therefore, easily detectable by comparison.

3. Proposed method
The proposed self-checking binary signed-digit adder employs the parity checking technique, in addition to the self-checking adder implementation described in [18], for fault detection. To localize faults and correct errors, we utilize the alternating logic technique in [16,31] with regard to the self-dual model detailed in [20]. A logic function is self-dual if and only if f(x) = ~f(~x), where f is a logic function, x is a vector of logic variables, and ~x is the 1's complement of x.

3.1. Self-checking binary signed-digit adder
We discuss the occurrence of stuck-at and MCT faults in different parts of an adder:
1) Stuck-at or MCT fault in adder modules;
2) Stuck-at or MCT fault on an input line.

As noted in Section 2.1, a binary signed-digit adder can be implemented using conventional FAs. Therefore, to implement a self-checking BSDN adder, the conventional FAs can be replaced by self-checking FAs. Fig. 1 shows the design of a self-checking adder for BSDN. In this design, a fault in any adder activates the E1 or E2 error signals, and the faulty module can be localized. In this adder, only the adder modules are protected. To protect the input lines, we use a parity checking technique. Here, we consider two alternative assumptions. First, we only have access to the pre-calculated parity of the addend and augend, and we do not have access to each input digit. Second, we do not have pre-calculated parity for the input digits, but we have individual access to all digits of the addend and augend.

3.1.1. Self-checking binary signed-digit adder using pre-calculated input parities (SBSA-PP)
In any FA, the parity of the input bits is equal to the parity of the sum. Therefore, to reduce the hardware size, we can use the sum output bit instead of calculating the parity of the three input bits:

$$P(A, B, C_{in}) = P(Sum)$$
Fig. 5. Algorithm of the fault correction and localization for the SBSA-PP implementation.
Fig. 6. Example of the comparison results from first and second addition operations.

In addition, according to Table 1, the parity of one ordinary number is equal to the parity of the corresponding signed-digit number. Hence, we can utilize the following relationships, where A and B are two n-bit numbers (addend and augend); $a_0, a_1, \ldots, a_{n-1}$ and $b_0, b_1, \ldots, b_{n-1}$ are the individual binary bits representing A and B, respectively; n is the total number of bits representing each of these variables; and $a_i^+, a_i^-$ and $b_i^+, b_i^-$ are the signed-digit representations of the $a_i$ and $b_i$ bits, respectively:

$$A = \{a_0, a_1, \ldots, a_{n-1}\}$$
$$P(A) = P(a_0, a_1, \ldots, a_{n-1})$$
$$P(a_i) = P(a_i^+, a_i^-)$$
$$\Rightarrow P(A) = P(a_0^+, a_0^-, a_1^+, a_1^-, \ldots, a_{n-1}^+, a_{n-1}^-) \quad (1)$$
$$P(B) = P(b_0^+, b_0^-, b_1^+, b_1^-, \ldots, b_{n-1}^+, b_{n-1}^-) \quad (2)$$
$$P(A, B) = P(a_0^+, a_0^-, b_0^+, b_0^-, a_1^+, a_1^-, b_1^+, b_1^-, \ldots, a_{n-1}^+, a_{n-1}^-, b_{n-1}^+, b_{n-1}^-) \quad (3)$$

As mentioned earlier, in one FA, instead of calculating the parity of the three inputs, we can use the sum bit. Therefore, as shown in Fig. 1, we can use $s_i$ as the parity of the adder inputs, but we must note that one of the adder inputs is complemented ($\overline{a_i^-}$). Thus, we can obtain the equality

$$P(a_i^+, \overline{a_i^-}, b_i^+) = P(s_i) \quad (4)$$

We add $b_i^-$ to both sides of the equality:

$$P(a_i^+, \overline{a_i^-}, b_i^+) + P(b_i^-) = P(s_i) + P(b_i^-)$$
$$\Rightarrow P(a_i^+, \overline{a_i^-}, b_i^+, b_i^-) = P(s_i, b_i^-) \quad (5)$$

In the parity, we have $P(\bar{x}) = \overline{P(x)}$. Therefore:

$$P(\overline{a_i^+}, a_i^-, b_i^+, b_i^-) = P(a_i^+, \overline{a_i^-}, b_i^+, b_i^-) = P(a_i^+, a_i^-, \overline{b_i^+}, b_i^-) = P(a_i^+, a_i^-, b_i^+, \overline{b_i^-}) \quad (6)$$

Referring to Eqs. (5) and (6):

$$P(a_i^+, a_i^-, b_i^+, b_i^-) = P(s_i, \overline{b_i^-})$$
$$\Rightarrow P(a_i, b_i) = P(s_i, \overline{b_i^-})$$
$$\Rightarrow P(a_0, a_1, \ldots, a_{n-1}, b_0, b_1, \ldots, b_{n-1}) = P(s_0, s_1, \ldots, s_{n-1}, \overline{b_0^-}, \overline{b_1^-}, \ldots, \overline{b_{n-1}^-})$$
$$\Rightarrow P(A, B) = P(s_0, s_1, \ldots, s_{n-1}, \overline{b_0^-}, \overline{b_1^-}, \ldots, \overline{b_{n-1}^-}) \quad (7)$$
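The parity relation derived in Eqs. (4) to (7) can be checked with a few lines of Python. This is a minimal sketch under assumptions: the first-level FA is modeled only by its sum bit, fed with $a_i^+$, the complement of $a_i^-$, and $b_i^+$ as in Eq. (4), and the complemented $b_i^-$ bits are used on the checker side. The function names and the injected fault are hypothetical.

```python
import random


def first_level_sum(a_pos, a_neg, b_pos):
    """Sum bit of the first-level FA fed with a+, NOT a-, and b+ (Eq. 4)."""
    return a_pos ^ (1 - a_neg) ^ b_pos


def xor_all(bits):
    """Parity (XOR) of a list of bits."""
    parity = 0
    for bit in bits:
        parity ^= bit
    return parity


def input_parity(a, b):
    """P(A, B) of Eq. (3); a and b are lists of (x+, x-) bit pairs."""
    return xor_all([bit for pair in a + b for bit in pair])


def checker_parity(a, b):
    """Parity of all s_i together with all complemented b_i- bits."""
    s = [first_level_sum(ap, an, bp) for (ap, an), (bp, _) in zip(a, b)]
    not_b_neg = [1 - bn for _, bn in b]
    return xor_all(s + not_b_neg)


rng = random.Random(1)
a = [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(8)]
b = [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(8)]

stored = input_parity(a, b)              # pre-calculated parity of the operands
fault_free = checker_parity(a, b) == stored

# Emulate a fault that flips the a+ line of digit 3 inside the adder.
faulty_a = list(a)
faulty_a[3] = (a[3][0] ^ 1, a[3][1])
fault_detected = checker_parity(faulty_a, b) != stored

print(fault_free, fault_detected)
```

Because each input line enters the checker parity exactly once, any odd number of flipped input lines changes the checker parity while the stored parity stays the same, whereas an even number of flips cancels out.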
Referring to Eq. (7), by obtaining the parity of the addend (A) and augend (B), any odd number of faults in the input lines can be detected by calculating the parity of the intermediate sum (s) and the augend LSB (b−). Fig. 2 shows this design.

3.1.2. Self-checking binary signed-digit adder using input bit parities (SBSA-IP)
For the second assumption, if we can access all digits in the addend and augend, we can calculate the parity of each signed-digit number, e.g., $P(a_i^+ a_i^-)$, and compare it with the parity of the corresponding ordinary bit, e.g., $P(a_i)$. Therefore, any fault in the inputs can be detected and localized. Fig. 3 shows this design.

3.2. Error-correction and fault-localization method
To correct errors, we use recalculation based on the self-dual concept [16,20,31] instead of recalculation using the shifted-operand method [32]. The correction method is based on the fact that if one MCT or stuck-at-0 (1) fault occurs in one line of a circuit, the faulty line is set to 0 (1), so a line value of 1 (0) flips to 0 (1). However, when the line value is already 0 (1), the fault is masked. We use this property and mask the fault by recomputation using the complemented inputs when any of the error indicators signals a fault. Fig. 4 shows this property. Computing the 1's complement of a signed-digit number is done by changing the sign of each digit from positive to negative (and vice versa) and leaving the digit “0” unchanged:

$$a = 2\,\bar{1}\,0\,\bar{3}\,5 = (18975)_{10}$$
$$\bar{a} = \bar{2}\,1\,0\,3\,\bar{5} = (-18975)_{10}$$

Subsequently, calculating the inverse of the BSDN can be done by changing only digits 1 to −1 and −1 to 1, e.g.,

$$a = 1\,0\,\bar{1}\,1\,0\,\bar{1}\,\bar{1}\,1 = (107)_{10}$$
$$\bar{a} = \bar{1}\,0\,1\,\bar{1}\,0\,1\,1\,\bar{1} = (-107)_{10}$$

Fig. 7. Algorithm of the fault correction and localization for SBSA-IP implementation.

Table 3. Capability of fault detection.
Method                     Single fault     Multiple faults   Single fault       Multiple faults
                           in input lines   in input lines    in adder modules   in adder modules
Cardarilli's method [19]   Yes              No                Yes                Only odd number of faults
Moradian's method [20]     Yes              No                Yes                Only odd number of faults
SBSA-PP                    Yes              No                Yes                Yes
SBSA-IP                    Yes              Yes               Yes                Yes
According to our encoding method for the BSDN representation, the computation of the 1's complement for the BSDN is similar to that of the conventional binary number 1's complement, as listed in Table 2. Furthermore, we assume that both 00 and 11 denote 0 so that the complement of zero is zero. Therefore, to calculate the BSDN 1's complement, all bits should be inverted (“0” to “1” and “1” to “0”), as shown below.

$$a = 1\,0\,\bar{1}\,1\,0\,\bar{1}\,\bar{1}\,0 \rightarrow 01\ 00\ 10\ 01\ 00\ 10\ 10\ 00$$
$$\bar{a} = \bar{1}\,0\,1\,\bar{1}\,0\,1\,1\,0 \rightarrow 10\ 11\ 01\ 10\ 11\ 01\ 01\ 11$$

As reported in [20] and by referring to the truth table of ordinary FAs, both the sum and output-carry functions are self-dual, which means that inverting all FA inputs inverts the value of both the sum and output carry. As explained earlier, the BSDN adder design in this study consists only of FAs. Therefore, the described BSDN adder is also self-dual, which means that if we complement the BSDN adder inputs, all intermediate values and outputs will also be complemented. Therefore, if a stuck-at fault occurs in the input lines, FA modules, or output lines and causes flipping of the bit value from 1 to 0 or 0 to 1, it will be masked when we apply the complemented value. Therefore, the recalculated result is the inverted correct result. By inverting this result, we can obtain the correct final output. To localize the fault in our first design (SBSA-PP), if the fault is located in one of the adder modules, we can localize it by referring to the adder module number that issued the error signal because each adder has its own error signal. However, if the fault is located in one of the input lines, we can locate the faulty input module by comparing the faulty result obtained in the first calculation with the correct inverted result obtained in the recomputation. Fig. 5 shows the algorithm for the error correction and fault localization according to this design.
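The bitwise complement of Table 2 and the self-dual property of the FA described above can be checked with the following sketch. The helper names are assumptions made for illustration, digits are listed MSB first, and pairs are written in the same order as in Table 2 (01 for +1, 10 for −1).

```python
from itertools import product

# Assumed bit pairs as in Table 2: +1 -> 01, -1 -> 10, 0 -> 00 (11 also means 0).
ENCODE = {1: (0, 1), -1: (1, 0), 0: (0, 0)}


def encode(digits):
    """Encode MSB-first signed digits into bit pairs."""
    return [ENCODE[d] for d in digits]


def complement(pairs):
    """1's complement of a BSDN: invert every bit of every pair."""
    return [(1 - n, 1 - p) for n, p in pairs]


def decode(pairs):
    """Integer value of MSB-first bit pairs; a pair (n, p) is worth p - n."""
    total = 0
    for n, p in pairs:
        total = 2 * total + (p - n)
    return total


def full_adder(a, b, c):
    """Ordinary full adder returning (sum, carry out)."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def fa_is_self_dual():
    """True if inverting all FA inputs inverts both FA outputs."""
    return all(
        full_adder(1 - a, 1 - b, 1 - c)
        == tuple(1 - v for v in full_adder(a, b, c))
        for a, b, c in product((0, 1), repeat=3)
    )


a = encode([1, 0, -1, 1, 0, -1, -1, 0])
print(decode(a), decode(complement(a)), fa_is_self_dual())
```

Inverting every bit swaps 01 and 10 and maps 00 to 11, so the decoded value changes sign while the zero digits stay zero, which is what makes the bitwise inversion usable as the complemented operand for the recomputation.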
In this algorithm, we define Z as the correct output and ZF as the faulty output in the first computation, and ZINV and ZF-INV as the correct and faulty outputs, respectively, obtained using the inverted inputs in the recomputation. We first perform the addition operation using the normal operands. After the calculation, if any of the error indicators signals an error, the operation is followed by recomputation using the inverse operands. During normal operation, if any stuck-at fault occurs in either the self-checking FA input or output lines, that fault will flip a bit from 1 to 0 or 0 to 1. We have earlier confirmed that if the operation is recalculated using the 1's complemented operands, any possible fault will be covered, and the calculated sum will be correct because all 0's become 1's and all 1's become 0's (01 to 10, 10 to 01, 00 to 11, and 11 to 00). Therefore, the bit will show the correct value where the corresponding stuck-at fault occurred. If any of the error indicators again issues an error after the recomputation, we compare the two results obtained from the first and second calculations: if the results complement each other, we can conclude that a checker failure occurred; otherwise, more than one stuck-at fault occurred. However, if no error signal is activated after the recomputation, we refer to the source of the first error signal. If it is an input error, we compare the two calculated results from the first and second computations and determine the faulty module number. Otherwise, if the error is issued by an adder module, we report the adder module number as the faulty module. The recomputed result is the 1's complement of the correct result, and applying any further error-correction procedure is not necessary.

In one FA, any change in one of the input lines changes either the sum or both the sum and carry outputs. Therefore, any fault in the $a_i^+$, $a_i^-$, or $b_i^+$ lines makes either the $s_i$ line or both the $s_i$ and $h_{i+1}$ lines faulty. Consequently, any fault in the $b_i^-$, $s_i$, or $h_i$ lines makes either the $z_i^+$ output or both the $z_i^+$ and $z_{i+1}^-$ outputs faulty. Thus, any fault in one of the $a_i^+$, $a_i^-$, $b_i^+$, and $b_i^-$ inputs can make a minimum of one ($z_i^+$) and a maximum of four ($z_i^+$, $z_{i+1}^-$, $z_{i+1}^+$, and $z_{i+2}^-$) faulty digits. We can conclude that any fault in the ith input digit position makes the $z_i$ output faulty. Thus, by comparing both faulty and fault-free results, the lowest equal bit indicates the faulty input digit position. We suppose that ZF = zF(n)…zF(i)…zF(0) is the faulty output calculated in the first step and ZINV = zINV(n)…zINV(i)…zINV(0) is the result of the calculation with the inverted inputs in the second step. For all error-free bits, zF(i) is the inverse of zINV(i), but for the erroneous bits, zF(i) is equal to zINV(i).
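The masking and comparison steps described here can be illustrated on a small self-dual circuit. The sketch below is a stand-in under stated assumptions, not the two-level BSDN adder of this paper: it uses an ordinary ripple-carry chain of FAs (self-dual when the carry-in is complemented as well), injects a hypothetical stuck-at fault on one input line, and assumes that the fault was active, and therefore detected, in the first computation.

```python
def full_adder(a, b, c):
    """Ordinary full adder returning (sum, carry out)."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def ripple_add(a_bits, b_bits, cin, stuck=None):
    """Add LSB-first bit lists; stuck=(i, v) forces input line a[i] to v."""
    out, carry = [], cin
    for i, (a, b) in enumerate(zip(a_bits, b_bits)):
        if stuck is not None and stuck[0] == i:
            a = stuck[1]
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out + [carry]


def invert(bits):
    """Bitwise complement of a bit list."""
    return [1 - bit for bit in bits]


def correct_and_localize(a_bits, b_bits, stuck):
    """Recompute with inverted inputs, then correct and localize.

    Returns (corrected result, lowest position where ZF equals ZINV).
    """
    z_f = ripple_add(a_bits, b_bits, 0, stuck)                      # first run
    z_inv = ripple_add(invert(a_bits), invert(b_bits), 1, stuck)    # recomputation
    corrected = invert(z_inv)
    equal = [i for i, (x, y) in enumerate(zip(z_f, z_inv)) if x == y]
    position = equal[0] if equal else None
    return corrected, position


# 13 + 6 with input line a[2] stuck at 0 (its fault-free value is 1).
result, where = correct_and_localize([1, 0, 1, 1], [0, 1, 1, 0], stuck=(2, 0))
print(result, where)
```

In the recomputation the fault-free value of the stuck line equals the stuck value, so the fault is masked and the inverted result is correct; the fault-free output bits of the two runs are complementary, and the lowest position where they agree points to the faulty input position.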
Therefore, comparing ZF and ZINV can locate the error source. For example, assume that the result of the first computation is ZF = 11100100 and an input fault signal is activated. We then recalculate the addition using the inverted inputs, and the result is ZINV = 00010111. The comparison of these two results in Fig. 6 shows that $Z_1^-$ and $Z_1^+$ are equal. Therefore, we can conclude that the fault is located in module number 1.

The fault localization in our second design (SBSA-IP) does not require any comparison process because all input lines have a dedicated fault signal. In this implementation, when any fault signal is activated, recomputation is performed using the inverted inputs, as in the previous implementation. If a fault signal is activated again, comparing the results of the first and second computations determines whether the error signal is due to a checker failure or to the existence of more than one fault. However, if none of the error signals is activated after the second calculation, the module that activated the error signal in the first computation is reported as the faulty module. Fig. 7 shows the fault-localization and error-correction algorithm for the SBSA-IP implementation.

4. Evaluation, comparison, and discussion
Table 4. Probability of error correction (%).

| Method | Input lines | Adder modules |
|---|---|---|
| Cardarilli's method [19] | 50% | 100% |
| Moradian's method [20] | 100% | 100% |
| SBSA-PP, SBSA-IP | 100% | 100% |
To validate the efficiency of the proposed methods, we compare them with the two methods presented in [19,20]. We discuss the error-detection, error-correction, and fault-localization capabilities, along with the time, area, and power overhead. For the time, area, and power comparison, all methods are coded in Verilog HDL, simulated with Modelsim SE, and synthesized with Synopsys Design Compiler using the Samsung technology process library.
Please cite this article as: H. Moradian, et al., Self-repairing radix-2 signed-digit adder with multiple error detection, correction, and fault localization, Microelectronics Reliability (2016), http://dx.doi.org/10.1016/j.microrel.2016.06.010
In addition, for accurate comparison, we design all methods for 8, 16, 32, 64, and 128 digits.

4.1. Error detection

The methods presented in [19,20] can detect a single stuck-at fault or an odd number of stuck-at faults in either the input lines or the adder modules; because they use a parity checker to detect a fault, these two methods cannot detect an even number of faults. Because our SBSA-PP method uses a parity checker to check for faults in the input lines, it can likewise detect a single stuck-at fault or an odd number of stuck-at faults in the input lines. In addition, for the adder modules, it can detect all faults in the separate modules. Our second method, SBSA-IP, can detect any number of faults in either the input lines or the adder modules. Table 3 lists the error-detection capability of the discussed methods.

Table 5. Occupied area in the conventional and self-checking BSDA (μm²).

| Implementation method | Design | 8 digit | 16 digit | 32 digit | 64 digit | 128 digit |
|---|---|---|---|---|---|---|
| Using information from the previous digit position | Unprotected BSDA | 520 | 1030 | 2050 | 4091 | 8176 |
| Using information from the previous digit position | Self-checking BSDA | 870 | 1721 | 3472 | 6870 | 13,737 |
| Using double-recoding method | Unprotected BSDA | 460 | 899 | 1778 | 3535 | 7053 |
| Using double-recoding method | SBSA-PP (Fig. 2) | 815 | 1579 | 3198 | 6351 | 12,666 |
| Using double-recoding method | SBSA-IP (Fig. 3) | 1251 | 2321 | 4697 | 9321 | 18,910 |

Table 6. Occupied whole area in the self-checking BSDA including the fault-localization and error-correction algorithm (μm²).

| Method | 8 digit | 16 digit | 32 digit | 64 digit | 128 digit |
|---|---|---|---|---|---|
| Cardarilli's method [19] | 2476 | 3873 | 7420 | 14,109 | 29,033 |
| Moradian's method [20] | 1511 | 2884 | 5497 | 11,456 | 23,181 |
| SBSA-PP | 1162 | 2159 | 4358 | 8651 | 17,560 |
| SBSA-IP | 1251 | 2321 | 4697 | 9321 | 18,910 |
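The self-checking overhead percentages quoted in Section 4.3 can be checked directly against the Table 5 areas. The following Python sketch does so; the dictionary keys and the function name are illustrative choices, and only the area values are taken from Table 5.

```python
# Areas in square micrometres for 8, 16, 32, 64, and 128 digits (Table 5).
AREA_UM2 = {
    "prev_digit_unprotected": (520, 1030, 2050, 4091, 8176),
    "prev_digit_self_checking": (870, 1721, 3472, 6870, 13737),
    "double_recoding_unprotected": (460, 899, 1778, 3535, 7053),
    "double_recoding_sbsa_pp": (815, 1579, 3198, 6351, 12666),
}


def overhead_percent(protected, baseline):
    """Relative area increase of the protected design over its baseline, per digit length."""
    return [100.0 * (p - b) / b for p, b in zip(protected, baseline)]


prev_digit = overhead_percent(
    AREA_UM2["prev_digit_self_checking"], AREA_UM2["prev_digit_unprotected"]
)
double_recoding = overhead_percent(
    AREA_UM2["double_recoding_sbsa_pp"], AREA_UM2["double_recoding_unprotected"]
)

print(round(min(prev_digit)))       # 67
print(round(min(double_recoding)))  # 76
```

The minima over the five digit lengths reproduce the approximately 67% and 76% figures; the double-recoding figure here is computed for SBSA-PP, which is an assumption about which design the quoted percentage refers to.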
4.2. Error correction and fault localization

As explained in Section 3, Cardarilli's method [19] can correct only half of the detected errors located in the input lines. The method in [20] and the techniques presented in this paper can correct all detected faults. Table 4 lists the error-correction probability of these methods.

With regard to fault localization, both methods presented in [19,20] localize the fault based on the fault type. As discussed in Section 3, a fault in the MSB of an input line can cause one to three inequalities, a fault in the LSB can cause up to two inequalities, and a fault in the adder modules can create one inequality. Therefore, when the number of inequalities is one, the fault cannot be located because it can be caused either by an input line (MSB or LSB) or by one of the adder modules. In our first proposed method (SBSA-PP), because the error indicators for the adder modules and input lines are separated, the source of the fault can be identified as either the adder modules or the input lines. Because each adder has its own error indicator, any fault in any adder can be localized. To localize a fault in the input lines, comparing the results of the first and second computations identifies the faulty line number. For the SBSA-IP implementation, all faults can be localized because all adder modules and input lines have separate error indicators.

4.3. Area overhead

Table 5 lists the occupied area in the unprotected conventional binary signed-digit adder and the self-checking binary signed-digit adder under two different implementations. The minimum area overhead of the self-checking BSDN adder relative to the corresponding conventional BSDN adder is approximately 67% in the implementation using information from the previous digit position and 76% in the implementation using the double-recoding method. As listed in Table 6, adding the fault-localization and error-correction algorithm increases the occupied area in all methods. The authors in [19] reported the design overhead only for the self-checking binary signed-digit adder without the fault-localization and error-correction algorithm. After the fault-localization and error-correction algorithm is applied, the area overhead increases, as the algorithm is large and complex. The methods in [19,20] use the same self-checking binary signed-digit adder design, but because they use different algorithms for fault localization and error correction, the method in [20] occupies less area than that in [19]. In order to compare the algorithm in [19] with our methods, we prepared a slightly modified version of the algorithm in [19] that includes the error-correction part, as shown in Fig. 8. In the SBSA-PP and SBSA-IP methods, the increase in area is not as large as for the methods in [19,20]. Further, SBSA-PP requires a slightly smaller area than SBSA-IP. Our second method (SBSA-IP) needs a slightly larger area to detect and localize the fault, but it still needs a smaller area than both methods in [19,20]. Fig. 9 shows the increase in the area with the increase in the number of input digits in the four methods.

4.4. Time overhead

To analyze the time overhead, the delay times of the binary signed-digit adder and the self-checking binary signed-digit adder were first calculated. Because the signed-digit adder is a parallel adder, the delay time is equal for all digit lengths. As our methods use a different signed-digit adder implementation from the methods in [19,20], they have different delay times. For both our implementations, the delay time of the binary signed-digit adder is 0.66 ns, and the delay times of the self-checking binary signed-digit adder are 0.78 ns and 0.85 ns for the SBSA-PP and SBSA-IP implementations, respectively. In an error-free calculation, Cardarilli's method [19], the method presented in [20], and our SBSA-PP and SBSA-IP methods take similar times of approximately 0.71, 0.71, 0.78, and 0.85 ns, respectively, for all digit lengths. Table 7 lists the time delays in the absence of a fault. For any fault, all methods need to perform the recomputation step(s) to localize and correct the faulty digit. Table 8 lists the estimated time for the localization and correction algorithm in all methods. As mentioned earlier, in our methods, correction is done simply by inverting the result obtained in the second computation, whereas Cardarilli's method [19] needs a different calculation to obtain a correct result. Therefore, the time delays of our proposed methods are considerably shorter than that of Cardarilli's method [19]. For any fault, Cardarilli's method [19] needs to perform a minimum of two addition processes (computation using normal inputs and LSI) with one localization and correction process and a maximum of three addition processes (computation using normal inputs, LSI, and RSI) with two localization and correction processes. In the method presented in [20] and our proposed methods in this paper, for any fault,
Fig. 8. Slightly modified algorithm reported in [19], including error-correction part (e is the location of the faulty bit).
only two addition processes, along with one localization and correction process, are needed. It is worth mentioning that the fault-localization and error-correction process is conducted only after fault detection. This means that without any fault, the delay is the same as that in Table 7 for all methods. In the case of a stuck-at fault, the fault-localization algorithm executes for each Add instruction until the faulty module is replaced with a new fault-free module. However, when an MCT fault appears in a module, it disappears after some cycles. If another MCT fault occurs, its location will differ from that of the previous fault; therefore, the fault-localization process will report a new location. This can be used to distinguish stuck-at faults from MCT faults in the module. According to Tables 7 and 8, we can estimate the delay times of the mentioned methods for faulty addition, as listed in Table 9. Tables 9 and 10 show the superiority of our proposed methods over Cardarilli's method [19] and our previously presented method [20]. The tables also indicate that in our methods, the delay time depends only slightly on the input digit length compared with Cardarilli's method [19]. Fig. 10 shows how the delay time increases with the number of input digits.

Fig. 9. Increase in the area by increasing the number of input digits (μm²).

Table 7. Time delay for fault-free addition (ns).

| Method | 8 digit | 16 digit | 32 digit | 64 digit | 128 digit |
|---|---|---|---|---|---|
| Cardarilli's method [19] | 0.71 | 0.71 | 0.71 | 0.71 | 0.71 |
| Moradian's method [20] | 0.71 | 0.71 | 0.71 | 0.71 | 0.71 |
| SBSA-PP | 0.78 | 0.78 | 0.78 | 0.78 | 0.78 |
| SBSA-IP | 0.85 | 0.85 | 0.85 | 0.85 | 0.85 |

Table 8. Time delay for the fault-localization and error-correction algorithm only (ns).

| Method | 8 digit | 16 digit | 32 digit | 64 digit | 128 digit |
|---|---|---|---|---|---|
| Cardarilli's method [19] | 2.52 | 3.42 | 5.17 | 9.78 | 18.49 |
| Moradian's method [20] | 1.7 | 1.9 | 2.05 | 2.14 | 2.34 |
| SBSA-PP | 0.89 | 1.02 | 1.16 | 1.34 | 1.49 |
| SBSA-IP | 0.35 | 0.43 | 0.51 | 0.60 | 0.78 |

4.5. Power overhead

Finally, the power consumption of the methods in [19,20] and of the methods proposed in this paper is listed in Table 10, which shows that the power consumption of our proposed methods is significantly lower than that of the previous methods. Fig. 11 shows the results obtained from the four methods. The curves of the methods in [19,20] almost overlap and show higher power consumption than the methods presented in this paper. The results demonstrate the superiority of the proposed methods over the previously suggested methods in [19,20]. Our methods lead in all aspects, namely lower application time, area overhead, and power consumption, along with high error-correction and fault-localization capability and greater fault coverage.

Table 10. Power consumption (μW).

| Method | 8 digit | 16 digit | 32 digit | 64 digit | 128 digit |
|---|---|---|---|---|---|
| Cardarilli's method [19] | 171 | 288 | 614 | 1200 | 2480 |
| Moradian's method [20] | 146 | 264 | 592 | 1190 | 2470 |
| SBSA-PP | 92 | 174 | 367 | 758 | 1660 |
| SBSA-IP | 105 | 137 | 240 | 573 | 1541 |
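To put a number on the power comparison, the relative reduction with respect to Cardarilli's method [19] can be computed from the reported power values (μW). The sketch below is only illustrative arithmetic on those values; the names `POWER_UW` and `reduction_percent` are hypothetical.

```python
# Power in microwatts for 8, 16, 32, 64, and 128 digits, as reported for the four methods.
POWER_UW = {
    "Cardarilli [19]": (171, 288, 614, 1200, 2480),
    "Moradian [20]": (146, 264, 592, 1190, 2470),
    "SBSA-PP": (92, 174, 367, 758, 1660),
    "SBSA-IP": (105, 137, 240, 573, 1541),
}


def reduction_percent(method: str, reference: str = "Cardarilli [19]"):
    """Power saving of `method` relative to `reference`, per digit length."""
    return [
        100.0 * (ref - own) / ref
        for own, ref in zip(POWER_UW[method], POWER_UW[reference])
    ]


for name in ("SBSA-PP", "SBSA-IP"):
    print(name, [round(v, 1) for v in reduction_percent(name)])
```

With these values, SBSA-PP saves about 46% at 8 digits and about 33% at 128 digits relative to [19].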
5. Conclusion

This paper has presented new fault-detection, fault-localization, and error-correction methods for a self-checking BSDN adder scheme. A novel, low-complexity approach shows that this capability is achievable using the FA property, which takes advantage of the observed relationship between the inputs and outputs of an FA to detect faults, together with the self-dual function concept to set up the fault-localization and error-correction procedures. The proposed methods allow us to achieve both error correction and fault localization in one step only. Future work will include the use of these methods in radix-n signed-digit number adders to obtain the fault-tolerance property at low cost.
Fig. 10. Approximate delay time in the presence of a fault.
Table 9. Approximate time delay for faulty addition (ns).

| Method | 8 digit | 16 digit | 32 digit | 64 digit | 128 digit |
|---|---|---|---|---|---|
| Cardarilli's method [19], LSI | 3.94 | 4.84 | 6.59 | 11.2 | 19.91 |
| Cardarilli's method [19], RSI | 7.17 | 8.97 | 12.74 | 21.69 | 39.11 |
| Moradian's method [20] | 3.12 | 3.32 | 3.47 | 3.56 | 3.76 |
| SBSA-PP | 2.45 | 2.58 | 2.72 | 2.9 | 3.05 |
| SBSA-IP | 1.91 | 1.99 | 2.07 | 2.16 | 2.34 |
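The way Table 9 follows from Tables 7 and 8 can be sketched as a small calculation: the faulty-addition delay is taken as the number of addition processes times the fault-free delay, plus the number of localization and correction processes times the algorithm delay. The Python below is a hedged illustration using the 8-digit values; the function and variable names are hypothetical, and the formula is inferred from the description in Section 4.4 rather than stated by the authors.

```python
# 8-digit values: fault-free addition delay (Table 7) and algorithm delay (Table 8), in ns.
ADD_NS = {"Cardarilli [19]": 0.71, "Moradian [20]": 0.71, "SBSA-PP": 0.78}
ALGO_NS = {"Cardarilli [19]": 2.52, "Moradian [20]": 1.70, "SBSA-PP": 0.89}


def faulty_addition_delay(method: str, additions: int = 2, algorithm_runs: int = 1) -> float:
    """Estimated delay of an addition in which a fault is detected (assumed formula)."""
    return round(additions * ADD_NS[method] + algorithm_runs * ALGO_NS[method], 2)


print(faulty_addition_delay("Cardarilli [19]"))        # LSI case: 3.94
print(faulty_addition_delay("Cardarilli [19]", 3, 2))  # RSI case: 7.17
print(faulty_addition_delay("Moradian [20]"))          # 3.12
print(faulty_addition_delay("SBSA-PP"))                # 2.45
```

These four results match the 8-digit column of Table 9. For SBSA-IP, the same formula with the 0.85 ns adder delay of Table 7 gives 2.05 ns, whereas Table 9 lists 1.91 ns, which corresponds to an adder delay of 0.78 ns; the sketch does not attempt to reconcile the two.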
Fig. 11. Increase in power consumption with the increase in the number of input digits (μW).
References

[1] S. Mukherjee, Architecture Design for Soft Errors, Morgan Kaufmann, 2011.
[2] C.A.L. Lisboa, Dealing With Radiation Induced Long Duration Transient Faults in Future Technologies, 2009.
[3] R.H.M. Huang, C.H.P. Wen, Advanced soft-error-rate (SER) estimation with striking-time and multi-cycle effects, 51st ACM/EDAC/IEEE Design Automation Conference (DAC) June 2014, pp. 1–6.
[4] R.P. Bastos, G. Di Natale, M.L. Flottes, F. Lu, B. Rouzeyre, A new recovery scheme against short-to-long duration transient faults in combinational logic, J. Electron. Test. 29 (3) (2013) 331–340.
[5] A. Avizienis, H. Kopetz, J.C. Laprie (Eds.), The Evolution of Fault-Tolerant Computing: In the Honor of William C. Carter, Springer Science & Business Media, 2012.
[6] A. Avižienis, Design of fault-tolerant computers, Proceedings of AFIPS ACM, 14–16, Fall Joint Computer Conference 1967, pp. 733–743.
[7] I. Koren, C.M. Krishna, Fault-tolerant Systems, Morgan Kaufmann, 2010.
[8] W.W. Peterson, On checking an adder, IBM J. Res. Dev. 2 (1958) 166–168.
[9] W.W. Peterson, E.J. Weldon, Error-correcting Codes, second ed. MIT Press, Cambridge, 1972.
[10] M. Nicolaidis, Carry Checking/Parity Prediction Adders and ALUs, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 11, no. 1, 2003.
[11] J.M. Berger, A note on error detection codes for asymmetric channels, Inf. Control. 4 (1) (1961) 68–73.
[12] M. Nicolaidis, R.O. Duarte, S. Manich, J. Figueras, Fault-secure parity prediction arithmetic operators, IEEE Des. Test Comput. 2 (1997) 60–71.
[13] R.E. Lyons, W. Vanderkulk, The use of triple-modular redundancy to improve computer reliability, IBM J. Res. Dev. 6 (2) (1962) 200–209.
[14] P.K. Samudrala, J. Ramos, S. Katkoori, Selective triple modular redundancy (STMR) based single-event upset (SEU) tolerant synthesis for FPGAs, IEEE Trans. Nucl. Sci. 51 (5) (2004) 2957–2969.
[15] F. Lima, L. Carro, R. Reis, Designing fault tolerant systems into SRAM-based FPGAs, Proceedings of the 40th Annual Design Automation Conference, ACM 2003, pp. 650–655.
[16] P. Oikonomakos, P. Fox, Error correction in arithmetic operations by I/O inversion, On-Line Testing Symposium 2006, p. 6.
[17] M. Valinataj, A novel self-checking carry look-ahead adder with multiple error detection/correction, Microprocess. Microsyst. 38 (8) (2014) 1072–1081.
[18] M.A. Akbar, J.-A. Lee, Self-repairing adder using fault localization, Microelectron. Reliab. 54 (6–7) (2014) 1443–1451.
[19] G.C. Cardarilli, et al., Fault localization, error correction, and graceful degradation in radix 2 signed digit-based adders, IEEE Trans. Comput. 55 (5) (2006) 534–540.
[20] M. Hossein, J.-A. Lee, Low-cost fault localization and error correction for a signed digit adder design utilizing the self-dual concept, 2015 Euromicro Conference on Digital System Design (DSD), IEEE 2015, pp. 276–279.
[21] A.F. González, P. Mazumder, Redundant arithmetic, algorithms and implementations, Integr. VLSI J. 30 (1) (2000) 13–53.
[22] K. Schneider, A. Willenbucher, A new algorithm for carry-free addition of binary signed-digit numbers, Field-Programmable Custom Computing Machines (FCCM), IEEE 22nd Annual International Symposium 2014, pp. 44–51.
[23] M.D. Ercegovac, T. Lang, Digital Arithmetic, Morgan Kaufmann, 2004.
[24] B. Parhami, Carry-free addition of recoded binary signed-digit numbers, IEEE Trans. Comput. 37 (11) (1988) 1470–1476.
[25] N. Takagi, H. Yasuura, S. Yajima, High-speed VLSI multiplication algorithm with a redundant binary addition tree, IEEE Trans. Comput. 100 (9) (1985) 789–796.
[26] S. Kuninobu, T. Nishiyama, H. Edamatsu, T. Taniguchi, N. Takagi, Design of high speed MOS multiplier and divider using redundant binary representation, IEEE 8th Symposium in Computer Arithmetic (ARITH) 1987, pp. 80–86.
[27] N. Takagi, Studies on Hardware Algorithms for Arithmetic Operations With a Redundant Binary Representation, Kyoto University, 1988.
[28] S. Kuninobu, T. Nishiyama, T. Taniguchi, High speed MOS multiplier and divider using redundant binary representation and their implementation in a microprocessor, IEICE Trans. Electron. E76-C (3) (1993) 436–445.
[29] H. Makino, Y. Nakase, H. Suzuki, H. Morinaka, H. Shinohara, K. Mashiko, An 8.8-ns 54 × 54-bit multiplier with high speed redundant binary architecture, IEEE J. Solid State Circuits 31 (6) (1996) 773–783.
[30] S.R. Alavi, K. Faez, Fault localization and full error correction in radix2 signed digit-based adders, Communications, Computers and Signal Processing 2007, pp. 214–218.
[31] T. Ngai, C. He, E.E. Swartzlander, Enhanced concurrent error correcting arithmetic unit design using alternating logic, Defect and Fault Tolerance in VLSI Systems 2001, pp. 78–83.
[32] J.H. Patel, L.Y. Fung, Concurrent error detection in ALU's by recomputing with shifted operands, IEEE Trans. Comput. (1982) 589–595.