Self-checking logic arrays




Logic arrays (PLAs, ROMs and RAMs) are a solution to the increasing complexity of VLSI. M Nicolaidis and B Courtois describe self-checking schemes for these devices that incur low chip area overhead

Self-checking blocks may be used to ensure concurrent error detection in integrated circuits. Logic arrays such as PLAs, ROMs and RAMs, in turn, are essential to circumvent the increasing complexity of VLSI circuits. Efficient self-checking schemes for logic arrays are therefore essential for concurrent error detection in VLSI circuits. The paper describes schemes that incur low area overhead.

Keywords: microsystems, logic arrays, VLSI, self-checking circuits

Periodic off-line testing of VLSI circuits may be used to ensure hard failure detection. However, errors produced by hard faults remain undetected until the test phase is activated, and off-line testing is not effective against transient faults. Concurrent error detection techniques detect both types of errors, those due to hard faults and those due to transient faults, as soon as they occur. Concurrent error detection based on software encoding techniques needs special software development and decreases the system performance slightly. Therefore, hardware coding based on the design of self-checking circuits (see Figure 1) should be used. Another advantage of self-checking circuits is that they may be designed to cover well-known models of hard faults. Complex circuits may then be built from self-checking blocks; in this case a double-rail checker is used to compress all the individual error indication signals into a global error indication signal. The goal of self-checking circuits is often called the totally self-checking (TSC) goal: the first erroneous output of the functional block provokes an error indication on the checker outputs. Carter¹ introduced the basic idea, and Anderson² formally defined the totally self-checking property for functional blocks and for checkers:

• A circuit G is self-testing for a fault set F if $\forall f \in F,\ \exists a \in A : G(a, f) \notin B$.
• A circuit is fault-secure for a fault set F if $\forall f \in F,\ \forall a \in A : G(a, f) = G(a, \emptyset)$ or $G(a, f) \notin B$, where $G(a, \emptyset)$ denotes the fault-free output.
• A circuit is TSC if it is both self-testing and fault-secure.

IMAG/TIM3, 46 Avenue Félix Viallet, 38031 Grenoble Cedex, France
Paper received: 12 December 1987. Revised: 9 June 1988

This work has been supported by the EEC (Project No 888 AIDA)

Figure 1. General structure of self-checking circuits (inputs A, functional circuit, encoded outputs, error indication)

Here A denotes the input code space of the circuit, B the output code space and F the set of faults considered. The effectiveness of TSC circuits rests on the assumption that faults occur one at a time and that, between the occurrence of any two faults, enough time elapses for all code inputs to be applied to the network. Under this assumption TSC circuits achieve the TSC goal, which means that the first erroneous output does not belong to the output code space B. The largest classes of circuits achieving the TSC goal are the strongly fault-secure (SFS) circuits³ and the strongly code-disjoint (SCD) checkers⁵. A circuit is strongly fault-secure for a fault set F if, for every fault f in F, either (1) the circuit is totally self-checking, or (2) the circuit is fault-secure and, if a new fault in F occurs, property (1) or property (2) holds for the resulting multiple fault⁴. The use of regular blocks such as PLAs, ROMs and RAMs is essential to circumvent the increasing complexity of VLSI circuits. This paper describes the SFS design of such regular circuits and proposes new solutions that both decrease the overhead and allow the output codes to be chosen according to the needs of the destination blocks.
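As an illustration of the definitions above, the following minimal Python sketch checks them exhaustively on two tiny hypothetical circuits. The circuits, their stuck-at fault sets and all names (`is_self_testing`, `G1`, `G2`, ...) are assumptions made for this example, not circuits from the paper; `G(a, None)` stands for the fault-free output $G(a, \emptyset)$. Circuit 1 has one line per output and is TSC for the even-parity code; in Circuit 2 one line fans out to two outputs, so a single stuck-at fault produces a double error that the parity code cannot see.

```python
from itertools import product

# Input code space A and output code space B: 3-bit words with even parity.
A = [w for w in product((0, 1), repeat=3) if sum(w) % 2 == 0]
B = set(A)

def is_self_testing(G, A, B, F):
    """Every fault f is revealed by at least one code input a (G(a, f) not in B)."""
    return all(any(G(a, f) not in B for a in A) for f in F)

def is_fault_secure(G, A, B, F):
    """For every fault and code input, the output is correct or it is a noncode word."""
    return all(G(a, f) == G(a, None) or G(a, f) not in B for f in F for a in A)

def is_tsc(G, A, B, F):
    return is_self_testing(G, A, B, F) and is_fault_secure(G, A, B, F)

def apply_stuck_at(value, fault, line):
    """Value seen on `line`; fault is (line, stuck_value) or None (fault-free)."""
    if fault is not None and fault[0] == line:
        return fault[1]
    return value

# Circuit 1 (hypothetical): three independent lines, each driving one output.
F1 = [(line, v) for line in range(3) for v in (0, 1)]

def G1(a, f):
    return tuple(apply_stuck_at(a[i], f, i) for i in range(3))

# Circuit 2 (hypothetical): one stem line fans out to two outputs; output (x1, x1, 0).
F2 = [("stem", 0), ("stem", 1)]

def G2(a, f):
    s = apply_stuck_at(a[0], f, "stem")
    return (s, s, 0)

print(is_tsc(G1, A, B, F1))            # True
print(is_fault_secure(G2, A, B, F2))   # False
print(is_self_testing(G2, A, B, F2))   # False
```

Circuit 2 anticipates Rule 1 below: with a parity output code, a line connected to two outputs breaks the fault-secure property.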

FAULT AND ERROR MODELLING

Here a fault model dealing with failures of the physical elements constituting an integrated circuit is considered. These elements are contacts and buried contacts, polysilicon, diffusion and metal lines, and MOS transistors. The fault model forms class I of the fault hypothesis⁶ (Table 1).

0141-9331/89/04281-10 $03.00© 1989 Butterworth & Co. (Publishers) Ltd

Vol 13 No 4 May 1989


Table 1. Class I of the fault hypothesis

(1) A failed contact or buried contact
(2) An MOS stuck-on
(3) An MOS stuck-open
(4) A line cut
(5) A short between two neighbour lines (two aluminium lines or two diffusion lines)

Obviously, any one of faults 1-4 will result in an error on a single line, at the place where it occurs. For example, a driving MOS stuck-on will result in an error on the line connected to its drain (a line internal to the gate). Errors due to shorts (type 5 faults) occur when the two shorted lines carry opposite values (0-1 or 1-0). If the resulting level on the two shorted lines is then a logical '0' or a logical '1' (as is the case for NMOS circuits, where the logical '0' is imposed, and for CMOS precharged gates, where the logical '0' is imposed for gates precharged to '1' and the logical '1' for gates precharged to '0'), the resulting error is a single one. If the resulting level on the two shorted lines is undetermined (as is the case for fully complemented CMOS gates), a double error may result on the two shorted lines. In what follows, errors due to two shorted lines are assumed to be single errors even for CMOS technology, because PLAs and ROMs in CMOS technology are generally designed using precharge logic. For fully complemented gates, the number of check bits must be increased to cover double errors due to shorts.
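The case analysis on shorts can be sketched as follows. This is a simplified logic-level model assumed for illustration only: when two shorted lines carry different values, the imposed level wins for NMOS and precharged gates, while for fully complemented CMOS the undetermined level is assumed to be read independently as 0 or 1 by each receiving gate. The style names are hypothetical labels.

```python
from itertools import product

def shorted_values(a, b, style):
    """Possible values read on two shorted lines whose drivers try to set a and b.

    Hypothetical model: 'nmos' and 'precharged_to_1' impose logical 0,
    'precharged_to_0' imposes logical 1, and 'full_cmos' gives an undetermined
    level that each reading gate may interpret as 0 or 1 independently.
    """
    if a == b:
        return {(a, b)}                          # equal values: no error
    if style in ("nmos", "precharged_to_1"):
        return {(0, 0)}
    if style == "precharged_to_0":
        return {(1, 1)}
    if style == "full_cmos":
        return set(product((0, 1), repeat=2))
    raise ValueError(f"unknown style: {style}")

def worst_case_errors(style):
    """Largest number of erroneous lines a single short can produce."""
    worst = 0
    for a, b in product((0, 1), repeat=2):
        for ra, rb in shorted_values(a, b, style):
            worst = max(worst, (ra != a) + (rb != b))
    return worst

for style in ("nmos", "precharged_to_1", "precharged_to_0", "full_cmos"):
    print(style, worst_case_errors(style))
```

Under this model the worst case is one erroneous line for the first three styles and two for fully complemented CMOS, which is why extra check bits are needed in that case.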

SFS property and the parity code
When the parity code is used, the propagation of single internal errors must give single errors on the encoded outputs. The fault-secure property is then achieved for the fault hypothesis given in Table 1. To achieve this goal Rule 1 may be used. Rule 1: the maximum divergence degree of the circuit is equal to one, i.e. each line is connected (through paths) to only one output. CMOS fully complemented gates are a special case, which may give double errors in the event of faults; it is then necessary to increase the number of check bits, as explained below. With the fault-secure property ensured by Rule 1, we now examine the fault types that prevent the circuit from being SFS. To achieve the strongly fault-secure property, Rule 1 must remain valid if an undetectable fault occurs. Similarly, if a second undetectable fault occurs, the rule must still remain valid, and so on. It is easy to verify that faults of types 1-4 (Table 1) do not increase the maximum divergence degree of the circuit. Therefore, with undetectable faults of types 1-4, Rule 1 remains valid and the SFS property is not prevented. On the other hand, a short between two lines connected with different primary outputs will modify the maximum divergence degree of the circuit, since the two lines will be collapsed into a single line. In summary, if the output code detects single errors, if Rule 1 is verified and if sequences of undetectable faults do not contain shorts between two lines connected with two different outputs, then the circuit is SFS⁷.
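Rule 1, and the way a short can break it, can be checked mechanically. The sketch below assumes an acyclic netlist given as a dictionary from each line to the lines it drives (a hypothetical representation chosen for brevity). It computes the maximum divergence degree and models a short as the merging of two lines into one, following the collapse argument in the text.

```python
def reachable_outputs(netlist, outputs):
    """Map every line to the set of primary outputs it reaches through paths.

    `netlist` maps a line name to the lines it drives; it is assumed acyclic.
    """
    memo = {}

    def visit(line):
        if line not in memo:
            reach = {line} if line in outputs else set()
            for succ in netlist.get(line, ()):
                reach |= visit(succ)
            memo[line] = frozenset(reach)
        return memo[line]

    lines = set(netlist) | {s for succs in netlist.values() for s in succs}
    return {line: visit(line) for line in lines}

def max_divergence_degree(netlist, outputs):
    return max(len(r) for r in reachable_outputs(netlist, outputs).values())

def collapse(netlist, keep, gone):
    """Hypothetical model of a short: line `gone` is merged into line `keep`."""
    merged = {}
    for line, succs in netlist.items():
        src = keep if line == gone else line
        targets = merged.setdefault(src, [])
        for s in succs:
            s = keep if s == gone else s
            if s not in targets:
                targets.append(s)
    return merged

outputs = {"o1", "o2"}
ok = {"a": ["g1"], "b": ["g2"], "g1": ["o1"], "g2": ["o2"]}     # Rule 1 holds
print(max_divergence_degree(ok, outputs))                        # 1
print(max_divergence_degree(collapse(ok, "g1", "g2"), outputs))  # 2
```

The short between `g1` and `g2`, which feed different outputs, raises the divergence degree to 2, so a later single fault on the merged line could give a double error that the parity bit misses.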

SFS property and codes detecting unidirectional errors
As was shown above, the fault hypothesis considered here produces single internal errors in the circuits. Rule 2 may be used to ensure that single internal errors are propagated unidirectionally to the circuit outputs, and Rule 3 to ensure that single internal errors are propagated to the circuit outputs as errors of multiplicity ≤ t. Rule 2: all paths between each line of the circuit and the outputs of the circuit must have the same inversion parity. (The inversion parity of a path is n modulo 2. When the logic level is considered, n is the number of inverting gates (NAND, NOR etc.) through which the path passes³. When the transistor level is considered, n is the number of couples (gate and drain) of driving MOSs through which the path passes.) Rule 3: the maximum divergence degree of the circuit is t, i.e. each line of the circuit is connected to at most t outputs of the circuit. For a circuit whose output code space detects unidirectional errors of multiplicity ≤ t, Rules 2 and 3 ensure the detection of any output error due to a fault belonging to the fault model (Table 1). Thus the fault-secure property is ensured. However, to ensure the SFS property, undetectable defects must not disable Rules 2 and 3. It has been shown⁸ that, of the faults listed in Table 1, only one can disable Rule 2: shorts between two lines that are connected to the outputs of the circuit through paths that do not have equal inversion parities (Sp shorts). Similarly, only one of the listed faults can disable Rule 3 (by increasing the maximum divergence degree): shorts between two lines for which the sum of the numbers of outputs to which they are connected is greater than t (St shorts).
In summary, if the output code space detects unidirectional errors of multiplicity ≤ t, if Rules 2 and 3 are verified and if the sequences of undetectable faults do not contain shorts of types Sp or St, then the circuit is SFS.
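Rules 2 and 3 can be checked in the same way by also tracking the inversion parity of each path. In this hypothetical netlist format every edge carries a flag that is 1 when the gate between the two lines is inverting and 0 otherwise; the NOR-NOR PLA below is an illustrative example, not one taken from the paper, and the netlist is assumed acyclic.

```python
def analyse(netlist, outputs):
    """For every line: (outputs reached, inversion parities of its paths to outputs).

    `netlist` maps a line to (successor, inverting) pairs, where inverting is 1
    when the gate between the two lines inverts (NAND, NOR, ...) and 0 otherwise.
    """
    memo = {}

    def visit(line):
        if line not in memo:
            reach = {line} if line in outputs else set()
            par = {0} if line in outputs else set()
            for succ, inverting in netlist.get(line, ()):
                r, p = visit(succ)
                reach |= r
                par |= {q ^ inverting for q in p}
            memo[line] = (frozenset(reach), frozenset(par))
        return memo[line]

    lines = set(netlist) | {s for succs in netlist.values() for s, _ in succs}
    return {line: visit(line) for line in lines}

def rule2_holds(netlist, outputs):
    """All paths from each line to the outputs have the same inversion parity."""
    return all(len(par) <= 1 for _, par in analyse(netlist, outputs).values())

def rule3_holds(netlist, outputs, t):
    """Each line is connected to at most t outputs."""
    return all(len(reach) <= t for reach, _ in analyse(netlist, outputs).values())

# Hypothetical NOR-NOR PLA: inputs -> product lines -> output lines.
pla = {
    "x1": [("p1", 1), ("p2", 1)],
    "x2": [("p2", 1)],
    "p1": [("o1", 1), ("o2", 1)],
    "p2": [("o2", 1)],
}
outs = {"o1", "o2"}
print(rule2_holds(pla, outs), rule3_holds(pla, outs, 2), rule3_holds(pla, outs, 1))
# True True False

mixed = {"x": [("o1", 1), ("o2", 0)]}   # one inverting and one non-inverting path
print(rule2_holds(mixed, outs))          # False
```

In the NOR-NOR array every path from a product line to an output crosses exactly one inverting gate, which is why Rule 2 holds there; the `mixed` circuit is the situation an Sp short could create.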

DESIGN OF SELF-CHECKING PLAs
The design of SFS PLAs using unordered codes has been proposed⁹. Here a method for designing SFS PLAs using the parity code is proposed. This method is attractive because it decreases the number of extra outputs needed for coding and simplifies the checkers.

Self-checking PLAs using the parity code
In this scheme it is assumed that the inputs of the PLA are the outputs of another self-checking block and are therefore encoded. The checker of this block may be used to check the bit lines of the PLA. As can be seen in Figure 2, the bit lines are checked after crossing over the first matrix of the PLA. Two supplementary outputs are generated: one (PO) gives the parity of the output lines and the other (PP) gives the parity of the product lines. Two parity checkers are used to check the output lines and the product lines. The PLA is thus divided into three blocks (bit lines, product lines and output lines), each one being checked by its own checker.

Microprocessors and Microsystems


Fault-secure property
As was shown above, to ensure the fault-secure property in circuits checked by parity codes, the maximum divergence degree of the circuit must be equal to one. When the PLA is considered as a whole, this property is not ensured for bit lines and product lines; therefore, in Figure 2 the PLA is divided into three blocks (bit lines, product lines and output lines), each one being checked by its own checker. For each of the three blocks the maximum divergence degree is then equal to one, and the fault-secure property is ensured.

Figure 3. First matrix of a PLA: special implementation and check scheme for bit lines

Implementation


• Consider the product lines used to generate only one of the outputs PO, O1, O2, ..., On. A fault on one of these product lines will produce a single error on the corresponding output line. Thus it is not necessary to check these product lines (they are checked by the output lines checker), which may reduce the number of cells in the product lines checker (see Figure 2, where some product lines are not checked by the product lines checker).
• The scheme presented in Figure 2 ensures that each of the three blocks (bit lines, product lines and output lines) has a maximum divergence degree equal to one. However, more care is needed in the implementation of the product lines. The nonchecked product lines (see the first point) can be used to generate the parity bit PP of the product lines, but the checked product lines must not be used to generate PP; otherwise the maximum divergence degree of the product lines block checked by the product lines checker would be 2. Hence some product lines must be duplicated.
• As can be seen in Figure 2, both direct and complementary bit lines must be checked after crossing over the first matrix of the PLA, so two checkers are needed (one for direct bit lines and another for complementary bit lines). This point is therefore aimed at minimizing the extra hardware needed for checking the bit lines. First, advantage can be taken of the checker of the block generating the inputs of the PLA; this checker can ensure the checking of direct bit lines. To avoid a second checker (for complementary bit lines), the scheme given in Figure 3 may be used. A problem with this scheme is the extra delay introduced by signal propagation on complementary bit lines. However, in current technologies (two metal layers), bit lines are usually implemented using both a polysilicon layer and a metal layer, and the extra delay is negligible. Another solution may be used in the special case where the inputs of the PLA are encoded in the parity code, taking advantage of a property of double-rail checkers. These checkers¹² have a one-to-one relationship with parity trees¹⁰: their double-rail outputs compute the parity of their double-rail inputs. Due to this property, double-rail checkers may be used to check both the bit lines (direct and complementary) and the parity-encoded inputs of the PLA, as shown in Figure 4. In this figure the double-rail checker checks the direct and the complementary bit lines. At the same time, because of the property mentioned above, it gives the parity of inputs I1 to In and is used to detect errors occurring on the input lines.

Figure 2. SFS PLA checked by parity codes (labels include product lines P1, P2, outputs O1, O2, ..., On, PO, PP and the product lines parity checker)

Figure 4. Verification of bit lines and parity-encoded inputs of a PLA using the double-rail checker (labels: bit lines error indication, input lines error indication, Ip: input lines parity bit)

As was explained above, shorts in NMOS circuits, and also in CMOS circuits made up of precharged gates, give single errors (PLAs in CMOS are currently realized using precharged gates). If fully complemented CMOS gates are used, shorts may give double errors. In this case the output lines are divided into two groups, each checked by its own parity bit, and the lines of the two groups are placed alternately, so that no two adjacent lines belong to the same group (and hence no two adjacent lines are checked by the same parity bit). Product lines and bit lines must be checked similarly. Thus the fault-secure property is ensured.

The method presented here, based on separate checking of bit lines, product lines and output lines, was also given by Chen et al.¹¹ However, the method was first presented by the authors in a published report¹². The only difference between the two schemes is that in the scheme of Reference 12, if some product lines checked by the product lines checker are needed to generate PP, they must be duplicated (see above). In the scheme of Reference 11, on the other hand, two parity bits (odd and even parity) are generated for the product lines; a fault in the product lines can modify only one of these parity bits, so any product line may be used to generate them. Finally, the scheme of Reference 11 discusses the fault-secure property but makes no mention of the SFS property.

SFS property

As was explained above, faults of types 1-4 (Table 1) may be undetectable but do not prevent the circuit from being SFS. The only undetectable faults that may prevent the circuit from being SFS are shorts between two lines of the same block (bit lines block, product lines block or output lines block). The block of output lines is examined first. If two output lines do not generate the same function, then there is at least one input vector that detects the short between them; therefore, initially, there are no undetectable shorts between two output lines. However, fault redundancy is known to be a dynamic property, so it must be examined whether, after a sequence of undetectable defects of types 1-4, a short between two output lines becomes undetectable. It is easy to see that undetectable faults of types 1-4 do not modify the function of the output lines, and hence shorts between two output lines remain detectable. The output lines block is therefore SFS. Similarly, it may be shown that the block of bit lines is SFS. Within the block of product lines, the same is true of the product lines checked by the product lines checker. However, no general demonstration is possible for shorts between product lines not checked by the product lines checker (see above), or for shorts between product lines generating PP and product lines checked by the product lines checker. These shorts are therefore avoided in the layout: every two nonchecked product lines are separated by at least one checked product line (e.g. product line P1 in Figure 2), and one nonchecked product line (e.g. product line P2 in Figure 2) separates the product lines generating PP from the checked product lines. Thus the SFS property is ensured.
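The double-rail checker property used in the Figure 4 scheme (the checker outputs compute the parity of its double-rail inputs) can be sketched with the usual two-rail checker cell equations, g = a·b' + a'·b and f = a·b + a'·b'; these equations are assumed here rather than taken from the paper. How Figure 4 compares that parity with Ip is also an assumption: the sketch simply flags a mismatch between the checker's parity output and an even-parity bit Ip.

```python
from functools import reduce
from itertools import product

def two_rail_cell(p, q):
    """Two-rail checker cell: inputs (a, a_n), (b, b_n) -> output pair (g, f).

    g = a.b_n + a_n.b and f = a.b + a_n.b_n. For valid pairs (a_n = not a,
    b_n = not b) this gives g = a XOR b and f = not g.
    """
    (a, a_n), (b, b_n) = p, q
    g = (a & b_n) | (a_n & b)
    f = (a & b) | (a_n & b_n)
    return (g, f)

def two_rail_checker(pairs):
    """Cascade of cells; a balanced tree behaves the same way for this purpose."""
    return reduce(two_rail_cell, pairs)

def check_bit_lines(direct, complement, ip):
    """Hypothetical Figure 4-style check returning (bit-line error, input-line error).

    Assumes ip is the even-parity bit of the inputs I1..In.
    """
    g, f = two_rail_checker(list(zip(direct, complement)))
    bit_line_error = g == f            # (0, 0) or (1, 1): invalid double-rail output
    input_error = not bit_line_error and g != ip
    return bit_line_error, input_error

# Fault-free inputs I = (1, 0, 1) with even-parity bit 0.
print(check_bit_lines((1, 0, 1), (0, 1, 0), 0))   # (False, False)
# Complementary bit line of I2 stuck at 0: invalid pair, bit-line error.
print(check_bit_lines((1, 0, 1), (0, 0, 0), 0))   # (True, False)
# Input line I2 flipped before the PLA: valid pairs, wrong parity.
print(check_bit_lines((1, 1, 1), (0, 0, 0), 0))   # (False, True)
```

An invalid pair (00 or 11) always produces an invalid cell output when combined with any other pair, which is what lets one checker cover both the complementary bit lines and, through its parity output, the input lines.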

Design of self-checking folded PLAs
The method detailed above requires checking the PLA product lines after they have crossed over the second (OR) matrix. However, if the OR matrix is folded¹³,¹⁴, the product lines do not cross over the entire OR matrix and therefore cannot be checked. Another drawback of the presented method is that the number of PLA product lines is very often high and, as a result, a large parity checker is needed to check them. Furthermore, in folded PLAs many product lines are replicated to decrease the PLA area by folding the second (OR) matrix¹³, which increases the number of product lines to be checked. These drawbacks are avoided when the self-checking design does not need to check the product lines. Such a scheme, using unordered codes, is given in Reference 9. Here another method that does not need to check the product lines is presented. Compared with the one given in Reference 9, this method decreases the number of check bits needed and simplifies the PLA checkers (XOR trees instead of adder trees). Unlike the scheme of PLAs checked by parity codes, in the present scheme replication of product lines in folded PLAs is not a shortcoming. On the contrary, replication of product lines is desirable since, as will be seen, it decreases the number of extra outputs needed for checking and the complexity of the checkers.

Fault detection by the Berger code check bits

The new scheme for PLAs is based on a study of the detection capability of the Berger code check bits. This study is more detailed than the one given in Reference 15 and minimizes the area overhead. Let $B_{k-1}, \ldots, B_1, B_0$ be the check bits of the Berger code, generated by counting the number $c$ of ones in the information bits and complementing the result. (The analysis is similar when the check bits are defined as the binary representation of the number of zeros in the information bits.)

Lemma 1: $B_0$ is equal to the odd parity of the information bits; $B_1$ is equal to the odd parity of the quotient of the division by two of the number of ones in the information bits; ...; $B_{k-1}$ is equal to the odd parity of the quotient of the division by $2^{k-1}$ of the number of ones in the information bits. In other words, $B_j = 1$ exactly when $\lfloor c / 2^{j} \rfloor$ is even.

A unidirectional error of multiplicity $e$ on the information bits changes $c$ by exactly $e$. Knowing that a parity bit detects all errors of odd multiplicity, it can then be concluded from Lemma 1 that:
• $B_0$ detects all unidirectional errors changing an odd number of information bits;
• $B_1$ detects all unidirectional errors changing an odd number of pairs of information bits;
• ...
• $B_{k-1}$ detects all unidirectional errors changing an odd number of groups of $2^{k-1}$ information bits.

Table 2 gives the multiplicity of unidirectional errors affecting the information bits and detected by each of the check bits. Theorem 1 comes directly from Table 2.

Theorem 1: use of the subset $B_0, B_1, \ldots, B_{m-1}$ of the Berger code check bits allows detection of all unidirectional errors affecting only the information bits and having multiplicity $\le 2^m - 1$.

It is also obvious that:

Lemma 2: all errors affecting only the check bits $B_0, B_1, \ldots, B_{m-1}$ are detectable, since for each binary value of the information part there is only one binary value of the check part.

Lemma 3: use of the subset $B_0, B_1, \ldots, B_{m-1}$ of the Berger code check bits allows detection of all unidirectional errors affecting the check bits and the information bits simultaneously and having multiplicity $\le m$.

Lemma 3 is obvious for $m = 1$ (parity code) and is demonstrated in Reference 15 for $m \in \{2, 3\}$; the proof is similar for $m \ge 4$.
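Lemma 1, Theorem 1 and Lemma 3 can be checked exhaustively for small words. The sketch below computes the subset $B_0, \ldots, B_{m-1}$ both directly and with XOR trees plus AND gates (the half-adder construction that the output checker later builds in hardware), then enumerates unidirectional errors. Function names are hypothetical, and an exhaustive test for small N and m is a sanity check, not a proof.

```python
from itertools import combinations

def berger_subset(info, m):
    """Check bits B0..B(m-1): complement of the m low-order bits of the count of ones."""
    c = sum(info)
    return tuple(1 - ((c >> j) & 1) for j in range(m))

def xor_tree_with_ands(bits):
    """XOR tree over `bits`, plus one AND gate on the inputs of every XOR gate.

    Each XOR/AND pair is a half adder, so sum(bits) == parity + 2 * sum(ands).
    """
    level, ands = list(bits), []
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(level[i] ^ level[i + 1])
            ands.append(level[i] & level[i + 1])
        if len(level) % 2:
            nxt.append(level[-1])          # an unpaired line goes to the next level
        level = nxt
    return (level[0] if level else 0), ands

def berger_subset_by_trees(info, m):
    """Same check bits via Lemma 1: B_j is the odd parity of floor(c / 2**j)."""
    out, level = [], list(info)
    for _ in range(m):
        parity, level = xor_tree_with_ands(level)   # parity = bit j of c
        out.append(1 - parity)
    return tuple(out)

def detected(word, n_info, m):
    """The checker recomputes the check bits from the received information bits."""
    return berger_subset(word[:n_info], m) != tuple(word[n_info:])

def unidirectional_errors(word, positions, max_mult):
    """Words obtained by flipping 1..max_mult bits at `positions`, all 0->1 or all 1->0."""
    for value in (0, 1):
        candidates = [p for p in positions if word[p] == value]
        for k in range(1, max_mult + 1):
            for subset in combinations(candidates, k):
                bad = list(word)
                for p in subset:
                    bad[p] ^= 1
                yield tuple(bad)

def all_detected(n_info, m, max_mult, include_check_bits):
    """Exhaustively test every code word against every unidirectional error."""
    for x in range(2 ** n_info):
        info = tuple((x >> i) & 1 for i in range(n_info))
        word = info + berger_subset(info, m)
        positions = range(len(word)) if include_check_bits else range(n_info)
        if any(not detected(bad, n_info, m)
               for bad in unidirectional_errors(word, positions, max_mult)):
            return False
    return True

print(all_detected(6, 2, 2 ** 2 - 1, include_check_bits=False))  # True  (Theorem 1, m = 2)
print(all_detected(6, 2, 2, include_check_bits=True))            # True  (Lemma 3, m = 2)
print(all_detected(6, 2, 4, include_check_bits=False))           # False (multiplicity 2**m)
```

The last line shows the limit of Theorem 1: four 0-to-1 flips change the count by $2^m = 4$, leaving the two low-order bits, and hence $B_0, B_1$, unchanged.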


Table 2. Coverage of unidirectional errors affecting the information bits for each bit of the Berger code

Check bit    Multiplicities t of the covered t-unidirectional errors in the information bits
$B_0$        1, 3, 5, 7, 9, 11
$B_1$        2, 6, 10, 14, 18, 22
$B_2$        4, 12, 20, 28, 36, 44
...
$B_{k-1}$    $2^{k-1}$, $3 \cdot 2^{k-1}$, $5 \cdot 2^{k-1}$, $7 \cdot 2^{k-1}$, $9 \cdot 2^{k-1}$, $11 \cdot 2^{k-1}$

Design of self-checking folded PLAs using a subset of Berger code bits
In this scheme the bit lines of the PLA are assumed to be checked. As explained above, this checking may be ensured using the checker of the circuit generating the inputs of the PLA, in one of the ways given in Figures 3 and 4, so it requires no extra circuitry. Faults affecting the part of the PLA composed of the product lines and the output lines must also be covered. To ensure this coverage, the generation of supplementary outputs ($B_0, B_1, \ldots, B_{m-1}$) giving a subset of the Berger code check bits is proposed. For this coding scheme, the output errors of the PLA must be unidirectional and must have the multiplicity given in the subsection above in order to be detectable. These conditions can be ensured by means of Rules 2 and 3 given above, in the following way:
• Rule 2 is verified since the inversion parity of any path between each product line and the output lines is equal to one. Therefore the output errors due to the faults listed in Table 1 and affecting the part of the PLA composed of product lines and output lines will be unidirectional.
• Rule 3 may be refined by the following two subrules, which ensure the appropriate multiplicity of errors:
  ◦ Rule 3*: each product line generating only information output lines must be used to generate at most $2^m - 1$ information output lines.
  ◦ Rule 3**: each product line generating both information and check output lines must be used to generate at most $m$ output lines. (There are no restrictions for product lines generating only check output lines.)
With these restrictions, errors on the PLA outputs produced by faults on product lines and on output lines are detected by check bits $B_0, B_1, \ldots, B_{m-1}$, while faults on bit lines are detected using the schemes in Figure 3 or Figure 4. Thus the PLA is fault-secure for the fault hypothesis given in Table 1.

For a given subset of Berger code check bits, the restrictions of Rules 3* and 3** may or may not be verified by the original design of the PLA. Where some product lines do not verify them, these product lines must be replicated, and each replica is used to generate a number of outputs respecting Rules 3* and 3**. Replication of some product lines increases the area of the AND matrix. However, in folded PLAs this replication can decrease the area of the OR matrix, since replication of product lines makes folding of output lines easier. Replication of product lines is already used in tools for the automatic generation of folded PLAs, and the product lines most likely to be replicated by these tools are those driving a large number of outputs, which is also the case for product lines replicated to ensure Rules 3* and 3**. Therefore this scheme may be used successfully to generate self-checking folded PLAs.

This ensures the fault-secure property, so that all output errors due to a fault of a type given in Table 1 are detectable. However, the SFS property (which concerns undetectable faults) has yet to be discussed. To ensure this property, undetectable shorts of types Sp and St, specified previously, must be avoided. In the present case, undetectable shorts between a product line and an output line must be avoided (the collapse, after a short, of a product line and an output line results in a line that does not verify Rule 2), as must undetectable shorts between some product lines (the collapse, after a short, of two product lines may result in a product line that does not verify Rule 3* or 3**). To ensure that such shorts do not occur, fault avoidance techniques may be used, for example topological separation of some lines, or implementation of some lines in layers such that shorts cannot occur during the circuit's life (see, for example, class I of the fault hypothesis⁶). Such techniques for avoiding shorts may also be found in References 7, 8 and 16.
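The replication step described above can be sketched as follows. The strategy here (split a violating product line into information-only replicas of at most $2^m - 1$ outputs, plus one check-only replica, which Lemma 2 covers) is one simple assumed choice that satisfies Rules 3* and 3**; it is not the replication algorithm of any particular folding tool, and the product-line and output names are hypothetical.

```python
def violates(outputs, check_outputs, m):
    """True if a product line driving `outputs` breaks Rule 3* or Rule 3**."""
    info = [o for o in outputs if o not in check_outputs]
    chk = [o for o in outputs if o in check_outputs]
    if info and not chk:
        return len(info) > 2 ** m - 1      # Rule 3*
    if info and chk:
        return len(outputs) > m            # Rule 3**
    return False                            # only check outputs: unrestricted

def replicate(product_lines, check_outputs, m):
    """Hypothetical replication strategy.

    A violating product line is split into information-only replicas of at
    most 2**m - 1 outputs each, plus one check-only replica if needed.
    """
    result = {}
    size = 2 ** m - 1
    for name, outs in product_lines.items():
        if not violates(outs, check_outputs, m):
            result[name] = list(outs)
            continue
        info = [o for o in outs if o not in check_outputs]
        chk = [o for o in outs if o in check_outputs]
        for k in range(0, len(info), size):
            result[f"{name}.{k // size}"] = info[k:k + size]
        if chk:
            result[f"{name}.chk"] = chk
    return result

checks = {"B0", "B1"}
plas = {
    "P1": ["O1", "O2", "O3", "O4", "O5"],   # information only, 5 > 2**2 - 1
    "P2": ["O1", "B0"],                     # mixed, 2 <= m
    "P3": ["O1", "O2", "B1"],               # mixed, 3 > m
    "P4": ["B0", "B1"],                     # check only
}
for name, outs in replicate(plas, checks, m=2).items():
    print(name, outs)
```

Splitting off check-only replicas keeps the number of replicas low for mixed product lines; a real folding tool would also weigh the effect of each replica on the folding of the OR matrix.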

Output checker
To simplify the output checker of the PLA, a method for checking the PLA using XOR trees instead of adder trees is presented. From Lemma 1, it can be verified that B0 may be generated as the even parity of the output lines, using an XOR tree. By the same lemma, B1 may be generated using two-input AND gates receiving the inputs of the XOR gates generating B0, and an XOR tree that calculates the parity of the outputs of the AND gates to give B1. To explain this scheme, consider Figure 5. If a couple of information bits (e.g. I1, I2) driving an XOR gate (XOR1) takes the value (1, 1), then this couple of ones is detected by the corresponding AND gate (AND1) and is applied to the parity tree generating $\overline{B_1}$ ($\overline{B_1}$ is equal to the even parity of the number of couples of ones in the information bits). In this case the output of XOR1 is equal to zero and, therefore, this couple of ones will not be detected a second time by the AND3 gate. If the couple I1, I2 takes the value (1, 0) or (0, 1), then the output of AND1 is zero. However, the 'solitary' one in this couple propagates to the output of XOR1 and then through the following XOR gates until it reaches the inputs of a gate XORi, where it meets another 'solitary' one to form a couple of ones. This couple of ones is detected by the ANDi gate and is used as an input of the XOR tree generating B1. Similarly, B2 is calculated using AND gates receiving the inputs of the XOR gates of the XOR tree generating B1, and so on. Figure 5 gives the generation of B0, B1 and B2 for four information bits. In this figure the AND gates may be suppressed by using internal signals of the XOR gates. Figure 6a is a possible realization of the XOR function using two gates (a NOR gate and a complex AND-NOR gate) that are directly realizable in NMOS and CMOS technologies; this realization also provides the NOR function without extra circuitry. Figure 6b is a symbolic representation of the circuit in Figure 6a. Similarly, Figure 7a gives the exclusive NOR function together with the NAND function, and Figure 7b is its symbolic representation. Using the gates of Figures 6 and 7, the circuit in Figure 5 may then be simplified as shown in Figure 8. This technique can be used to generate any subset of Berger code check bits corresponding to a given set of information bits.

To conclude, the method given in this section is based on a more detailed analysis of the errors detected by Berger code check bits than the one given in Reference 15. This analysis is combined with error propagation rules that ensure the detection of errors on the outputs of the PLAs, so optimal designs for self-checking PLAs may be obtained. Compared with the method using parity codes, it has the advantage of being compatible with multiple folding design. Moreover, checking of product lines, which in some cases may be several times more numerous than output lines, is avoided. Compared with the method given in Reference 9, it decreases the number of extra outputs needed and simplifies the output checker, since XOR trees are used instead of adder trees. Another advantage of checkers based on XOR trees is that they allow the work of Reference 17 to be extended, ensuring the self-testing property with a small number of code word inputs. Reference 17 has shown that for each parity tree there are several sets containing four input words, and that the self-testing property is ensured when any one of these sets is applied to the tree. To extend these results to trees such as the one given in Figure 8, it is sufficient to:
• start from the test sets of the XOR trees generating each of the check bits;
• find the test sets for the global circuit by backward tracing from the inputs of each XOR tree to the global inputs.
Finally, the optimal number of check output lines depends on the PLA to be realized, but an analysis of many multiple folded PLAs suggests that this number is three or four in most cases. The general scheme of self-checking folded PLAs is given in Figure 9.

Figure 5. Generation of Berger code check bits using AND and XOR trees
Figure 6. (a) Real and (b) symbolic representation of NMOS and CMOS XOR gate
Figure 7. (a) Real and (b) symbolic representation of NMOS and CMOS exclusive NOR gate
Figure 8. Generation of Berger code check bits using XOR trees

DESIGN OF SELF-CHECKING ROMs AND RAMs


ROMs and RAMs are discussed together since they have a similar organization, as shown in Figure 10. A self-checking scheme for ROMs is given in Reference 18, where an unordered code is loaded into the word array. Here the interest is in simplifying the code to be loaded. This code may be, for example, the parity code, which needs a single check bit. For this code, Rule 1 must be verified to ensure the fault-secure property. This rule is verified by the lines of the word array and of the multiplexer (each cell of the word array and each line of the multiplexer is connected to only one output), but it is not verified by the outputs of the decoders. Therefore the outputs of the decoders must be checked after crossing over the word array.

Figure 9. General scheme for self-checking folded PLAs. A: NOR matrix with replicated product lines; B: NOR matrix with extra outputs giving a subset of Berger code check bits; C: bit lines checker based on the schemes in Figures 3 and 4; D: output lines checker realized using XOR trees

Figure 10. ROM and RAM block diagram (labels include address inputs A1 ... Ak and Ak+1 ... Am, word array, MUX and outputs O1 ... Om)

High fault coverage self-checking decoders
A self-checking decoder offering high fault coverage is now presented. With n binary variables forming a one-out-of-n code, any single-output function having the one-out-of-n code as its input space may be realized easily by a NOR gate, as follows. Let xi be one of the n variables. If, for the code word of the one-out-of-n code for which xi = '1', the output of the function is '0', then a MOS of the NOR gate is driven by xi; otherwise, no MOS is driven by xi. Thus any multiple-output function may be realized by a matrix of NOR gates.

Theorem 2: consider the transposition of a one-out-of-n code to an unordered code using a NOR-gate matrix, such that two different code words of the one-out-of-n code are never transposed to the same code word of the unordered code. Then all noncode words of the one-out-of-n code are transposed to noncode words of the unordered code.

Proof: the noncode words of the one-out-of-n code are such that either all bits are '0's or two or more bits are '1's.
• If all bits (the inputs of the NOR gates) are '0's, then the bits of the unordered code (the outputs of the NOR gates) are all '1's. Such a word covers all other binary words and therefore does not belong to an unordered code.
• If two bits, Li and Lj, of the one-out-of-n code are equal to '1', let Mi and Mj be the code words of the unordered code generated when only Li or only Lj, respectively, is equal to '1'. By the hypothesis of Theorem 2, Li ≠ Lj implies Mi ≠ Mj. On the other hand, when Li and Lj are activated simultaneously, the outputs of the NOR gates corresponding to the '0's of Mi and those corresponding to the '0's of Mj are set to '0'. Such a word is covered by both Mi and Mj; it cannot equal either of them (otherwise one of Mi, Mj would cover the other), so it is strictly covered by a code word and does not belong to the unordered code.
• If more than two bits of the one-out-of-n code are '1's, it can be shown similarly that the outputs of the NOR gates give a noncode word of the unordered code.
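Theorem 2 can be checked exhaustively for a small n. The sketch builds the NOR matrix described above (input $x_i$ drives a MOS of NOR gate $j$ exactly when bit $j$ of $M_i$ is '0') and verifies that every noncode word of the one-out-of-n code maps to a noncode word. The target codes (a 2-out-of-4 code for n = 6 and a small Berger code for n = 4) and the function names are assumed examples.

```python
from itertools import product

def nor_matrix(code_words):
    """Row j lists the inputs x_i driving a MOS of NOR gate j (M_i has 0 in bit j)."""
    width = len(code_words[0])
    return [[i for i, w in enumerate(code_words) if w[j] == 0] for j in range(width)]

def nor_outputs(matrix, x):
    """Output j of the NOR matrix for input word x."""
    return tuple(int(not any(x[i] for i in row)) for row in matrix)

def covers(u, v):
    return all(a >= b for a, b in zip(u, v))

def is_unordered(words):
    return all(not covers(u, v) for u in words for v in words if u != v)

def theorem2_holds(code_words):
    """Check the transposition of a one-out-of-n code into `code_words` exhaustively."""
    assert is_unordered(code_words) and len(set(code_words)) == len(code_words)
    matrix = nor_matrix(code_words)
    n = len(code_words)
    for x in product((0, 1), repeat=n):
        y = nor_outputs(matrix, x)
        if sum(x) == 1:
            if y != code_words[x.index(1)]:   # code words map to their targets
                return False
        elif y in code_words:                 # noncode words must stay noncode
            return False
    return True

two_of_four = [w for w in product((0, 1), repeat=4) if sum(w) == 2]   # n = 6
print(theorem2_holds(two_of_four))   # True
```

The all-zero input gives the all-ones word, and two active inputs give the bitwise AND of their code words, exactly the two cases of the proof.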

Vol 13 No 4 May 1989

Note that Theorem 2 also holds when a NOT-AND structure (equivalent to the NOR matrix), an OR matrix, or a NOT-NAND structure (equivalent to the OR matrix) is used. Theorem 2 may be used to design checkers for the one-out-of-n code. Previous designs using transposition into a k-out-of-2k code (Reference 22) and into a double-rail code (Reference 19) are special cases of Theorem 2. Another possible design translates the one-out-of-n code into a Berger code having $N = \lceil \log_2 n \rceil$ information bits and $K = \lceil \log_2 (N+1) \rceil$ check bits, and checks the result with a Berger code checker. This checker requires fewer MOSs than the previously proposed checkers. For example, when the Berger code checker and the double-rail checker cell of Reference 1 are used, the checker of the one-out-of-n code based on Berger code transposition needs $n(N+K)/2 + 23N - 13K - 10$ MOSs, which is generally less than the $(n+12)N - 10$ MOSs needed for the checker given in Reference 19. For instance, if $n = 128$, $N = 7$ and $K = 3$, the two checkers need 752 MOSs and 970 MOSs respectively. The checker given in Reference 22 needs even more devices.

The following lemma ensures coverage of some other errors on a decoder's outputs.

Lemma 4: Let A1, A2, ..., AN be the inputs of the decoder. If the signals A1, A2, ..., AN or Ā1, Ā2, ..., ĀN are regenerated from the outputs L1, L2, ..., Ln of the decoder (using a NOR-gate matrix) and compared with the inputs of the decoder, then every error in which one output of the decoder is activated instead of another is detected.

Proof: If the output Lj (corresponding to the input word aj) is activated instead of the output Li (corresponding to the input word ai), then the regenerated word will be aj, and its comparison with the input word ai will detect the error.
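Theorem 2 can be checked exhaustively for small n. The sketch below assumes (this is not taken from the paper) that decoder line j is transposed into the Berger code word whose information part is the N-bit binary form of j and whose check part counts the zeros of that information part; the NOR matrix is built with the rule given above (line i drives a MOS of an output's NOR gate when that output is '0' for line i). It also evaluates the two device-count formulas quoted in the text; these functions only reproduce the arithmetic and do not model the checker cells themselves. All function names are hypothetical.

```python
"""Illustrative sketch of Theorem 2: a NOR matrix transposing a one-out-of-n
code into a Berger code, checked exhaustively for a small n.

Assumption (not from the paper): line j maps to the Berger word whose
information part is j in binary and whose check part counts its zeros.
"""
from itertools import product


def ceil_log2(m):
    """Smallest integer e with 2**e >= m (for m >= 1)."""
    return (m - 1).bit_length()


def berger_word(value, n_info):
    info = [(value >> b) & 1 for b in reversed(range(n_info))]
    k = ceil_log2(n_info + 1)
    zeros = info.count(0)
    return info + [(zeros >> b) & 1 for b in reversed(range(k))]


def is_berger(word, n_info):
    value = int("".join(str(b) for b in word[:n_info]), 2)
    return list(word) == berger_word(value, n_info)


def build_nor_matrix(n):
    """For each output bit, the lines driving a MOS of its NOR gate:
    line i drives output b exactly when bit b of line i's code word is 0."""
    n_info = ceil_log2(n)
    words = [berger_word(j, n_info) for j in range(n)]
    width = len(words[0])
    return [[i for i in range(n) if words[i][b] == 0] for b in range(width)], n_info


def nor_matrix_output(matrix, x):
    """Each NOR gate outputs 1 only if none of its driving lines is at 1."""
    return [0 if any(x[i] for i in drivers) else 1 for drivers in matrix]


def check_theorem2(n):
    """Code words must map to code words and noncode words to noncode words (n >= 2)."""
    matrix, n_info = build_nor_matrix(n)
    for x in product([0, 1], repeat=n):
        if is_berger(nor_matrix_output(matrix, x), n_info) != (sum(x) == 1):
            return False
    return True


def mos_count_berger(n):
    """Formula quoted in the text for the Berger-transposition checker."""
    N = ceil_log2(n)
    K = ceil_log2(N + 1)
    return n * (N + K) / 2 + 23 * N - 13 * K - 10


def mos_count_double_rail(n):
    """Formula quoted in the text for the checker of Reference 19."""
    return (n + 12) * ceil_log2(n) - 10


print(check_theorem2(8))                                   # True
print(mos_count_berger(128), mos_count_double_rail(128))  # 752.0 970
```

Because the NOR matrix outputs the bitwise AND of the code words of all active lines, two or more active lines yield a word strictly covered by each of them, which an unordered code cannot contain; this is the idea behind the proof of Theorem 2.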
Lemma 5: If every possible error on the outputs of a circuit is detectable, then the circuit is strongly fault-secure for any type of fault (single, multiple, introducing sequential behaviour, etc.). Lemma 5 is obvious, since every output error is detected regardless of the type of fault producing it. Although faults may remain undetected and new faults may occur, the errors produced by the accumulated faults are detectable. Similarly, when faults introduce a sequential behaviour, the error produced each time on the circuit's outputs will be detected.

Theorem 3: If a decoder is checked according to Theorem 2 and Lemma 4, then it is strongly fault-secure (SFS) for any fault.

Proof: It can be verified that the errors detected in Theorem 2 and Lemma 4 cover all errors on the outputs of the decoder. Then, according to Lemma 5, Theorem 3 is true.

Theorem 3 may be applied in different ways:
• Let A1, A2, ..., AN be the inputs of the decoder; then 2N NOR gates can be used to regenerate A1, A2, ..., AN and Ā1, Ā2, ..., ĀN. Two double-rail checkers are used, one to compare the outputs of the NOR gates (to ensure Theorem 2) and the other to compare Ā1, Ā2, ..., ĀN with the inputs of the decoder (to verify Lemma 4).
• NOR gates regenerate A1, A2, ..., AN and the Berger code check bits corresponding to A1, A2, ..., AN. A Berger code checker then verifies the outputs of the NOR gates, and a double-rail checker compares the regenerated address signals with the inputs of the decoder.

These possible implementations are represented in the checking scheme of Figure 11, which ensures a very high fault coverage for the decoder. Finally, the present method shows its greatest advantage when the inputs of the decoder are encoded in an unordered code (Berger code, m-out-of-n code, etc.). Note that in the case of an m-out-of-n code (which is not separable), the decoder can still be realized (see Reference 20) and needs fewer devices than in the case of a binary code. In this case (inputs encoded in an unordered code), NOR gates can be used to regenerate the encoded inputs of the decoder, and a double-rail checker compares the outputs of the NOR gates with the encoded inputs of the decoder (see Figure 12). This method has two advantages:
• the overhead is decreased, since no Berger code checker or m-out-of-n code checker is needed to check the NOR-gate outputs;
• no separate checker is needed for the block generating the inputs of the decoder.
It can be verified that the decoder, the NOR matrix and the double-rail checker form a code-disjoint and self-testing checker for the unordered input code of the decoder. Therefore, the scheme in Figure 12 offers a very high fault coverage for the decoder and needs only low overhead, since it checks the decoder and the block generating the decoder's inputs in one go.

Figure 11. High fault coverage self-checking scheme for decoders

Figure 12. Self-checking scheme for decoders receiving an unordered input code
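The first implementation of Theorem 3 can be simulated by enumerating every possible output vector of a small decoder, which is exactly the premise of Lemma 5. The sketch is illustrative: it assumes that line Lj is selected by the address whose binary value is j, and that the NOR gate regenerating A_k is driven by every line whose address has bit k = 0 (the gate regenerating Ā_k by every line whose address has bit k = 1). These wiring choices and all function names are assumptions, not taken from Figure 11.

```python
"""Illustrative sketch of the first implementation of Theorem 3.

Assumptions (not from the paper): decoder line Lj is selected by the address
whose binary value is j; the NOR gate regenerating A_k is driven by every line
whose address has bit k = 0, and the gate regenerating not-A_k by every line
whose address has bit k = 1.
"""
from itertools import product


def regenerate(lines, n_addr):
    """Return the regenerated (A_k) and (not-A_k) signals from the decoder outputs."""
    a, a_bar = [], []
    for k in range(n_addr):
        bit_zero = [j for j in range(len(lines)) if not (j >> k) & 1]
        bit_one = [j for j in range(len(lines)) if (j >> k) & 1]
        a.append(0 if any(lines[j] for j in bit_zero) else 1)
        a_bar.append(0 if any(lines[j] for j in bit_one) else 1)
    return a, a_bar


def error_indicated(address, lines, n_addr):
    a, a_bar = regenerate(lines, n_addr)
    inputs = [(address >> k) & 1 for k in range(n_addr)]
    # First double-rail check (Theorem 2): each (A_k, not-A_k) pair must be complementary.
    pair_error = any(x == y for x, y in zip(a, a_bar))
    # Second double-rail check (Lemma 4): regenerated not-A_k against input A_k.
    compare_error = any(x == y for x, y in zip(a_bar, inputs))
    return pair_error or compare_error


def all_output_errors_detected(n_addr):
    """Enumerate every output vector for every address and confirm that an
    error is flagged exactly when the output differs from the correct one."""
    n = 1 << n_addr
    for address in range(n):
        correct = [1 if j == address else 0 for j in range(n)]
        for lines in product([0, 1], repeat=n):
            if error_indicated(address, lines, n_addr) != (list(lines) != correct):
                return False
    return True


print(all_output_errors_detected(3))  # True
```

No fault model appears anywhere in the sketch: it only looks at output vectors, which is why the conclusion holds for single, multiple or sequential faults alike.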

Low overhead self-checking decoders

The above scheme for decoders offers very high coverage, but it may consume considerable area, especially when the input code is not unordered. Therefore, another scheme, offering lower fault coverage but needing less area overhead, is presented here.

Figure 13. Regeneration of even and odd parity for decoder's inputs (addresses)

The decoder is checked using only two NOR gates: one regenerates the even parity (PE) and the other the odd parity (PO) of the decoder's inputs, as shown in Figure 13. The regenerated PE and PO bits (Reference 21) are compared with the parity bit PA (even or odd) associated with the decoder's inputs. This scheme was developed in the framework of a project with the Thomson Laboratoire Central de Recherches, France. It is used to design a 100-kbit cache memory including self-checking and BIST capabilities (Reference 21). The same scheme is also given in Reference 11. The decoder may be designed using either a NOR matrix or a NAND matrix followed by NOT gates. Figure 14 gives the NAND-NOT design using precharged CMOS NAND gates. The errors on the outputs of a decoder may be classified as follows: (1) all decoder outputs go to '0'; (2) one (or more) output Lj is selected in addition to the output Li normally selected; (3) one (or more) output Lj is selected instead of the correct output Li. Errors (2) and (3) each split into two cases: (a) the lines Lj and Li correspond to addresses having different parities, or (b) the lines Lj and Li correspond to addresses having the same parity.

Lemma 6: Single faults affecting the inputs of the decoder, the input inverters, the bit lines or the serial MOSs of the NAND gates can only produce errors of types 1, 2a or 3a.

Proof: Errors 2b and 3b cannot result from the faults listed in Lemma 6, because when Li is normally selected and the address corresponding to Lj has the same parity as the one corresponding to Li, the gate generating Lj has at least two driving MOSs in the off state. It is then easy to verify that a single fault (given in Table 1) cannot force two MOSs of the same NAND gate from the off state to the on state. Lemma 6 may also be proven for decoders realized by a NOR matrix.
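The error classes and the counting step in the proof of Lemma 6 can be made concrete. The sketch assumes (this is not from the paper) that line Lj is selected by address j and that the NAND gate of Lj has one series MOS per address bit, which is off exactly when the applied address differs from j in that bit; the number of off MOSs is then the Hamming distance between the two addresses, and two distinct addresses of equal parity differ in an even, hence at least two, number of bits. The rule for labelling cases with several wrong lines is also an assumption, and all names are hypothetical.

```python
"""Illustrative sketch: the decoder error classes and the counting argument
behind Lemma 6.

Assumptions (not from the paper): line Lj is selected by the address whose
binary value is j; the NAND gate of Lj has one series MOS per address bit,
off exactly when the applied address differs from j in that bit.
"""


def parity(v):
    return bin(v).count("1") % 2


def classify(selected, active):
    """Error type for the set of active lines when only `selected` should be at 1.
    Assumption: with several wrong lines, the case is labelled (a) if any wrong
    line has a parity different from `selected`, otherwise (b)."""
    active = set(active)
    if active == {selected}:
        return None
    if not active:
        return "1"
    wrong = active - {selected}
    kind = "2" if selected in active else "3"
    differs = any(parity(j) != parity(selected) for j in wrong)
    return kind + ("a" if differs else "b")


def off_mos_count(applied, line):
    """Series MOSs of the NAND gate of `line` that are off under `applied`."""
    return bin(applied ^ line).count("1")


def min_off_for_same_parity(n_addr):
    """Fewest off MOSs in the gate of a nonselected line of the same parity."""
    n = 1 << n_addr
    return min(off_mos_count(i, j)
               for i in range(n) for j in range(n)
               if i != j and parity(i) == parity(j))


print(classify(0b000, []))              # 1
print(classify(0b000, [0b000, 0b001]))  # 2a
print(classify(0b000, [0b011]))         # 3b
print(min_off_for_same_parity(4))       # 2
```

A single fault that turns on at most one off MOS therefore cannot activate a same-parity line, which is the step Lemma 6 relies on.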

Microprocessors and Microsystems

Lemma 7: All errors of types 1, 2a or 3a are detected by the parity bits PE and PO. Lemma 7 holds since, in the case of a type 1 error, PE and PO will both be equal to '1'; in the case of a type 2a error, PE and PO will both be equal to '0'; and in the case of a type 3a error, PE and PO are equal to '01' (or '10') instead of '10' (or '01').

Some faults not considered by Lemma 6 may produce type 2b errors. Such faults include: a precharged line cut (floating line) or stuck-at-one; a precharge MOS (MOS P) stuck-open; a MOS P (MOS N) of an output inverter stuck-on (stuck-open); a decoder output line cut or stuck-at-one, etc. The scheme proposed so far does not cover type 2b errors; this problem also appears in Reference 11. Such a scheme is unacceptable for applications that need an advanced level of safety, since faults giving type 2b errors are likely to occur. For example, since the decoder output lines are often long, a cut in such a line is likely. However, for decoders based on precharged gates, the complete scheme also covers faults giving type 2b errors, as explained below.

Lemma 8: If a type 2b or 3b error occurs during an evaluation phase, then, during the previous precharge phase, some line will be equal to '1' instead of '0'. Therefore, during that precharge phase PE or PO will be equal to '0' instead of '1'.

To prove Lemma 8, assume that all lines are equal to '0' during the precharge phase. Then, during the evaluation phase, the only way to force a nonselected line Lj to '1' is to have all MOSs of the respective NAND gate in the on state. As explained in the proof of Lemma 6, this is possible only if Lj and the line Li normally selected correspond to addresses with different parities. Therefore, if all lines are equal to '0' during the precharge phase, errors of types 2b and 3b cannot occur.

Lemma 8 gives a method to cover type 2b and 3b errors since, during the precharge phase just before such an error occurs, PE or PO will be equal to '0', whereas the correct state is PE and PO both at '1'. Figure 15 gives a scheme that covers type 2b and 3b errors. During correct operation, the signal DO and the output of the AND gate have opposite values (0, 1 during precharge and 1, 0 during evaluation). However, during the precharge phase just before a type 2b or 3b error occurs, both signals will be equal to '0'. To summarize, for the faults listed in Table 1, the scheme in Figure 13 covers type 1, 2a and 3a errors and the scheme in Figure 15 covers type 2b and 3b errors. Therefore, all errors are covered for the fault model considered.

In conclusion, one of the schemes given in this section may be chosen, according to the required fault coverage and area overhead, to check the line decoder and the column decoder, if any. Note also that the overhead of these schemes decreases significantly when the block generating the decoder's inputs is checked by an unordered code. When the decoders are checked, the word array verifies Rule 1. Then any code may be used to check it, in particular the parity code, which needs a single parity bit. The result is a low-cost scheme that ensures concurrent error detection in ROMs and RAMs.

Figure 14. NAND and NOR realization of the decoder using precharged CMOS gates

Figure 15. Scheme covering type 2b and 3b errors
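Lemmas 7 and 8 can be illustrated together by simulating the two parity NOR gates. The sketch assumes (this is not from the paper) that line Lj is selected by address j, that the PE gate is a NOR of all lines whose address has odd weight and the PO gate a NOR of all lines whose address has even weight, and that PA is 1 for odd-weight addresses. The exhaustive check shows that the evaluation-phase comparison flags exactly the type 1, 2a and 3a errors, while the separate precharge check flags any line that is at '1' when all lines should be '0'. Function names are hypothetical, and the multi-line labelling rule is the same assumption as in the classification sketch.

```python
"""Illustrative sketch of the parity check of Figure 13 (Lemma 7) together with
the precharge-phase check suggested by Lemma 8.

Assumptions (not from the paper): line Lj is selected by address j; the PE gate
is a NOR of all lines whose address has odd weight and the PO gate a NOR of all
lines whose address has even weight; PA is 1 when the address has odd weight.
"""
from itertools import product


def parity(v):
    return bin(v).count("1") % 2


def pe_po(active):
    """Outputs of the two NOR gates for the set of lines at '1'."""
    pe = 0 if any(parity(j) == 1 for j in active) else 1
    po = 0 if any(parity(j) == 0 for j in active) else 1
    return pe, po


def evaluation_error(address, active):
    """Lemma 7: a correct selection gives (PE, PO) = (not PA, PA)."""
    pa = parity(address)
    return pe_po(active) != (1 - pa, pa)


def precharge_error(active_during_precharge):
    """Lemma 8: all lines must be '0' during precharge, so PE and PO must both be '1'."""
    return pe_po(active_during_precharge) != (1, 1)


def classify(selected, active):
    """Error type as in the text; with several wrong lines, (a) if any of them has a
    parity different from the selected line, otherwise (b) (an assumption)."""
    active = set(active)
    if active == {selected}:
        return None
    if not active:
        return "1"
    kind = "2" if selected in active else "3"
    wrong = active - {selected}
    return kind + ("a" if any(parity(j) != parity(selected) for j in wrong) else "b")


def check_lemma7(n_addr):
    """Types 1, 2a and 3a must be flagged during evaluation; correct outputs and
    types 2b and 3b pass unflagged, which is why the precharge check is needed."""
    n = 1 << n_addr
    for address in range(n):
        for bits in product([0, 1], repeat=n):
            active = [j for j in range(n) if bits[j]]
            should_flag = classify(address, active) in ("1", "2a", "3a")
            if evaluation_error(address, active) != should_flag:
                return False
    return True


print(check_lemma7(3))                            # True
print(precharge_error([]), precharge_error([5]))  # False True
```

The overhead stays low because both checks reuse the same two NOR gates; only the point in the clock cycle at which their outputs are examined differs.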

CONCLUSIONS

Several schemes ensuring concurrent error detection in PLAs, ROMs and RAMs under a realistic fault model have been presented. These schemes use simple codes and require low area overhead. The various solutions proposed in this paper may be chosen according to criteria such as area overhead, fault coverage, or ease of automatic design and optimization.

REFERENCES

1 Carter, W C and Schneider, P R 'Design of dynamically checked computers' Proc. 4th Cong. IFIP Vol 2 Edinburgh, Scotland (August 1968) pp 878-883
2 Anderson, D A Design of self-checking digital networks using coding techniques CSL Report 527, University of Illinois, Urbana, USA (September 1971)
3 Smith, J E and Metze, G 'Strongly fault-secure logic networks' IEEE Trans. on Computers Vol C-27 No 6 (June 1978)
4 Viaud, J and David, R 'Sequentially self-checking circuits' 10th Int. Symp. on Fault Tolerant Computing Kyoto, Japan (October 1980)
5 Nicolaidis, M and Courtois, B 'Strongly code disjoint checkers' IEEE Trans. on Computers Vol 37 No 6 (June 1988)
6 Courtois, B 'Failure mechanisms, fault hypotheses, and analytical testing of LSI-NMOS (HMOS) circuits' VLSI 81 University of Edinburgh, Academic Press, London, UK (August 1981)
7 Nicolaidis, M and Courtois, B 'Layout rules for the design of self-checking circuit' VLSI Conf. Tokyo, Japan (August 1985)
8 Nicolaidis, M and Courtois, B 'Design of self-checking circuits using unidirectional error detecting codes' 16th Fault Tolerant Computing Symp. Vienna, Austria (July 1986)
9 Mak, G P, Abraham, J A and Davidson, E S 'The design of PLAs with concurrent error detection' Dig. Pap. 12th Int. FTC Symp. Santa Monica, CA, USA (June 1982) pp 303-310
10 Wakerly, J F Error detecting codes, self-checking circuits and applications Elsevier North-Holland, NY, USA (1978)
11 Chen, C Y, Fuchs, W K and Abraham, J A 'Efficient concurrent error detection in PLAs and ROMs' Proc. Int. Conf. on Computer Design Port Chester, NY, USA (October 1985)
12 Nicolaidis, M and Courtois, B 'Design of self-checking systems based on analytical fault hypotheses' IMAG Report No RR 353 (March 1983)
13 Chuquillanqui, S and Perez Segovia, T 'PAOLA: a tool for topological optimization of large PLAs' Proc. 19th DAC Las Vegas, NV, USA (June 1982)
14 De Micheli, G and Sangiovanni-Vincentelli, A 'Multiple constrained folding of programmable logic arrays: theory and applications' IEEE Trans. Vol CAD-2 No 3 (July 1983)
15 Bose, B and Lin, D J 'Systematic unidirectional error detecting codes' 14th Fault Tolerant Computing Symp. Kissimmee, FL, USA (June 1984)
16 Jha, N K and Abraham, J A 'Totally self-checking MOS circuits under realistic physical failures' Proc. Int. Conf. on Computer Design Port Chester, NY, USA (October 1984)
17 Khakbaz, J and McCluskey, E J 'Self-testing embedded code checkers' Compcon '83 (1983)
18 Fuchs, W K and Abraham, J A 'A unified approach to concurrent error detection in highly structured logic arrays' Dig. Pap. 14th Int. FTC Symp. Kissimmee, FL, USA (June 1984) pp 4-9
19 Khakbaz, J 'Totally self-checking checker for 1-out-of-n code using two-rail codes' IEEE Trans. Vol C-31 (July 1982) pp 677-681
20 Diaz, M and De Souza, J M 'Design of self-checking microprogrammed controls' Dig. Pap. 5th Int. FTC Symp. Paris, France (June 1975) pp 137-142
21 Nicolaidis, M 'An efficient built-in self test scheme for functional test of embedded RAMs' IMAG Report No RR 477 (November 1984)
22 Anderson, D A and Metze, G 'Design of totally self-checking circuits for m-out-of-n codes' IEEE Trans. Comput. Vol C-22 No 3 (1973)

Michael Nicolaidis received an engineering degree from the Polytechnic of Thessaloniki, Greece and an engineering doctorate from the Polytechnic Institute of Grenoble, France. Presently he is a researcher with CNRS, working at the TIM3/IMAG Laboratory in Grenoble. His research interests include fault modelling, fault-tolerant computing, self-checking systems, design for testability, and CAD tools.

Bernard Courtois received an engineering degree from the École Nationale Supérieure d'Informatique et Mathématiques Appliquées de Grenoble, France in 1973 and engineering and science doctorates in 1973 from the Institut National Polytechnique de Grenoble, France. Since 1973, he has been researching fault tolerance, fault modelling and VLSI testing. He is currently responsible for the Computer Architecture Group of the IMAG/TIM3 Laboratory, where research interests include CAD, architecture, and VLSI testing.
