Embedded fault diagnosis in digital systems with BIST


Microprocessors and Microsystems 32 (2008) 279–287


Raimund Ubar, Sergei Kostin, Jaan Raik
Department of Computer Engineering, Tallinn University of Technology, Raja 15, Tallinn 12618, Estonia

Article info

Available online 7 April 2008

Keywords: Built-in self-test, Pseudorandom test sequence, Signature analysis, Fault simulation, Diagnostic tree, Fault diagnosis, Diagnostic resolution

Abstract

This paper presents an optimized fault diagnosis procedure applicable in Built-in Self-Test (BIST) environments. Instead of the known approach based on a simple bisection of patterns in pseudorandom test sequences, we propose a novel bisection procedure that takes the diagnostic weight of test patterns into account. Another novelty is the sequential nature of the procedure, which allows pruning the search space. Unlike the classical approach, which targets all failing patterns, the proposed method does not need to use all such patterns for diagnosis. This allows trading off the speed of diagnosis against diagnostic resolution. To improve the diagnostic resolution, multiple signature analyzers are used. A method is proposed to partition a single signature analyzer into a set of multiple independent analyzers, and algorithms are given to synthesize an optimal interface between the outputs of the circuit under test and the analyzers. The proposed method is compared with three known fault diagnosis methods: classical Binary Search based on pattern bisection, Doubling, and Jumping. Experimental results demonstrate the advantages of the proposed method compared to the previous ones.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

During the Integrated Circuit (IC) design and manufacturing cycle, a manufacturing test screens out the bad chips. Fault diagnosis is used to find out why the bad chips failed, which is especially important when the yield is low or when a customer returns a failed chip. Understanding how ICs fail helps to identify and eliminate the causes of failures. Unlike board-level diagnosis, the main objective in diagnosing ICs is to understand the failures (to specify the fault or erroneous state) and to prevent them from recurring, not to repair the faults. Locating faults in chips and analyzing failure trends can lead to corrective design rule changes. Failure trends also help to reveal process and manufacturing problems. Diagnosis can identify reliability trouble spots in a design and lead to special Design for Reliability (DFR) actions, and in this way it can have an impact on future designs [1]. Diagnosis is especially important at the early stages of the production cycle, when yield improvements need to be attained rapidly. Such yield improvements can only be achieved if diagnosis of the failing designs is performed with utmost haste and automation, to ensure fast pinpointing of problems related to the design and/or process. As process technologies shrink and designs become more complex, Built-In Self-Test (BIST) is gaining increasing acceptance as an industry-wide test solution [2], because it provides a low-cost

solution to both test generation and test application [3], offers the promise of low hardware overhead with the clear advantage of at-speed testing (delay testing) [4], enables testing at the operating frequency of the circuit under test (CUT), and reduces the amount of data that needs to be transferred from the Automatic Test Equipment (ATE) to the device under test (DUT). Despite such benefits, the BIST approach has not been adopted as the primary test methodology, because of its weak diagnostic capabilities [3]. The problem is that the signature provided by the Output Response Analyzer (ORA) at the end of the test session does not contain enough diagnostic information, either to identify failing vectors or to precisely identify error-capturing scan cells. The pass/fail information obtained from the ORA is usually insufficient to diagnose the failure via effect-cause analysis [5]. Thus, debug in a BIST environment is complicated. The need for debug/diagnosis capability is essential not only to reveal process and manufacturing problems. After the design is fabricated on silicon for the first time, a substantial amount of effort is put into debugging what is popularly known as the "first silicon". Such debug efforts usually weed out problems in the design, including design errors and design marginality. Field diagnosis is the third important phase where diagnosis capability is required [6]. To ensure its overall success, a BIST environment must be able to provide diagnostic capability similar to that of a conventional scan-based external testing environment. Any new method proposed to eliminate this drawback of the BIST technique faces the following challenges in the diagnosis process [4]:
– achieving full diagnostic information, i.e. detecting all faulty items (whether scan cells, test vectors or logic blocks);


– minimizing diagnostic time, since this translates to a reduction in total testing time;
– minimizing hardware overhead, i.e. the amount of hardware needed to support Built-in Self-Diagnosis (BISD).

As logic BIST is increasingly being adopted to improve test quality and reduce test costs for rapidly growing designs, the search for efficient BISD methods that provide the same benefits as BIST is relevant. The main objective of the current work is to carry out a comparison of available diagnosis algorithms, and to propose a new one which exploits an optimized bisection strategy based on the diagnostic information inherent in test patterns.

The rest of the paper is organized as follows. In Section 2 we give an overview of different diagnosis strategies and algorithms. Section 3 describes our proposed diagnostic search method, based on sequential bisectioning of faults instead of patterns. In Section 4 an example is given to explain the proposed method. Then, in Section 5, the idea of using multiple signature analyzers to improve the diagnostic resolution is introduced, and in Section 6 the algorithms to implement the idea are presented. Section 7 discusses experimental data, and in Section 8 the conclusions are drawn.

2. Overview of the state-of-the-art

2.1. The strategies for diagnosis

Diagnostic analysis is based on two main principles: the cause-effect or the effect-cause approach. Cause-effect analysis [7] is based on precomputed fault dictionaries, whereas effect-cause analysis [8,9] processes the test responses of a circuit (the effect) in order to locate the fault (the cause). Effect-cause analysis, while not using expensive fault dictionaries, requires significantly higher computational power during diagnosis than the cause-effect approach: for every failing circuit, an expensive diagnostic simulation has to be carried out, and the test responses have to be analyzed using fault simulation and backtracking to identify the causes [8,10–12]. Computational and storage requirements for fault dictionaries are high due to the large number of faults, outputs and test patterns. A reduction in size is possible by storing only subsets of the dictionary, at the expense of reduced diagnostic resolution [13,14]. A combination of the two approaches, cause-effect and effect-cause analysis, can provide a synergy that overcomes the prediagnosis and postdiagnosis simulation costs.

Early works on BISD focused on extracting diagnostic information hidden in the BIST signature, based on identification of fault-detecting test vectors [15–17]. However, if some faults are detected by a number of vectors, aliasing problems make it impossible to place an accurate diagnosis [16]. Therefore, researchers have attempted to collect more information by repeating the same test while adjusting signature analyzer parameters or observation outputs, or by increasing the signature register size [18–21]. However, the previous attempts based on the analysis of a single signature have failed to provide effective methods for current designs [1].
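As an illustration of the cause-effect principle, a precomputed fault dictionary maps each fault to the set of test patterns it fails, and diagnosis becomes a lookup. The names and data below are invented for this sketch; they are not from the paper.

```python
# Hypothetical sketch of cause-effect diagnosis with a precomputed fault
# dictionary: each fault maps to the set of test patterns it fails.
# Names and data are illustrative, not from the paper.

def diagnose(fault_dict, observed_failing):
    """Return faults whose precomputed failing-pattern set matches the
    observed failing patterns (exact-match lookup)."""
    return [f for f, failing in fault_dict.items()
            if failing == observed_failing]

fault_dict = {
    "f1": {2, 5},   # fault f1 fails patterns 2 and 5
    "f2": {2, 5},   # f2 is indistinguishable from f1 under this test set
    "f3": {7},
}
print(diagnose(fault_dict, {2, 5}))  # f1 and f2 remain suspected
```

The example also shows the resolution limit of a dictionary: faults with identical failing-pattern sets cannot be distinguished.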
Research on identification of fault-embedding scan cells in scan-based BIST designs has concentrated on increasing the diagnostic information through multiple applications of the same test, each time modifying the way the test responses are compacted. Observation of the test responses can be modified either by changing the outputs to be observed through partitioning schemes [1,22], or by changing the signature compactor [21]. The approaches based on multiple repetitions of full test sequences lead to long

diagnosis procedures, and the control circuits needed for repeated partitioning or repeated changing of signature analyzers lead to higher area overhead. Another approach is to organize the diagnostic procedure as a sequence of selected test sessions, with the goal of identifying failed test patterns. These approaches use a cause-effect strategy and are implemented as a classical search procedure, which can be optimized to reduce the total diagnosis time. The drawback of this type of known approaches is the large amount of memory space needed to store the precomputed fault tables.

In this work we propose, for fault diagnosis in BIST environments, a cause-effect method using compressed fault tables to minimize the amount of memory space needed. To compensate for the loss in diagnostic resolution caused by the reduced fault tables, we propose partitioning the signature analyzer into a set of independent analyzers. However, this partitioning is done only once, in a single way, for the whole diagnosis procedure, and thus it does not bring any additional increase in area overhead compared to the case of a single signature analyzer. In this way, we combine several techniques simultaneously to reach the best synergy. Because it uses a fault table, our method classifies as a cause-effect approach; because it organizes the diagnosis as a sequential search based on intermediate fault analysis during the adaptive search, our method also has elements of the effect-cause strategy. The ultimate goal of the proposed method is to optimize the diagnostic procedure by minimizing the number of diagnostic test sessions (queries) at an accepted diagnostic resolution. The sequential essence of the procedure gives the possibility to stop at any step, as soon as the accepted diagnostic resolution is achieved.
The idea of using multiple signatures improves the resolution in general, and in this way helps to achieve the accepted diagnostic resolution with shorter diagnostic procedures than the known approaches.

2.2. Search procedures for diagnosis

In the following we give an overview of diagnostic algorithms in connection with their use in BIST. Consider the BIST environment consisting of the Circuit Under Test (CUT), the pseudorandom Test Pattern Generator (TPG) and the Multiple Input Signature Register (MISR) as an Output Response Analyzer (ORA), as depicted in Fig. 1. Denote by N the length of the pseudorandom test sequence T generated by the TPG, by F the set of possible faults in the CUT, by F(t) ⊆ F the set of faults detected by the test pattern t ∈ T, and by

Fig. 1. BIST for fault diagnosis. (Block diagram: the BIST Control Unit drives the Test Pattern Generator (TPG) and the Circuit Under Test (CUT); the CUT responses feed the Output Response Analyser (ORA); a memory stores, for selected test patterns, the pattern number, the signature and the detected faults.)


T(f) ⊆ T the set of test patterns that fail because of the fault f ∈ F. Let us call a test session (query) the procedure where a part of the test sequence T is applied, with a subsequent comparison of the signature in the MISR with the expected reference value. The diagnosis problem can be formulated as follows: given a set F of faults, identify the subset of faults F* ⊆ F, where in the general case the number of faults to be localized, d = |F*|, is unknown, using the minimum number of queries. (The number of queries is directly proportional to the amount of time needed to diagnose the BIST system.) Below we discuss four algorithms that attempt to solve the diagnosis problem: Binary Search, Digging, Doubling and Jumping.

2.2.1. Binary search

The classical Binary Search algorithm is based on bisection, and many variations of this approach have been published [23–26]. Consider here the case d = 1, where the circuit contains a single fault f* ∈ F. The following procedure finds all the failing patterns T(f*) ⊆ T within the pseudorandom sequence [1, N] of test vectors T generated by BIST, provided that after N patterns the signature is corrupted [27]:

Algorithm 1
1. Perform BIST for all patterns within [1, N/2].
2. IF the signature after N/2 patterns is correct:
     Find all the failing patterns within [N/2 + 1, N].
   ELSE
     Find all the failing patterns within [1, N/2].
     Load the correct seeds for the pattern N/2 + 1 into the TPG and MISR.
     Find all the failing patterns within [N/2 + 1, N].
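A minimal sketch of Algorithm 1's recursion, assuming an `interval_fails` oracle that stands in for one BIST session (reseed the TPG/MISR, run the interval, compare the signature). The oracle and data are illustrative assumptions, not the paper's interface.

```python
# Minimal sketch of Algorithm 1: recursive bisection of the pattern
# interval [lo, hi] to locate every failing pattern. `interval_fails(lo, hi)`
# stands in for a BIST session and returns True if the signature over that
# interval is corrupt.

def find_failing(lo, hi, interval_fails):
    if not interval_fails(lo, hi):
        return []                     # correct signature: no failures here
    if lo == hi:
        return [lo]                   # single failing pattern located
    mid = (lo + hi) // 2
    # unlike plain binary search, both halves may contain failing patterns
    return (find_failing(lo, mid, interval_fails) +
            find_failing(mid + 1, hi, interval_fails))

failing = {3, 9}                      # illustrative failing patterns
oracle = lambda lo, hi: any(lo <= p <= hi for p in failing)
print(find_failing(1, 16, oracle))    # [3, 9]
```

Note that both halves must be explored whenever the interval fails, since a single fault may corrupt several patterns in the sequence.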

After the set of all failing test patterns T(f*) is determined, we can calculate the set of faults

F(f*) = ∩_{t ∈ T(f*)} F(t) − ∪_{t ∈ T−T(f*)} F(t)

containing the suspected faults which cannot be distinguished from f*.

2.2.2. Digging

The Digging algorithm can be considered an improvement of Binary Search. Digging reduces the number of queries, especially for low values of d (the number of faults in the CUT) [28,29]. Observe that if there are two sets of suspected faults F1 and F2, with F1 ⊆ F2, then the result of the query on F1 renders the result of the query on F2 useless. Hence, with Binary Search there is a potential for many queries to produce no additional information for the diagnosis process. This suggests that once a suspected set of faults F(f*) is found, the searched fault f* should be identified from this particular set F(f*). This process is referred to as Digging [29]. In the general case with d > 1, once a fault f* is identified, f* is removed from F*, and digging is resumed on the remaining items. Digging requires d · log2 n queries.

2.2.3. Doubling and jumping

Given that the value of d is unknown, the Doubling algorithm first attempts to estimate it. If d is small, the algorithm finds large fault-free sets; otherwise, the algorithm finds small suspected fault sets. To deliver this functionality, the algorithm tests disjoint sets of sizes 1, 2, 4, …, 2^i until a suspected fault set is found. At this point, the algorithm has used 2^i − 1 test patterns with positive results and has identified a fault by using a set of test patterns of size 2^i, using i + 1 queries. The algorithm then identifies the fault within a sequence of 2^i test patterns using binary search, which requires i queries. Consequently, in the general case, the algorithm uses 2i + 1 queries and detects 2^i items (2^i − 1 fault-free and 1 faulty). This Doubling algorithm is presented in [4,30].

An interesting modification of Doubling is Jumping [31]. Here, test sets of sizes 1+2, 4+8, …, 2^i + 2^{i+1} are used until a faulty set is found. Using these "jumps" in the ordered test sequence, the algorithm identifies fault-free subsequences with i/2 tests instead of i tests. However, a faulty test subsequence is of size 3 · 2^i, rather than of size 2^i as in Doubling; it therefore requires more than one query on a subset of size 2^i to reduce the faulty set either to size 2^i or to size 2^{i+1} with 2^i fault-free items. A more detailed analysis of the Jumping algorithm is presented in [4,31].

The effectiveness of the presented methods mainly depends on the number of failing patterns and their location in the sequence of test patterns. Thus, in some cases one algorithm is better than another, and vice versa. In [32] we proposed a modification of Binary Search driven by bisectioning of fault coverage instead of the classical bisectioning of patterns [23–26]. It was already shown in [32] that the new method outperformed the known ones in the average length of the diagnostic procedure. However, there is still motivation for further investigation, because the pseudorandom essence of BIST does not provide high diagnostic resolution. In this paper we present a further improvement of the diagnostic procedure, using multiple signature analyzers to increase the resolution and to reduce the test length. We have implemented the algorithms of Binary Search, Doubling and Jumping in the BIST environment for comparison with our proposed search algorithm described in the next section.

Since we consider in this paper only the case d = 1 (presumption of single faults), the Digging algorithm was not implemented. In fact, the main idea of the Digging algorithm, successively concentrating the search on current sets of suspected faults, is indirectly covered by our proposed algorithm.
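The Doubling strategy described above can be sketched as follows. The oracle, the single-failure scenario, and the function names are illustrative assumptions for the sketch, not the paper's implementation.

```python
# Rough sketch of the Doubling strategy: probe disjoint blocks of sizes
# 1, 2, 4, ... until one yields a corrupt signature, then binary-search
# inside that block. The query count illustrates the 2i + 1 bound.

def doubling_first_failure(n, fails_in):
    queries = 0
    start, size = 1, 1
    while start <= n:
        end = min(start + size - 1, n)
        queries += 1
        if fails_in(start, end):          # faulty block found
            lo, hi = start, end
            while lo < hi:                # binary search inside the block
                mid = (lo + hi) // 2
                queries += 1
                lo, hi = (lo, mid) if fails_in(lo, mid) else (mid + 1, hi)
            return lo, queries
        start, size = end + 1, size * 2   # fault-free: double the block

first_fail = 11                           # illustrative single failure
oracle = lambda lo, hi: lo <= first_fail <= hi
print(doubling_first_failure(100, oracle))
```

With the first failure at pattern 11, the faulty block [8, 15] is found after 4 probes and 3 more queries locate the pattern, i.e. 7 queries in total.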

3. Proposed algorithm of bisectioning detected faults

In Fig. 1, the BIST-based architecture which we use to conduct the embedded fault diagnosis is presented. The environment consists of the TPG, the ORA, a memory for storing diagnostic data, and a control unit. The proposed method is based on sequential bisection of the pseudorandom test sequence, controlled by the diagnostic data in the BIST memory. In other words, we use the idea of bisectioning faults instead of bisectioning patterns, as in the former methods. Selected patterns in the test sequence serve as diagnostic points (DPs). The number of DPs is determined by a trade-off between the cost of memory and the diagnostic resolution. In this paper, the patterns selected as DPs are those which detect new faults not yet tested by previous patterns. The diagnostic data (DPs) in the memory consist of:
– the numbers j of the selected test patterns tj;
– the signatures s(tj), corresponding to the content of the ORA if tj were the final pattern of the test session; and
– the sets of faults F(tj) detected by the patterns tj.

For each pattern tj we calculate the set of faults F(Tj) detected by the test sequence Tj with final pattern tj as

F(Tj) = ∪_{tj ∈ Tj} F(tj)

and the fault coverage reached by the test sequence Tj with final pattern tj as


FCj = |F(Tj)| / |F|

where F is the set of all possible faults detected by the test sequence.

Algorithm 2
Initial states: initial fault table {F(tj)} and total fault table {F(Tj)};
SUSPECTED faults = ALL faults in F;
START = 1 (the first pattern); STEP = 50%;
END = 50% (a pattern that discloses, together with the preceding ones, 50% of the faults);
While (the end of the sequence of test patterns is not reached)
  // (1) find the FAILED pattern
  While (FAILED pattern not found AND the END of the sequence of test patterns not reached)
    %STEP = %STEP / 2;
    Load correct seeds for test pattern START into the TPG and the ORA;
    Perform BIST for all patterns within [START, END];
    If (the SIGNATURE is correct)
      Set START = END + 1; Set END = %STEP + %END;
    Else
      Set START = START; Set END = %STEP + %START;
  // (2) diagnose
  If (FAILED pattern found)
    SUSPECTED faults = SUSPECTED faults ∩ faults detected by the FAILED pattern;
  Else  // no more FAILED patterns
    SUSPECTED faults = SUSPECTED faults − faults detected by the PASSED patterns;
    BREAK;  // the END of the sequence of test patterns is reached
  // (3) generate the new {F(tj)} and the new {F(Tj)}, where j is the index
  new {F(tj)}[j] = initial {F(tj)}[j + FAILED pattern index + 1] ∩ SUSPECTED faults;
  new {F(Tj)}[j] = new {F(tj)}[j] ∪ new {F(Tj)}[j−1];
  START = 0; END = 50%;

In Algorithm 2, STEP ∈ 0…100% defines the interval of tested patterns used for the diagnostic test session (query), where % means the percentage of fault coverage; START and END represent the pattern's index in the test sequence; %START and %END represent the percentage of fault coverage (the percentage of faults tested by these patterns and the preceding ones). By Algorithm 2, a diagnostic tree (DT) can be created where the nodes represent test sessions (Figs. 2 and 3). Each path in the DT represents a diagnostic procedure as a sequence of test sessions. For fault diagnosis, in fact, the explicit full diagnostic tree is not needed.
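The pattern-selection step at the heart of Algorithm 2, choosing END so that roughly half of the still-suspected faults are covered, can be sketched as below. The data structures and names are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch of the key idea in Algorithm 2: instead of bisecting
# the pattern interval by count, pick the diagnostic point (DP) whose
# cumulative coverage of the still-suspected faults is closest to 50%.

def pick_end(dp_faults, suspected, start):
    """dp_faults: list of fault sets F(t_j), one per diagnostic point.
    Return the DP index that splits the suspected faults most evenly."""
    covered = set()
    best, best_gap = start, float("inf")
    for j in range(start, len(dp_faults)):
        covered |= dp_faults[j] & suspected
        gap = abs(len(covered) - len(suspected) / 2)
        if gap < best_gap:
            best, best_gap = j, gap
    return best

dps = [{1, 2}, {3}, {4, 5, 6}, {7}]   # F(t_j) for four diagnostic points
print(pick_end(dps, {1, 2, 3, 4, 5, 6, 7}, 0))
```

Here DP index 1 is chosen, since patterns 0–1 cover 3 of the 7 suspected faults, the split closest to half; as faults are ruled out between sessions, the same routine re-balances the bisection over the remaining suspects.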
For fault location, only a single path of such a tree needs to be created and traversed according to Algorithm 2.

Fig. 3. Secondary diagnostic trees for c17.

The last node of the path corresponds to a failing pattern tj, which allows determining the suspected faults as the result of fault diagnosis:

D = F(tj) − F(Tj − tj).

If the diagnostic resolution |D| is acceptable, we can finish the procedure; otherwise we continue the fault diagnosis, taking the knowledge of D into account in further test sessions with other, subsequent parts of the pseudorandom sequence. In this procedure we use the same bisection algorithm, based on the fault coverages of the patterns involved in the test sessions, updated by D. The search for new failing patterns to improve the current diagnosis D (to reduce the number of suspected faults in D) proceeds until either all failing patterns are found or an acceptable diagnostic resolution is achieved. The main difference of the proposed method compared to [27] is in searching and processing only a part of all failing patterns while still reaching an acceptable resolution.

4. Example of bisectioning detected faults

In the following example, we describe the problem of diagnosis as a set of possible diagnostic procedures in the form of a diagnostic tree. If the full diagnostic tree is given, we can calculate the average length of the diagnostic procedure (the number of test sessions, or queries). We also demonstrate on this small example how the average length of the diagnostic procedure can be reduced by the proposed method of bisectioning faults, compared to the former method of bisectioning test patterns.

Consider as an example the circuit c17 from the ISCAS benchmark family. The numbers of detected faults with the achieved fault coverages, and the vectors of detected faults for all 10 test patterns of the sequence, are shown in Tables 1 and 2, respectively. In this example we consider all the test patterns as DPs. The columns in Table 2 correspond to faults and the rows to test patterns, whereas the following notation is used: x – no faults, and 0 (1) – stuck-at-0 (stuck-at-1) fault detected by the test pattern.
In Fig. 2, the diagnostic trees comparing the classical binary search with bisection of patterns (Fig. 2a) and the proposed algorithm with bisection of detected faults (Fig. 2b) are presented. From each node in the trees we proceed to the left if a fault is detected by the corresponding test session, and to the right in the opposite case. The trees allow calculating the length of the diagnostic procedure. For example, the path through nodes 2, 6, 5,

Fig. 2. Diagnostic trees for c17.


Table 1
Simulation data for ISCAS circuit c17

#    # Faults   # New faults   Fault coverage (%)
1    5          5              16.7
2    15         10             50.0
3    16         1              53.3
4    17         1              56.7
5    20         3              66.7
6    21         1              70.0
7    25         4              83.3
8    26         1              86.7
9    29         3              96.7
10   30         1              100.0

Table 2
Fault table for ISCAS circuit c17

4, 3 in Fig. 2b corresponds to five test sessions with a total length of 2 + 4 + 3 + 2 + 1 = 12 clock cycles. The average lengths of the diagnostic procedures are 14.33 and 12.03 for the classical search by bisectioning of patterns and for the proposed method of bisectioning of detected faults, respectively. The numbers at the outputs of the leaves of the trees correspond to the diagnostic resolution achieved by that particular procedure.

Consider the case of the bad resolution 10 in Fig. 2. The second row in Table 2 corresponds to this diagnostic result, where 10 faults remain suspected. To improve the resolution, we continue the diagnosis according to the trees depicted in Fig. 3. In the case of bisection of patterns (Fig. 3a), we define the next DP by bisection of the remaining set of DPs (from 3 to 10), and start with test session 5. For the proposed method (Fig. 3b), the rows from 3 to 10 in Table 2 are updated as shown in Fig. 4, where a is the current row to be updated and b is the vector of suspected faults (row 2 in Table 2). After updating Table 2, only two DPs remain (test patterns 3 and 5) where the suspected faults can be detected. In Fig. 4, the following notation is used: x – no faults, 0 (1) – stuck-at-0 (1) fault, and & – both faults detected by the test pattern.

Fig. 4. Updating the diagnostic data. (The rows are combined component-wise as a ∧ b, using the coding: x = 00, 0 = 01, 1 = 10, & = 11.)

Thanks to taking the achieved diagnostic information into account for the bisection of the test sequence, the average length of the secondary diagnostic procedure in Fig. 3 is reduced from 12.4 to 4.0 for the proposed method, compared to the classical method.

5. Using multiple signature analyzers to improve the resolution

The diagnostic resolution achieved by the procedure described in Section 3 can be improved by introducing multiple signature analyzers into the BIST architecture. Assume the circuit under test with a set of faults F has n outputs, where each output i may be influenced by a subset of faults Fi ⊆ F. Introduce m, 1 < m ≤ n, signature analyzers (SAs), which should be connected to the outputs of the CUT. An example of such a BIST for fault diagnosis with three SAs is shown in Fig. 5. Denote by Ij the set of outputs of the CUT connected to the signature analyzer SAj. Depending on the faults detected at the outputs Ij, each SAj may be influenced by the following subset of faults:

Sj = ∪_{i ∈ Ij} Fi.

In other words, if there is a fault f ∈ F in the CUT, this fault can be detected by all signature analysers SAj for which f ∈ Sj. As an example, the fault in the CUT highlighted in Fig. 5 can be detected via two output lines, by SA1 and SA2. Introduce for a set of m signature analyzers a codeword Cj as a sequence of bits Cj = (c1, c2, …, cm), so that the index j represents the decimal value of the binary codeword Cj. Let Cj represent the result of testing, so that ci = 1 when the signature analyser SAi has detected a fault, and ci = 0 when no fault has been detected by SAi. The case when no faults have been detected by the set of signature analyzers {SA1, SA2, …, SAm} corresponds to the codeword C0. For any other codeword Cj, j ≠ 0, we can state a diagnosis as a subset of suspected faults:

Dj = F¹ − F⁰ = ∩_{i: ci=1} Si − ∪_{i: ci=0} Si ⊆ F

where F¹ is the intersection of the subsets Si of faults tested by the SAs with erroneous signatures (ci = 1), and F⁰ is the union of the subsets Si of faults tested by the SAs with correct signatures (ci = 0). It is evident that Di ∩ Dj = ∅ for all i ≠ j, 1 ≤ i, j ≤ 2^m − 1, and

∪_{j=1,2,…,k} Dj = F.

In Fig. 3, seven intersections of fault sets are shown to illustrate the fault diagnosis by three signature analyzers. For example, if a fault is detected by analyzers SA1 and SA2, the codeword C3 = (011) will be produced, which corresponds to the subset of suspected faults

Fig. 5. BIST with multiple signature analyzers. (Block diagram: the test pattern generator drives the CUT; the CUT outputs are partitioned among the signature analyzers SA1, SA2 and SA3; a fault site inside the CUT is highlighted.)


D3 = S1 ∩ S2 − S3

as the result of diagnosis. It is evident that the best diagnostic resolution is achieved when the suspected subset of faults for any result of diagnosis, i.e. for any codeword Cj, is minimal. From this statement, the following task can be formulated: find the best interface between the outputs of the CUT and the set of signature analysers by connecting the SAs to the CUT in such a way that

⌊|F| / (2^m − 1)⌋ ≤ |Dj| < ⌊|F| / (2^m − 1)⌋ + 1  for all j = 1, 2, …, 2^m − 1.  (1)

Here ⌊x⌋ denotes the largest integer that is less than or equal to x. In the ideal case, provided that |F| / (2^m − 1) is an integer, we should reach the situation where

|D1| = |D2| = … = |D_{2^m−1}| = |F| / (2^m − 1).
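The codeword diagnosis rule, intersection over the failing SAs minus union over the passing SAs, can be sketched as below. The fault sets S1–S3 are illustrative assumptions.

```python
# Sketch of the codeword diagnosis rule: the suspected set D is the
# intersection of the fault sets S_i of failing SAs minus the union of the
# fault sets of passing SAs. The fault sets below are illustrative.
from functools import reduce

def diagnose(S, code):
    """S: per-SA fault sets; code: pass/fail bits (1 = corrupt signature)."""
    failed = [s for s, c in zip(S, code) if c == 1]
    passed = [s for s, c in zip(S, code) if c == 0]
    f1 = reduce(set.intersection, failed)   # F^1: faults all failing SAs see
    f0 = set().union(*passed)               # F^0: faults a passing SA would see
    return f1 - f0

S1, S2, S3 = {1, 2, 3, 4}, {3, 4, 5}, {4, 6}
# SA1 and SA2 failed, SA3 passed: D = S1 & S2 - S3
print(diagnose([S1, S2, S3], [1, 1, 0]))  # {3}
```

Fault 4 is excluded because the passing SA3 would have caught it, which is exactly how multiple analyzers sharpen the resolution beyond a single signature.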

6. Design of the interface between CUT and SAs
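Before the formal algorithms, the overall idea of this section can be sketched in code. The sketch is an illustrative simplification, an assumption that merges the two-part procedure presented below into one greedy loop: each CUT output, taken in decreasing order of its fault set size, is assigned to the SA that minimizes the summed deviation of all codeword sets Dk from the ideal size |F|/(2^m − 1).

```python
# Illustrative greedy sketch of the interface design: assign each CUT output
# to the signature analyzer (SA) whose resulting per-codeword fault sets D_k
# deviate least from the ideal size |F|/(2^m - 1). This merges the two-part
# procedure into one loop and is an assumption, not the paper's exact method.
from itertools import product

def deviation(S, n_faults):
    """Sum of |ideal - |D_k|| over all non-zero codewords C_k."""
    m = len(S)
    ideal = n_faults / (2 ** m - 1)
    total = 0.0
    for code in product([0, 1], repeat=m):
        if not any(code):
            continue  # C_0 (all signatures correct) carries no diagnosis
        inter = set.intersection(*(S[i] for i in range(m) if code[i]))
        union = set().union(*(S[i] for i in range(m) if not code[i]))
        total += abs(ideal - len(inter - union))  # delta_k
    return total

def greedy_assign(out_faults, m, n_faults):
    """out_faults[i] is the fault set F_i observable at CUT output i."""
    S = [set() for _ in range(m)]
    # process outputs in decreasing order of |F_i|, largest first
    for f in sorted(out_faults, key=len, reverse=True):
        best = min(range(m), key=lambda i: deviation(
            [s | f if k == i else s for k, s in enumerate(S)], n_faults))
        S[best] |= f
    return S

outs = [{1, 2}, {2, 3}, {4}, {5, 6}, {6, 1}]   # invented fault sets
print(greedy_assign(outs, 2, 6))
```

With two SAs and six faults, the ideal set size is 6/3 = 2, and the greedy loop balances the three non-zero codeword sets toward that size.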

We have developed algorithms for designing the best interface between the CUT and the set of signature analyzers, such that condition (1) is satisfied as closely as possible. The interface is constructed by a procedure where the outputs of the CUT are assigned to SAs step by step, in such a way that at each step condition (1) is satisfied as closely as possible. To give the words "as closely as possible" a quantifiable meaning, introduce the following notions:
– the ideal size of the set Dj, measured as D_IDEAL = |F| / (2^m − 1);
– the distance of Dj from the ideal size, measured as Δj = |D_IDEAL − |Dj||.

In the ideal case the fault resolution, i.e. the number of suspected faults, is equal and minimal for all faults. In practice, we should strive for the situation where the average number of suspected faults over all faults is minimal, i.e. the average of Δj should be minimized. The whole procedure of designing the interface consists of two parts. In the first part (Algorithm 3), an initial output of the CUT is assigned to each SA. In the second part (Algorithm 4), at each step an arbitrary output of the CUT is selected, and the SA which gives the best solution when connected to the selected output is found. We consider only the practical situations where the number of outputs n of the CUT is much greater than the number m of SAs, m ≪ n. In the extreme case of m = n, each output of the CUT is separately observable, and no SAs are needed.

Algorithm 3
1. Order the set of outputs OUT of the CUT so that |F_{i−1}| ≥ |F_i|, i = 2, 3, …, n (decreasing fault set sizes). Take all Sj = ∅. Take j = 1.
2. Take the first SAi, i = 1.
3. Assign j to SAi: Si = Si ∪ Fj. Remove j from OUT.
4. Modify i = i + 1. Take the next SAi.
5. Calculate for all j ∈ OUT:
   F(j) = Fj ∪ (∪_{k=1…i} Sk);  D_{j,IDEAL} = |F(j)| / (2^i − 1);
   for all Ck, k = 1, 2, …, 2^i − 1:
     Dk = ∩_{h: ch=1} Sh − ∪_{h: ch=0} Sh;  δk = |D_{j,IDEAL} − |Dk||;
   Δj = Σ_{k=1,2,…,2^i−1} δk.
6. Find j*, so that Δj* = min Δj over all j ∈ OUT.
7. Assign j* to SAi: Si = Si ∪ Fj*. Remove j* from OUT.
8. If i < m, go to 4; otherwise END.

To engage as many faults as possible in the intersection procedure right at the beginning, it is reasonable to start Algorithm 3 by assigning to the first SA the output of the CUT with the largest set of detected faults (Steps 1–3). In Step 4 we have a current solution where i outputs have been assigned to i different SAs, such that the average distance Δj from the ideal diagnostic resolution is minimal. In Steps 5 and 6 we choose the next output to be assigned to the next SA, such that the average distance Δj from the ideal diagnostic resolution will again be minimal. The algorithm finishes when an output of the CUT has been assigned (connected) to every SA. The goal of Algorithm 4 is to assign the remaining outputs to the SAs in such a way that the average distance Δj from the ideal resolution is minimal, to reach the best resolution over all faults in the CUT.

Algorithm 4
1. Take the next j from OUT.
2. Calculate F(j) = Fj ∪ (∪_{k=1…m} Sk);  D_{j,IDEAL} = |F(j)| / (2^m − 1).
3. Calculate for all Si:
   Si = Si ∪ Fj;
   for all Ck, k = 1, 2, …, 2^m − 1:
     Dk = ∩_{h: ch=1} Sh − ∪_{h: ch=0} Sh;  δk = |D_{j,IDEAL} − |Dk||;
   Δi = Σ_{k=1,2,…,2^m−1} δk;
   restore Si: Si = (Si − Fj) ∪ (Si ∩ Fj).
4. Find i*, so that Δi* = min Δi, i = 1, 2, …, m.
5. Assign j to SAi*: Si* = Si* ∪ Fj. Remove j from OUT.
6. If OUT ≠ ∅, go to 1; otherwise END.

Differently from Algorithm 3, where a selection was made from the set of outputs of the CUT for a given SA, in Algorithm 4 a selection is made from the set of SAs for a given output of the CUT. In Step 2, the set of all faults F(j) ⊆ F is calculated which are detected by the outputs currently already connected to SAs and by the output selected in Step 1 for connection. In Step 3, the average distance Δi from the ideal diagnostic resolution over all the SAs, supposing that the selected output j is connected to SAi, is calculated. In Steps 4 and 5 the best connection between the output j and the SAs is decided. Algorithm 4 finishes when all the outputs of the CUT are connected to SAs. Algorithms 3 and 4 target an optimized interface, maximizing the diagnostic resolution achieved at each step of the diagnostic procedure. The algorithms are based on a greedy technique.

7. Experimental results

Experiments were carried out on the ISCAS'85 benchmark circuits, using the Turbo Tester toolset [33] for generating pseudorandom test patterns and for fault simulation. The general data of the circuits and test sequences are presented in Table 3. "All" means the full length of the pseudorandom

test sequence, "Eff" means the number of "efficient" patterns which detect new faults not yet tested by previous patterns (only these patterns are used as diagnostic points DP involved in the diagnostic analysis), and "# Faults" means the number of possible stuck-at faults in the circuit to be diagnosed. The average (Av) and worst (W) diagnostic resolutions (the numbers of suspected faults in D) are calculated over all possible diagnosis results (over all possible faults in the circuits) that can be achieved for the given circuits by the given pseudorandom test sequences. The "Test length" (average and maximum) is the number of queries needed for fault diagnosis. The best diagnostic resolution for all circuits was 1.

Table 3. Experimental data for the proposed method

Circuit | # Patterns (All) | # Patterns (Eff) | # Faults | Resolution (Av) | Resolution (W) | Test length (Av) | Test length (Max)
C432    | 223     | 65  | 573  | 4.1 | 10 | 11.3 | 55
C499    | 1373    | 100 | 1194 | 2.3 | 5  | 44.0 | 92
C880    | 2692    | 108 | 994  | 1.8 | 6  | 27.1 | 71
C1355   | 1438    | 113 | 1610 | 3.1 | 6  | 43.9 | 100
C1908   | 4420    | 175 | 1723 | 2.9 | 16 | 62.8 | 168
C2670   | 22,862  | 116 | 2328 | 4.1 | 45 | 37.2 | 120
C3540   | 9631    | 249 | 3149 | 2.2 | 28 | 53.9 | 203
C5315   | 1793    | 214 | 5364 | 2.2 | 13 | 45.4 | 170
C6288   | 42      | 28  | 7693 | 3.3 | 8  | 17.4 | 26
C7552   | 24,337  | 309 | 7684 | 2.6 | 14 | 105  | 260

[Fig. 6. Diagnosis by the set of three signature analyzers: SA1, SA2 and SA3 partition the fault set into the regions D1–D7.]

Table 4. Comparison of different methods

Method | Test sessions (Average) | Test sessions (Max) | Resolution (Average) | Resolution (Worst)
(1) Mult_SA_short           | 12  | 18   | 2.7 | 22
(2) Mult_SA                 | 53  | 182  | 2.4 | 22
(3) Bisect_faults [32]      | 63  | 183  | 2.4 | 22
(4) Doubling [30]           | 90  | 255  | 2.4 | 22
(5) Jumping [31]            | 90  | 443  | 2.4 | 22
(6) Bisect_patterns [23–26] | 143 | 1001 | 2.4 | 22

We compared the following algorithms: the classical Binary Search (6) [23–26], Doubling (4) [30], Jumping (5) [31], our previous method (3) of bisectioning the detected faults [32], the proposed method (2) based on multiple SAs (Mult_SA), where five analyzers were used, and the proposed method (1), where the procedure stopped after detecting 5 failed patterns (Mult_SA_short). For the methods (2)–(6) all failed patterns were used for diagnosis. The results (averages for the ISCAS circuits c2670, c5315 and c7552 over all faults) are presented in Table 4. We see a considerable improvement in the speed of diagnosis for the proposed method of bisectioning faults (3) compared to the previous methods (4), (5) and (6). The methods (2)–(6) perform the diagnosis on the basis of all failing patterns; this is why the diagnostic resolution is equal for all of these methods. Using multiple signature analyzers (2) did not reduce the test length much (53, compared to 63 for the proposed method (3) of bisectioning faults), because all the failed test patterns were still targeted. The multiple SAs also had no impact on the diagnostic resolution, because finding all the failing patterns already provided all the information needed to achieve the best fault resolution for the given test sequence. The real, dramatic impact of the multiple SAs emerged when we allowed the procedure to stop at a reduced number of failed test patterns (Mult_SA_short). In the experiment we stopped the procedure after finding 5 failed patterns. The test length 12 of

[Fig. 7. Dependency of the resolution on the test length: average resolution versus the number of failed patterns (1–8) for c2670, c5315 and c7552.]

Table 5. Influence of the test length on the resolution

# Failed patterns | c2670 (Res) | c2670 (Length) | c5315 (Res) | c5315 (Length) | c7552 (Res) | c7552 (Length)
1   | 27.5 | 5.3  | 28.8 | 5.8  | 45.1 | 6.1
2   | 7.1  | 7.7  | 4.2  | 8.9  | 7.4  | 9.1
3   | 4.4  | 9.1  | 2.4  | 10.2 | 3.4  | 10.6
4   | 3.9  | 9.9  | 2.1  | 10.9 | 2.7  | 11.5
5   | 3.7  | 10.6 | 2.0  | 11.6 | 2.4  | 12.4
6   | 3.7  | 11.2 | 2.0  | 12.2 | 2.3  | 13.1
7   | 3.5  | 11.7 | 2.0  | 12.8 | 2.2  | 13.8
8   | 3.5  | 12.3 | 2.0  | 13.4 | 2.2  | 14.6
All | 3.3  | 29.5 | 1.9  | 38.9 | 2.0  | 88.0

the diagnostic procedure now decreased on average 5 times compared to the method (3) with test length 63, while the diagnostic resolution achieved was almost the same as for the other methods. The strategy Mult_SA_short is efficient because it exploits the improved resolution effect of multiple SAs already at the beginning of the procedure, which allows stopping it after detecting only a small fraction of the failing patterns (Fig. 4). As a result, a dramatic decrease of the test length was achieved, as shown in Table 4 (see Fig. 6). How quickly the resolution is improved already by the second failing test pattern, in the case of using five SAs, can be seen in Fig. 7 for the analyzed circuits. Table 5 shows how the diagnostic resolution (Res) in the case of five SAs can be improved by increasing the number of failed patterns to be found, i.e. by increasing the test length (Length). The data in Table 5 and in Fig. 7 are averages over all the faults in the circuits compared. In Table 6 we see the impact of the number of SAs on the diagnostic resolution. Three ISCAS circuits are compared, and the average and worst resolutions are calculated over all possible faulty cases. The best resolution for all circuits was 1. Since the

influence of the multiple SAs is especially high in the beginning of the procedure, we stopped the procedure at the first failing pattern to determine more exactly the sensitivity of the resolution to the number of SAs. The short test sequence (diagnosis on the basis of a single failed pattern) is the reason why the resolution values (numbers of suspected faults) in Table 6 are rather high.

Table 6. Influence of the number of SAs on the resolution

# SA | c2670 (Av) | c2670 (Worst) | c5315 (Av) | c5315 (Worst) | c7552 (Av) | c7552 (Worst)
1  | 151.5 | 379 | 232.8 | 676 | 262.4 | 806
2  | 73.6  | 190 | 92.0  | 342 | 106.8 | 364
3  | 43.9  | 146 | 55.7  | 228 | 95.2  | 433
4  | 34.9  | 113 | 41.0  | 165 | 65.6  | 333
5  | 27.3  | 87  | 40.7  | 220 | 47.9  | 217
6  | 25.4  | 120 | 27.6  | 117 | 45.7  | 214
7  | 24.8  | 130 | 23.7  | 141 | 41.4  | 195
8  | 21.5  | 108 | 23.8  | 156 | 34.6  | 158
9  | 21.7  | 112 | 21.3  | 97  | 34.4  | 169
10 | 21.4  | 115 | 19.9  | 83  | 33.7  | 154

The main message of Table 5 and Fig. 7 is to show how the proposed method allows a trade-off between the time cost (test length) and the accuracy (resolution) of the fault diagnosis. For example, in the case of the circuit c2670 and a single SA, the diagnostic resolution is 4.1 at the test length 37.2 (Table 3). In this case all the failing patterns are found, which means that the average diagnostic resolution 4.1 is the best possible; however, the test length is rather high. When using 5 SAs (Table 5), an even better resolution of 3.7 is achieved at the cost of a test length of only 9.1, which is four times better than using a single SA. The resolution can be further improved up to 3.3, however at the cost of increasing the test length up to 29.5.

8. Conclusions

A new method is proposed for embedded fault diagnosis in digital systems with BIST environments. Compared to the classical bisectioning of test pattern sets, the following novelties were introduced in this paper:

– instead of test patterns, as in the classical Binary Search, the detected fault sets are the objects of bisectioning, which reduces the average length of the diagnostic procedure;
– the sequential character of the new method allows pruning the search space and excludes the need to find all the failed test patterns, as in the case of the classical Binary Search, which additionally increases the speed of diagnosis;
– a method was developed for optimized partitioning of a single signature analyzer into a set of subanalyzers, to improve the fault resolution when using the information from only small sets of failed patterns;
– a method and algorithms are proposed to design an optimal interface between the circuit under test and the block of signature analyzers to achieve the best diagnostic resolution.

The proposed method was compared with three known fault diagnosis methods: classical Binary Search, Doubling and Jumping. Experimental results demonstrate that the new method considerably outperforms these methods.

Acknowledgements

This research work has been supported by Estonian Science Foundation grants 7068 and 7483, by the Enterprise Estonia funded ELIKO Technology Development Center, and by the EC Framework 6 research project VERTIGO FP6-2005-IST-5-033709.
References

[1] I. Bayraktaroglu, A. Orailoglu, The construction of optimal deterministic partitionings in scan-based BIST fault diagnosis: mathematical foundations and cost-effective implementations, IEEE Trans. Comput. 54 (1) (2005) 61–75.
[2] M.L. Bushnell, V.D. Agrawal, Essentials of Electronic Testing, Kluwer Academic Publishers, Norwell, MA, 2000.
[3] I. Bayraktaroglu, A. Orailoglu, Gate level fault diagnosis in scan-based BIST, DATE (2002) 376–381.
[4] A.B. Khang, S. Reda, Combinatorial group testing methods for the BIST diagnosis problem, in: Proceedings of the Asia and South Pacific Design Automation Conference (ASP-DAC), 2004, pp. 113–116.
[5] C. Liu, K. Chakrabarty, M. Goessel, An interval-based diagnosis scheme for identifying failing vectors in a scan-BIST environment, DATE (2002).
[6] J. Ghosh-Dastidar, N.A. Touba, A rapid and scalable diagnosis scheme for BIST environments with a large number of scan chains, VTS (2000).
[7] J. Richman, K.R. Bowden, The modern fault dictionary, in: Proceedings of IEEE International Test Conference, 1985, pp. 696–702.
[8] M. Abramovici, M.A. Breuer, Fault diagnosis in synchronous sequential circuits based on an effect-cause analysis, IEEE Trans. Comput. 31 (12) (1982) 1165–1172.
[9] J.M. Solana, J.A. Michell, S. Bracho, Elimination algorithm: a method for fault diagnosis in combinational circuits based on an effect-cause analysis, IEE Proc. E (Comput. Digit. Techniq.) 133 (1) (1986) 31–44.
[10] H. Cox, J. Rajski, A method of fault analysis for test generation and fault diagnosis, IEEE Trans. Comput. Des. Integr. Circ. Syst. 7 (7) (1988) 813–833.
[11] S.J. Sangwine, Fault diagnosis in combinational digital circuits using a backtrack algorithm to generate fault location hypotheses, IEE Proc. G (Electronic Circuits and Systems) 135 (6) (1988) 247–252.
[12] S. Venkataraman, I. Hartanto, W.K. Fuchs, Dynamic diagnosis of sequential circuits based on stuck-at faults, Proc. IEEE VLSI Test Symp. (1996) 198–203.
[13] V. Boppana, W.K. Fuchs, Fault dictionary compaction by output sequence removal, in: Proceedings of IEEE International Conference on Computer-Aided Design, November 1994, pp. 576–579.
[14] B. Chess, T. Larrabee, Creating small fault dictionaries, IEEE Trans. Comput. Des. Integr. Circ. Syst. 18 (3) (1999) 346–356.
[15] R.C. Aitken, V.K. Agarwal, A diagnosis method using pseudorandom vectors without intermediate signatures, in: Proceedings of International Conference on Computer-Aided Design (ICCAD 89), IEEE CS Press, Los Alamitos, CA, 1989, pp. 574–580.
[16] W.H. McAnney, J. Savir, There is information in faulty signatures, in: Proceedings of International Test Conference (ITC 87), IEEE CS Press, Los Alamitos, CA, 1987, pp. 630–636.
[17] C.E. Stroud, T.R. Damarla, Improving the efficiency of error identification via signature analysis, in: Proceedings of 13th IEEE VLSI Test Symposium (VTS 95), IEEE CS Press, Los Alamitos, CA, 1995, pp. 244–249.
[18] J. Ghosh-Dastidar, D. Das, N.A. Touba, Fault diagnosis in scan-based BIST using both time and space information, in: Proceedings of International Test Conference (ITC 99), IEEE CS Press, Los Alamitos, CA, 1999, pp. 95–102.
[19] J. Rajski, J. Tyszer, Diagnosis of scan cells in BIST environment, IEEE Trans. Comput. 48 (7) (1999) 724–731.
[20] J. Savir, W.H. McAnney, Identification of failing tests with cycling registers, in: Proceedings of International Test Conference (ITC 88), IEEE CS Press, Los Alamitos, CA, 1988, pp. 322–328.
[21] Y. Wu, S.M.I. Adham, Scan-based BIST fault diagnosis, IEEE Trans. Computer-Aided Design 18 (2) (1999) 203–211.
[22] I. Bayraktaroglu, A. Orailoglu, Cost-effective deterministic partitioning for rapid diagnosis in scan-based BIST, IEEE Design & Test of Computers (2002) 42–53.
[23] C. Liu, K. Chakrabarty, Failing vector identification based on overlapping intervals of test vectors in a scan-BIST environment, IEEE Trans. CAD IC Syst. 22 (4) (2003) 593–604.
[24] S. Pateras, Embedded diagnosis IP, DATE (2002) 242–244.
[25] P. Wohl et al., Effective diagnostics through interval unloads in a BIST environment, IEEE/ACM DAC (2002) 249–254.
[26] T. Clouqueur et al., Efficient signature-based fault diagnosis using variable size windows, VLSI Des. Conf. (2001) 387–392.
[27] H.-J. Wunderlich, From embedded test to embedded diagnosis, in: IEEE 10th European Test Symposium, Tallinn, 2005, pp. 22–25.
[28] F.K. Hwang, A method for detecting all defective members in a population by group testing, J. Amer. Statist. Assoc. (1972) 605–608.
[29] D.-Z. Du, F.K. Hwang, Combinatorial Group Testing and its Applications, World Scientific, 1994.
[30] A. Bar-Noy, F. Hwang, H. Kessler, S. Kutten, A new competitive algorithm for group testing, Discr. Appl. Math. (1994) 29–38.
[31] D.-Z. Du et al., Modifications of competitive group testing, SIAM J. Comput. (1994) 82–96.
[32] R. Ubar, S. Kostin, J. Raik, T. Evartson, H. Lensen, Fault diagnosis in integrated circuits with BIST, in: Proceedings of 10th IEEE EUROMICRO Conference on Digital System Design – DSD 2007, Lübeck, Germany, 2007, pp. 604–610.
[33] Turbo Tester, http://www.pld.ttu.ee/tt.

Raimund Ubar received his Ph.D. degree in 1971 at the Bauman Technical University in Moscow. He is a professor of Computer Engineering at Tallinn University of Technology. His research interests include computer science, electronics design, design verification, test generation, fault simulation, design-for-testability, and fault tolerance. He has published more than 200 papers and two books. R. Ubar has given seminars or lectures at 20–25 universities in more than 10 countries. In 1993–1996 he was the Chairman of the Estonian Science Foundation and a member of the Estonian Science Council. He is a Golden Core Member of the IEEE, and a member of ACM, SIGDA, the Gesellschaft für Informatik (German Informatics Society), the European Test Technology Technical Committee and the Estonian Academy of Sciences.

Sergei Kostin received his B.Sc. and M.Sc. degrees in Computer Engineering from Tallinn University of Technology (TUT) in 2006 and 2007, respectively. Currently he is a Ph.D. student at TUT. His research interests include diagnosis and testing of digital circuits, and the Boundary Scan technique.


Jaan Raik received his M.Sc. and Ph.D. degrees in Computer Engineering from Tallinn University of Technology in 1997 and in 2001, respectively, where he currently holds the position of a postdoc researcher. He is a member of IEEE Computer Society, a member of program committees for several top-level conferences and has co-authored more than 100 scientific publications. In 2004, he was awarded the national Young Scientist Award. His main research interests include high-level test generation and verification.