A new scheme of test data compression based on equal-run-length coding (ERLC)

INTEGRATION, the VLSI journal 45 (2012) 91–98

Wenfa Zhan (VLSI Institution, Department of Educational Technology, Anqing Normal College, Anhui Province, PR China)
Aiman El-Maleh (Department of Computer Engineering, King Fahd University of Petroleum & Minerals, Saudi Arabia)

Article history: Received 1 July 2010; received in revised form 7 February 2011; accepted 16 May 2011; available online 26 May 2011

Keywords: Test compression; VLSI test; Coding; Built-in Self-Test

Abstract

A new test data compression scheme based on run-length coding, namely equal-run-length coding (ERLC), is presented. It is based on both types of runs, of 0's and of 1's, and explores the relationship between two consecutive runs. It uses a shorter codeword to represent the whole second run of two consecutive runs of equal length. A scheme for filling the don't-care bits is proposed to maximize the number of consecutive equal-length runs. Compared with other known schemes, the proposed scheme achieves a higher compression ratio with low area overhead. The merits of the proposed scheme are experimentally verified on the larger ISCAS89 benchmark circuits.

1. Introduction

The complex functionality and size brought by the increasing integration levels of VLSI chips make testing these chips more and more difficult. Testing remains a dominant cost factor in VLSI design, and reducing the testing cost is an objective that VLSI designers pursue vigorously. The two most important sources of test cost are test data volume and test power. Research on reducing the test data volume falls into three areas: Built-in Self-Test (BIST) [1], test set compaction [2], and test data compression [3–12]. Using BIST circuitry can reduce the test data volume. The BIST circuitry generates all stimulus data, applies the data to the circuit, evaluates all response data, and determines whether or not a faulty response was produced. The premise of BIST is that no ATE is required. However, the area overhead of BIST can be significant (e.g., 3.4%) [3]. In addition, a BIST design may depend on the test pattern set, so last-minute design changes (resulting in test pattern modifications) may require redesigning the BIST hardware. Moreover, many digital circuits contain random-pattern-resistant (rpr) faults that limit the fault coverage of pseudorandom patterns [4]. Several techniques have been proposed to enhance the fault coverage achieved with BIST.


These techniques include: (1) modifying the circuit under test by test point insertion or by redesigning it [4,5], (2) weighted pseudorandom patterns [6,7], and (3) mixed-mode testing [8–10]. However, with the increasing size of the circuit under test, the hardware overhead becomes unacceptable in some cases. The goal of test compaction is to reduce (or compact) the number of test vectors into a smaller set that achieves the same fault coverage. Test compaction can be classified into two classes: post-generation compaction and compaction during test generation. In the first class, the number of test vectors is reduced after they are generated, as in reverse-order fault (ROF) simulation [13], forced pair-merging (FPM) [14], redundant vector elimination (RVE) [15], and essential fault reduction (EFR) [15]; in the second class, the number of test vectors is minimized during the automatic test pattern generation (ATPG) process, as in dynamic compaction [16] and COMPACTEST [17]. Test set compaction introduces no additional hardware overhead, but it may affect the coverage of faults that have not been modeled. Test data compression can reduce the test data volume effectively and increase the transfer speed from the ATE to the chip, making full use of the limited bandwidth and memory of the ATE. Moreover, its hardware overhead is very low and acceptable, so this technology is widely used. A popular class of compression schemes relies on the use of a linear decompressor. These techniques are based on LFSR reseeding [8,9,18,19] and combinational linear expansion networks consisting of XOR gates [20–22].


They have been implemented in commercial tools such as TestKompress from Mentor Graphics [23], SmartBIST from IBM/Cadence [24], and DBIST from Synopsys [25]. These compression schemes exploit the fact that scan test vectors typically contain a large fraction of unspecified bits even after compaction. However, the compression achieved by these schemes is limited by S_max, the maximum number of specified bits in a test vector; that is, their compression is determined by the worst case in the test set. Moreover, such techniques are often based on the use of ATPG and fault simulation, which makes them inapplicable for compressing the test sets of IP cores whose internal structure is not available. Another category of compression methods is based on coding. These methods exploit the regularity inherent in test data to achieve high compression. They involve partitioning the original data into symbols and then replacing each symbol with a codeword to form the compressed data. According to the lengths of the symbols in the original test data and of the codewords in the compressed data, coding schemes can be divided into five types [26,27]: fixed-to-fixed-length, fixed-to-variable-length, variable-to-fixed-length, variable-to-variable-length, and mixed coding based on fixed and variable lengths. A fixed-to-fixed-length scheme uses a fixed-length codeword to code a fixed-length symbol of the original test data, as in dictionary-based coding [28] and LFSR-reseeding-based coding [29]. A fixed-to-variable-length scheme uses a variable-length codeword to code a fixed-length symbol, as in Huffman-based coding [30]. A variable-to-fixed-length scheme uses a fixed-length codeword to code a variable-length symbol, as in traditional run-length-based coding [31]. A variable-to-variable-length scheme uses a variable-length codeword to code a variable-length symbol, as in VIHC coding [32], Golomb coding [33], frequency-directed run-length (FDR) coding [34], alternating run-length coding [35,36], extended frequency-directed run-length (EFDR) coding [37], nine-coded (9C) compression [38], CEBM [11], and BMUC [39]. In [27], a mixed coding based on fixed and variable run-lengths, namely VTFPVL, is presented. It uses a fixed-length head section, which simplifies coding and eases decoding, and a variable-length tail section, which increases the flexibility of coding and makes full use of every codeword. The advantage of this category of compression methods is that they can efficiently exploit correlations in the specified bits and are usable on any set of test cubes (they do not require ATPG constraints); hence, they are effective for IP cores for which no structural information is available. However, they are not as effective in exploiting don't-care bits as linear techniques, and they require more complex control logic [26]. Test power should also be considered during test. During the test process, the shift operations to load and observe test data can lead to excessive transitions in the scan chain and combinational logic. In addition, unnecessary switching activity in the circuit under test [40] and the reduced correlation between consecutive test patterns and test responses can aggravate the power consumption during the capture cycle.
The increased test power dissipation may reduce the reliability of circuits and can increase the package cost, because the heat dissipation and current density caused by high switching activity in the circuit can exceed the limits of the design specification. Therefore, many techniques to reduce test power have been proposed. Low-power scan test techniques that reduce the transitions in scan-in vectors are studied in [41–43]; however, in these techniques, the transitions in scan-out vectors cannot be controlled, and the peak power consumption during the capture cycle is not guaranteed. In [44,45], the transitions in the scan chain are prevented from propagating to circuit lines by adding externally controlled gates between the outputs of the scan cells and the inputs of the circuit under test. Although this technique can completely disable the switching activity inside the circuit under test during the scan shift operation, it may introduce an undesirable timing impact on critical paths, and the power consumption in the capture cycle is not considered. In [46–48], a scan chain is partitioned into multiple segments, and only one segment is activated at a time. In [48], the original scan chain partitioning scheme was presented, and it was shown theoretically that the reduction in test power dissipation is linearly proportional to the number of segments without increasing the test application time. In [46], the selective activation scheme is applied not only to the shift operation but also to the capture operation in order to reduce the peak power consumption by using different clock phases for the segments in the capture cycle; in this scheme, several clock cycles are required for exclusively activating segments in the capture cycle, which increases the test application time. In [47], the test response data captured in some scan segments are used for the generation of the subsequent test stimulus by exploiting don't-care bits, in order to reduce both test time and test data volume. In [49], the test set is generated and ordered in such a way that some of the scan chains can be frozen for portions of the test set. Conventional test data compression schemes generally increase test power: a large number of test pattern bits (don't-care bits) assigned randomly result in a large number of transitions in the scan chains, thereby increasing power dissipation during test [8,9,18,19]. The idea of considering the problems of test data compression and low-power test together has been investigated in a few papers. In [50], test power is reduced because the outputs of the two LFSRs are ANDed or ORed, which reduces the transition probability; however, this requires storing an extra set of seeds for the extra "masking" LFSR. In [41], an encoding algorithm that reduces both test storage and test power was presented. The test cubes are encoded using a Golomb code, which is a run-length code; all don't-care bits are mapped to 0 and the Golomb code is used to encode the runs of 0's. The Golomb code compresses the test data efficiently, and mapping the don't-cares to 0 reduces the number of transitions during scan-in and thus the test power. One drawback of a Golomb code is that it is very inefficient for runs of 1's; the FDR code has the same merits and drawbacks as the Golomb code. By considering both types of runs, the total number of runs and the number of transitions during scan-in both decrease, which can result in higher test data compression and lower test power, as in the alternating run-length code and the EFDR code [35,37]. All of these schemes compress the test data more or less effectively; however, the relationship between consecutive runs is not explored. Based on the relationship between consecutive runs, a new scheme, namely equal-run-length coding (ERLC), is proposed here. This scheme first considers both types of runs, of 0's and of 1's, so that the total number of runs and the number of transitions during scan-in both decrease, resulting in better compression and lower test power. It then further explores the relationship between two consecutive runs, building on the traditional coding principle of using a shorter codeword to represent a longer symbol: the presented scheme uses a shorter codeword to represent the whole second run of two consecutive runs of equal length, which further improves the compression.
Experimental results show that the compression achieved by this scheme is higher than that of previously proposed schemes. The main contribution of the proposed scheme is that the relationship between two consecutive runs of equal length is exploited by using a shorter codeword to represent the whole second run, which results in better compression performance than most known schemes for most benchmark circuits. A scheme is also proposed for filling the don't-care bits so as to maximize the number of consecutive equal-length runs and improve test compression.


In addition, the proposed scheme is test data independent, and the decompression circuitry is simple and has low area overhead. The rest of this paper is organized as follows. Section 2 explains the basis of the ERLC algorithm. The don't-care bit filling algorithm is introduced in Section 3. The decompression architecture of the proposed method is presented in Section 4. Section 5 reports the experimental results, and Section 6 concludes the paper.

2. Algorithm of ERLC

The ERLC technique is based on encoding both types of runs, which reduces the total number of runs and decreases the volume of test data stored. The technique is a variable-to-variable-length coding technique, as shown in Table 1. The 1st column of Table 1 is the run-length, the 2nd column is the group number, the 3rd and 4th columns are the prefix and tail of every codeword, and the last two columns are the codewords for runs of 0's and runs of 1's. The size of each group increases with the group index, which is in accordance with the distribution of run-lengths in actual test data; this characteristic is similar to FDR [34]. The ERLC code is very similar to the EFDR code [37] except that the ERLC code for run-length i is the EFDR code for run-length i+1; thus, the ERLC code alone is expected to produce compression comparable to the EFDR code. The novel characteristic of this approach is that the ERLC scheme explores the relationship between two consecutive runs. If the length of a run is the same as that of the preceding run, the whole second run can be represented by a shorter codeword: a 3-bit codeword (000 or 100) is used to represent such a repeated run of the same length as the preceding run. As in Table 1, the leading bit gives the run type (compare codewords 001 and 101, which encode runs of 0's and of 1's of length 1), so 000 marks a repeated run of 0's and 100 a repeated run of 1's. This further improves the compression ratio. From the above analysis, it can be concluded that the ERLC scheme uses the two coding modes described after Table 1.

Table 1
ERLC code.

Run-length   Group   Prefix   Tail   Codeword, runs of 0's   Codeword, runs of 1's
1            A1      0        1      001                     101
2            A2      10       00     01000                   11000
3            A2      10       01     01001                   11001
4            A2      10       10     01010                   11010
5            A2      10       11     01011                   11011
6            A3      110      000    0110000                 1110000
7            A3      110      001    0110001                 1110001
–            –       –        –      –                       –
12           A3      110      110    0110110                 1110110
13           A3      110      111    0110111                 1110111
–            –       –        –      –                       –


(1) Type I: This mode uses the ERLC code of Table 1 to encode a run (symbol) of the original test data, similar to traditional run-length coding.

(2) Type II: This mode uses a shorter codeword (000 or 100) to represent the second run of two consecutive runs of equal length. That is to say, if the lengths of two consecutive runs are the same, the latter run is represented by codeword 000 (a repeated run of 0's) or codeword 100 (a repeated run of 1's); otherwise the latter run is coded using the ERLC coding table.

An example of coding using the ERLC scheme is shown in Fig. 1. Consider the test vector slice 000000XX XXXXX110. If the don't-care bits are filled selectively, we obtain the slice 00000001 11111110. The lengths of these two runs are both equal to 7, so the latter run can be encoded by 100. The encoded result of the slice is 0110001 100, which is 6 bits shorter than the original slice. The length of the original test vector in Fig. 1 is 64 bits, while the length of the compressed test data is 48, 38, and 26 bits using FDR, EFDR, and ERLC, respectively. As illustrated by the example, the proposed scheme can obtain higher compression than the FDR and EFDR compression techniques.

Statistics on the total number of runs and the number of consecutive runs of equal length for the Mintest test sets of the ISCAS89 benchmark circuits are shown in Table 2. The 1st column is the circuit name, the 2nd column is the total number of runs, the 3rd column is the number of consecutive runs of equal length, and the last column is the percentage of consecutive runs of equal length out of the total number of runs. On average, consecutive runs of equal length account for 19.15% of the total number of runs, which is a significant fraction.
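For concreteness, the following Python sketch encodes a fully specified test vector under the run and codeword conventions described above (a run of 0's is a block of 0's terminated by a single 1, and a run of 1's is a block of 1's terminated by a single 0; 000 and 100 mark a run that repeats the previous run's length). The function names and the assumption that every run is terminated are ours for illustration only; the paper defines the code only through Table 1 and its examples.

import math

def erlc_codeword(run_len, bit):
    # Type-I codeword (Table 1) for a run of `run_len` copies of `bit`,
    # terminated by the opposite bit.  Group index: k = floor(log2(run_len + 2)).
    k = int(math.log2(run_len + 2))
    prefix = '1' * (k - 1) + '0'
    tail = format(run_len + 2, 'b')[1:]          # drop the leading 1 -> k-bit tail
    return bit + prefix + tail

def erlc_encode(vector):
    # Encode a fully specified 0/1 string whose runs are all terminated
    # (i.e. it ends with a bit different from the one before it).
    codewords, prev_len, i = [], None, 0
    while i < len(vector):
        bit, j = vector[i], i
        while j < len(vector) and vector[j] == bit:
            j += 1                               # vector[j] is the terminating opposite bit
        run_len = j - i
        if run_len == prev_len:
            codewords.append(bit + '00')         # Type II: 000 repeats a 0-run, 100 a 1-run
        else:
            codewords.append(erlc_codeword(run_len, bit))
        prev_len, i = run_len, j + 1             # skip the terminating bit
    return ' '.join(codewords)

# The slice from the example above: two equal runs of length 7.
print(erlc_encode('0000000111111110'))           # -> 0110001 100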

Table 2
Statistics of the total number of runs and the number of consecutive runs of equal length.

Circuit     Total number of runs    Consecutive runs of equal length    Percentage (%)
s5378       2220                    646                                 29.10
s9234       3599                    597                                 16.59
s13207      3916                    464                                 11.85
s15850      3910                    623                                 15.93
s38417      10,622                  2526                                23.78
s38584      11,670                  2058                                17.64
Avg.                                                                    19.15

Original test vector slice: 000000XX XXXXX110 000000000001 000000000001 11111111111X 000000000001
FDR:  110111 00 110110 110101 00 00 00 00 00 00 00 00 00 00 00 110110
EFDR: 0110110 100 0110100 0110100 1110101 0110011
ERLC: 0110111 101 0110101 000 100 000
l_original = 64 bits; l_FDR = 48 bits; l_EFDR = 38 bits; l_ERLC = 26 bits
Fig. 1. An example of ERLC encoding.

3. Algorithm of don't-care bits filling

From the analysis above, it is apparent that filling the don't-care bits appropriately plays a crucial role in increasing the compression achieved. Original test data typically have only 1–10% of the bits specified, while the rest are don't-care bits [3]. Filling those don't-care bits appropriately can increase the occurrence of consecutive equal-length runs and enhance the compression achieved: if two consecutive runs are of equal length, the latter can be represented by one of the two shorter codewords, so the whole latter run is encoded in 3 bits, which further improves the compression ratio.


To illustrate the impact of filling on the compression achieved, consider the test vector slice 000000X000000X000000XXXX1X111XXX111XXX1110 (an X represents a don't-care bit). There are 13 don't-care bits and thus 2^13 possible ways of filling them. One of the results after filling is 000000000000000000001111111111111111111110, which consists of two consecutive runs of length 20. The first run is encoded by 011100110 and the second run by 100, so the whole compressed data is 011100110100, whose length is 12 bits. If the don't-care bits are filled as in Golomb or FDR coding, that is, all of them are set to 0 to obtain 000000000000000000000000101110001110001110, the encoded data is 011101010 001 11000 000 100 000 100, whose length is 29 bits. If the don't-care bits are filled as in EFDR or the alternating run-length code, the slice after filling is 000000000000000000000000111111111111111110, and the compressed data is 011101010 111100010, whose length is 18 bits. From this example, it can be concluded that appropriately filling the don't-care bits can decrease the volume of the test data. The filling principle of the proposed scheme is to maximize the occurrence of equal-length consecutive runs. The process of filling don't-care bits is shown in Fig. 2 and mainly contains three parts: test slice selection, equal-length-run judgment, and X filling. First, a test slice that has the potential of containing equal-length runs is selected; it is delimited by two reversals of the specified bit value scanned from the left, as shown in step 1 of Fig. 2. Then a check is made whether this slice, or a part of it obtained by dropping the last bit and some of its preceding don't-care bits, can be divided into two runs of equal length, as shown in steps 2.1 and 2.2 of Fig. 2. If the whole or part of the slice can be divided into two equal-length runs, the don't-care bits in the slice are filled according to those two runs. Otherwise, the test slice is adjusted back to the span ending at the first reversal of the specified bit value, as shown in step 2.2.4 of Fig. 2. If this adjusted slice can be divided into two runs of equal length, its don't-care bits are filled according to the two equal-length runs; otherwise the don't-care bits are filled as in [37], that is, the first specified bit encountered in a run determines how all the Xs in that run are filled: if that bit is 0, all the Xs in the run are filled with 0's, otherwise they are filled with 1's.

For clarity, an example is used to explain the process of filling Xs. Without loss of generality, consider the test data X00000X000000X000000XXXX1X111XXX111XXX1110 000000X000000X000000XXXX1X111XXX111XXX111XXX 000000000X00XXXXXXX11XXX0 (written here as three segments only for readability). First, a test slice is selected. The first bit of the slice is the first position of the test data, whose value is X; denote this position p0. The first specified bit from the left is the second bit of the test data, whose value is 0; denote this position p1. The first specified bit with value 1 after p1 is the 25th bit of the test data; denote this position p2. The first specified bit with value 0 after p2 is the 42nd bit; denote this position p3. Thus, the test slice ts1 = X00000X000000X000000XXXX1X111XXX111XXX1110 is selected from p0 to p3. This slice can be divided into two runs of the same length (20), so the Xs in ts1 are filled according to the two equal-length runs; that is, ts1 is filled into 000000000000000000001111111111111111111110. This process is shown in Fig. 3(a). In the same way, the test slice ts2 = 000000X000000X000000XXXX1X111XXX111XXX111XXX0 is selected. However, ts2 cannot be divided into two runs of the same length, so the slice is adjusted. If the slice is adjusted by dropping its last bit (decrementing p3), it can be divided into two equal-length runs; ts2 then becomes 000000X000000X000000XXXX1X111XXX111XXX111XXX and is filled into 00000000000000000000011111111111111111111110. This process is shown in Fig. 3(b). In the same way, the test slice ts3 = 000000000X00XXXXXXX11XXX0 is selected. However, ts3 cannot be divided into two equal-length runs, and adjusting its length by removing the last bit together with one or more of the preceding Xs does not help either. So ts3 is adjusted (back to the span from p0 to p2) to 000000000X00XXXXXXX1, which can be divided into two equal-length runs and is filled into 00000000010000000001. This process is shown in Fig. 3(c). The remaining Xs in the test data are filled straightforwardly (1XXX0 becomes 11110), as shown in Fig. 3(d).
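The equal-length-run judgment and filling step (steps 2.1 and 3 of the procedure in Fig. 2) can be sketched in Python as follows. The helper name and the midpoint-splitting formulation are our illustration of the idea, not the authors' implementation; the sketch tries both possible types for the second run, since two consecutive runs of the same type are possible (as in ts3 above).

def try_fill_equal_runs(slice_bits):
    # Equal-length-run judgment and X filling for one candidate slice
    # (a list of '0'/'1'/'X').  A run of v's of length L is L copies of v
    # followed by one copy of the opposite bit, so a slice holding two
    # equal-length runs is filled as run(first) + run(second).
    n = len(slice_bits)
    if n < 4 or n % 2:
        return False
    first = next((b for b in slice_bits if b != 'X'), '0')
    half = n // 2

    def run(v):                                  # one complete run of length half-1
        return [v] * (half - 1) + ['1' if v == '0' else '0']

    for second in ('1' if first == '0' else '0', first):
        target = run(first) + run(second)
        if all(s in ('X', t) for s, t in zip(slice_bits, target)):
            slice_bits[:] = target               # fill the X's in place
            return True
    return False

# ts3 after adjustment (see Fig. 3(c)): two 0-runs of length 9.
ts3 = list('000000000X00XXXXXXX1')
print(try_fill_equal_runs(ts3), ''.join(ts3))    # -> True 00000000010000000001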

4. Design of the ERLC decoder

In this section, the design of the decoder for the ERLC coding scheme is illustrated. The run-length i of an ERLC codeword is the sum of its prefix a and tail b, namely i = a + b, where a and b are of length k bits each. For example, (13)_10 = (110)_2 + (111)_2. Analyzing the prefix further, we find that it always has the form 11...10, that is, k−1 ones followed by a 0. By transforming the equation i = a + b, we get

  i = a + b
    = ((a + (10)_2) + b) − (10)_2
    = (((11...10)_2 + (10)_2) + (b)_2) − (10)_2      [the prefix a is k−1 ones followed by a 0]
    = ((100...0)_2 + (b)_2) − (10)_2                 [a + (10)_2 is a 1 followed by k zeros]
    = (1b)_2 − (10)_2,

where (1b)_2 denotes the (k+1)-bit number obtained by prepending a 1 to the k-bit tail b. In other words, i + 2 = (1b)_2; for example, for i = 13 the tail is b = 111 and (1111)_2 = 15 = 13 + 2.

1. Test slice selection. Denote the starting position as p0. Search for the first specified bit, with value t, from the left of the remaining test data; this position is denoted p1. Then search, from position p1, for the first specified bit whose value is not t; this position is denoted p2. Then search, from position p2, for the first specified bit whose value is t; this position is denoted p3. The test slice from p0 to p3 is selected as the candidate.
2. Equal-length-run judgment.
   2.1 Checking. If this slice can be split into two equal-length runs, go to step 3; otherwise, go to step 2.2.
   2.2 Adjustment.
      2.2.1 If the last bit is the same as the first specified bit of the next slice, delete the last bit from the test slice and go to step 2.2.2; otherwise, go to step 2.2.4.
      2.2.2 If this slice can be split into two equal-length runs, go to step 3; else, if there are Xs at the end of the test slice, go to step 2.2.3; otherwise, go to step 3.
      2.2.3 Adjust the test slice by deleting its last bit, and go to step 2.2.2.
      2.2.4 Adjust the selected slice to the slice from position p0 to p2, and go to step 3.
3. Xs filling. If the slice can be split into two equal-length runs, the Xs in the slice are filled to form the two equal-length runs. Otherwise, the Xs in the slice are filled according to the first specified bit encountered.
4. If there are still Xs left in the test data, go to step 1; otherwise, end.
Fig. 2. Process of filling Xs.
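Step 1 of the procedure above (locating p0 through p3) can be sketched as follows; the function name, the return convention, and the 0-based indexing are our own simplification for illustration.

def select_slice(bits, p0):
    # Step 1 of Fig. 2 (sketch): starting at p0, locate
    #   p1 = first specified bit (its value is t),
    #   p2 = first specified bit after p1 whose value is not t,
    #   p3 = first specified bit after p2 whose value is t.
    # The candidate slice is bits[p0 .. p3] inclusive.  Returns (p2, p3, t),
    # with p3 = None when the data runs out before a full slice is found.
    def find_next(pred, frm):
        for i in range(frm, len(bits)):
            if bits[i] != 'X' and pred(bits[i]):
                return i
        return None

    p1 = find_next(lambda b: True, p0)
    if p1 is None:
        return None, None, None
    t = bits[p1]
    p2 = find_next(lambda b: b != t, p1 + 1)
    if p2 is None:
        return None, None, None
    p3 = find_next(lambda b: b == t, p2 + 1)
    return p2, p3, t

# On the example test data above, the first call delimits ts1: it returns
# p2 = 24 and p3 = 41 (0-based indices of the 25th and 42nd bits).
data = list('X00000X000000X000000XXXX1X111XXX111XXX1110')
print(select_slice(data, 0))                     # -> (24, 41, '0')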


Fig. 3. An example of the process of filling Xs: (a) ts1 is selected and filled; (b) ts2 is selected, adjusted, and filled; (c) ts3 is selected, adjusted, and filled; (d) the remaining Xs are filled.

That is to say, if we append one extra bit with value 1 in front of the tail of an ERLC codeword, we obtain a binary number that represents a run-length longer by 2 than the actual run; in other words, the run-length information of ERLC is hidden in the tail of the codeword. This rule is exploited during decompression and can be implemented by a special counter with two characteristics: (1) the lowest bit of the counter is preset to 1, so that when the data to be decoded are shifted in, this 1 and the tail bits are shifted together toward the high bits of the counter; and (2) a non-zero lower bound is used: a traditional counter counts down to 0, whereas this counter counts down to 2 (counter contents "10"). This function can be implemented easily with a small amount of combinational logic and does not introduce high hardware overhead. The design of the decoder is similar to that of the FDR or EFDR decoder and is FSM-based. The decoder is embedded on chip, is small, and has a simple structure; it does not introduce significant hardware overhead and is independent of both the DUT and the precomputed test data. The block diagram of the decoder is shown in Fig. 4. The decoder decodes the compressed test data TE to output the original test data TD. Its structure is simple, requiring only a small FSM, a special (k+1)-bit counter, a log2(k)-bit counter, and a (k+1)-bit register. The signal bit_in is a primary input through which the coded test data are sent to the FSM when the signal en is high, and the signal out outputs decoded data when the signal v is high. The path counter_in sends the tail section of a codeword to the (k+1)-bit counter, and the signal shift controls counter_in. The signal dec1 notifies the (k+1)-bit counter to start counting down, and the signal rst1 goes high when that counting finishes. The signal dec2 tells the log2(k)-bit counter to start counting down, while the signal inc tells it to start counting up; the signal rst2 indicates that the log2(k)-bit counter has finished counting. The signal load indicates when the content of the (k+1)-bit register should replace the content of the (k+1)-bit counter.

Fig. 4. Block diagram of the ERLC decoder (FSM, special (k+1)-bit counter, (k+1)-bit register, log2(k)-bit counter, and RS latch).
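The decoder's behavior can be illustrated in software as follows. This Python sketch walks the coded bit stream codeword by codeword and rebuilds the test data, using the relation derived above (run-length + 2 equals the tail read as a binary number with a leading 1, which is exactly what the special counter computes by counting down to 2). It is our simplified model of the FSM's behavior, not the hardware description, and it assumes a well-formed stream.

def erlc_decode(code):
    # Rebuild the original test data from a concatenated ERLC bit stream.
    out, prev_len, pos = [], None, 0
    while pos < len(code):
        t = code[pos]                            # run type: '0'-run or '1'-run
        pos += 1
        k = 0
        while code[pos + k] == '1':              # prefix: k-1 ones followed by a 0
            k += 1
        k += 1
        pos += k
        tail = code[pos:pos + k]
        pos += k
        if k == 1 and tail == '0':               # 000 / 100: repeat the previous length
            run_len = prev_len
        else:
            run_len = int('1' + tail, 2) - 2     # the counter that counts down to 2
        out.append(t * run_len + ('1' if t == '0' else '0'))
        prev_len = run_len
    return ''.join(out)

# Decoding the codewords of the Section 2 example recovers the filled slice.
print(erlc_decode('0110001' + '100'))            # -> 0000000111111110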


Similar to other coding-based compression techniques, the ERLC decompressor requires synchronization with the tester. Techniques for handling synchronization with the tester, such as those proposed in [51], can be employed. The decompressor for a multiple-scan-chain architecture can be extended from the single-scan-chain one by using a shift register whose width equals the number of scan chains, as shown in Fig. 5. In this case, however, the test-application-time advantage that multiple scan chains offer is not realized. A trade-off between test application time and hardware overhead can be explored by partitioning the scan chains into groups and driving every group by its own decompressor: all decompressors then work in parallel and the test application time is reduced, but the hardware overhead increases with the number of decompressors.

Fig. 5. Multi-scan-chain architecture (the decompressor feeds a shift register of width equal to the number of scan chains).

5. Experimental results

In this section, the effectiveness of the proposed ERLC scheme is demonstrated through experiments on the largest ISCAS89 benchmark circuits [52]. For comparison with other schemes, we adopted the Mintest test sets, the same as in [27,32–35,37,38]. The compression ratio of this scheme compared with other schemes is shown in Table 3. The 1st column of Table 3 is the circuit name, and the 2nd column is the size of the original test data. The compressed test data size and compression ratio of the proposed scheme are shown in the 3rd and 4th columns, respectively; the remaining columns show the compression ratios of the Golomb, FDR, VIHC, and EFDR coding techniques. The compression ratio is determined by

  α = ((TD − TE) / TD) × 100%,

where TD is the length of the original test data and TE is the length of the compressed data. The last row of Table 3 is the average compression ratio, computed as ((TD0 − TE0) / TD0) × 100%, where TD0 is the total volume of original test data over the six ISCAS89 benchmark circuits and TE0 is the total volume of compressed test data over the same circuits. In Table 3, the results are based on the original test data. The filling of don't-care bits in the proposed scheme is done so as to maximize the number of consecutive runs of equal length, while the filling of don't-care bits for the other techniques is done as proposed by those techniques to achieve their best compression: for the Golomb and FDR techniques, the don't-care bits are filled with 0's, while for EFDR they are filled to minimize the number of runs. As can be seen from Table 3, the proposed scheme achieves higher compression than the Golomb and FDR techniques for all the test sets, and higher compression than VIHC and EFDR coding in four and five out of the six test sets, respectively. On average, the percentage compression of the proposed scheme is 14.86%, 7.06%, 6.17%, and 0.37% higher than that of Golomb, FDR, VIHC, and EFDR, respectively.

Table 3
Compression obtained using different coding techniques on the original test data.

                             Scheme presented          Golomb               FDR        VIHC       EFDR
Circuit     Size of TD       Size (bits)   α (%)       m      α (%)         α (%)      α (%)      α (%)
            (bits)
s5378       23,754           12,389        51.32       4      40.70         48.02      51.78      53.67
s9234       39,273           22,210        49.28       4      43.34         43.59      47.25      48.66
s13207      165,200          32,044        83.05       16     74.78         81.30      83.51      82.49
s15850      76,986           25,844        69.04       4      47.11         66.22      66.23      68.66
s38417      164,736          67,990        62.44       4      44.12         43.26      43.26      62.02
s38584      199,104          76,473        64.72       4      47.71         60.91      60.92      64.28
Avg.                                       67.80              52.94         60.74      61.63      67.43

Table 4
Experimental results on peak and average scan-in power consumption.

            Mintest                  FDR                     EFDR                    The minimum             ERLC
Circuit     Peak        Average      Peak        Average     Peak        Average     Peak        Average     Peak        Average
s9234       17,494      14,630       12,994      5692        12,062      3469        12,060      3466        12,069      3500
s13207      135,607     122,031      101,127     12,416      97,613      8016        97,606      7703        97,614      8115
s15850      100,228     90,899       81,832      20,742      63,494      13,394      63,478      13,381      63,511      13,450
s35932      707,280     583,639      172,834     73,080      125,512     46,532      125,490     46,032      125,522     47,015
s38417      683,765     601,840      505,295     172,665     404,654     117,834     404,617     112,198     404,693     120,775
s38584      572,618     535,875      531,321     136,634     479,547     89,138      479,530     88,298      479,573     89,356
Avg.        369,499     324,819      234,233     70,205      197,151     45,823      197,130     45,180      197,163     46,018


Table 5
Hardware overhead comparison.

Scheme      Area overhead (mapped to 2-input NAND gates)
Golomb      307
FDR         320
Proposed    364

Next, we present experimental results on the peak and average power consumption during the scan-in operation; they show that test data compression can also lead to significant savings in power consumption. Power is estimated using the weighted transitions metric [35]. Table 4 compares the peak and average power consumption for the Mintest test sets, FDR, EFDR, and the proposed ERLC scheme. The 1st column of Table 4 is the circuit name. The 2nd and 3rd columns are the peak and average power if the original Mintest test sets are used. The 4th and 5th columns are the peak and average power if FDR coding is used, and the 6th and 7th columns if EFDR coding is used. The 8th and 9th columns are the theoretical minimum peak and average power obtained by filling the don't-care bits with the appropriate specific values [35]. The last two columns are the peak and average power if our proposed scheme is used. Table 4 shows that the peak and average power are significantly lower if the proposed ERLC scheme is used for test data compression and the decompressed patterns are applied during testing. On average, the peak power (average power) is 15.83% (37.30%) lower with our proposed scheme than with FDR coding, and 46.64% (86.45%) lower than with the original Mintest test sets. The peak and average power are comparable to EFDR and very close to the minimum peak and average power. The hardware overhead of the decoding circuitry is estimated assuming a maximal run-length of 10,000. The decoder design is synthesized with Synopsys Design Compiler and mapped to the 2-input NAND cell of the TSMC35 library. The results are shown in Table 5. The hardware overhead is low and comparable to that of other coding techniques. Thus, based on the experimental results, it can be concluded that the proposed scheme achieves better compression and low power consumption during test, with acceptably low hardware overhead.

6. Conclusion

Test data compression is an effective solution to the increasing volume of test data. A new test data compression scheme, ERLC, has been proposed. The scheme first considers both types of runs, of 0's and of 1's, and then further explores the relationship between two consecutive runs, building on the traditional coding principle of using a shorter codeword to represent a longer symbol in the original test data. The proposed scheme uses a short codeword to represent the whole second run of two consecutive runs of equal length. It is shown that the presented scheme decreases the ATE memory and channel capacity requirements by obtaining good compression ratios. In addition, the scheme reduces power dissipation during test by reducing the number of runs during scan-in. Moreover, the proposed scheme has a decoder of low complexity. Thus, it is an effective solution for test data compression and decompression in IC design.

Acknowledgment

This work is supported by the Anhui Provincial Natural Science Foundation (no. 10040606Q42) and the Natural Science Foundation of the Province Colleges of Anhui (no. KJ2011A198). Dr. El-Maleh acknowledges support from King Fahd University of Petroleum & Minerals.


References [1] X. Chen, M.S. Hsiao, Testing embedded sequential cores in parallel using spectrum-based BIST, IEEE Transactions on Computers 55 (2) (2006) 150–162. [2] S.R. Das, C.V. Ramamoorthy, M.H. Assaf, et al., Revisiting response compaction in space for full-scan circuits with nonexhaustive test sets using concept of sequence characterization, IEEE Transactions on Instrumentation and Measurement 54 (5) (2005) 1662–1677. [3] Yinhe Han, Yu Hu, Xiaowei Li, et al., Embedded test decompressor to reduce the required channels and vector memory of tester for complex processor circuit, IEEE Transactions on Very Large Scale Integration Systems 15 (5) (2007) 531–540. [4] D. Xiang, Y. Zhao, K. Chakrabarty, H. Fujiwara, A reconfigurable scan architecture with weighted scan-enable signals for deterministic bist, IEEE Transactions on Computer-Aided Design 27 (6) (2008) 999–1012. [5] N.A. Touba, E.J. McCluskey, Test point insertion based on path tracing, in: Proceedings of the VLSI Test Symposium, 1996, pp. 2–8. [6] H.J. Wunderlich, Multiple distributions for biased random test patterns, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 9 (6) (1990) 584–593. [7] E.B. Eichelberger, E. Lindbloom, F. Motica, J. Waicukauski, Weighted random pattern testing apparatus and method, US Patent 4801870, January 1989. [8] B. Koenemann, LFSR-coded test patterns for scan designs, in: Proceedings of the European Test Conference, 1991, pp. 237–242. [9] S. Hellebrand, J. Rajski, S. Tarnick, S. Venkataraman, B. Courtois, Built-in test for circuits with scan based on reseeding of multiple-polynomial linear feedback shift registers, IEEE Transactions on Computers 44 (2) (1995) 223–233. [10] N.A. Touba, E.J. McCluskey, Altering bit sequence to contain predetermined patterns, US Patent 6061818, May 2000. [11] Wenfa Zhan, Huaguo Liang, Cuiyun Jiang, Zhengfeng Huang, Aiman El-Maleh, A scheme of test data compression based on coding of even bits marking and selective output inversion, Computers and Electrical Engineering 36 (5) (2010) 969–977. [12] Maoxiang Yi, Huaguo Liang, Lei Zhang, Wenfa Zhan, A novel x-ploiting strategy for improving performance of test data compression, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 18 (2) (2010) 324–329. [13] M. Schulz, E. Trischhler, T. Sarfert, SOCRATES: a highly efficient automatic test pattern generation system, IEEE Transactions on Computer-Aided Design 7 (1) (1988) 126–137. [14] J. Chang, C. Lin, Test set compaction for combinational circuits, IEEE Transactions on Computer-Aided Design 14 (11) (1995) 1370–1378. [15] I. Hamzaoglu, J. Patel, Test set compaction algorithms for combinational circuits, in: International Conference on Computer-Aided Design, 1998, pp. 283–289. [16] P. Goel, B.C. Rosales, Test generation and dynamic compaction of tests, in: Digest of Papers 1979 Test Conference, 1979, pp. 189–192. [17] I. Pomeranz, L. Reddy, S. Reddy, COMPACTEST: a method to generate compact test sets for combinational circuits, in: Proceedings of the International Test Conference, 1991, pp.194–203. [18] C. Krishna, N.A. Touba, Reducing test data volume using LFSR reseeding with seed compression, IEEE transactions on computers, in: Proceedings of the International Test Conference, 2002, pp. 321–330. [19] A.A. Al-Yamani, E.J. McCluskey, Built-in reseeding for serial BIST, in: Proceedings of the IEEE VLSI Test Symposium, 2003, pp. 63–68. [20] I. Bayraktaroglu, A. 
Orailoglu, Test volume and application time reduction through scan chain concealment, in: Proceedings of the Design Automation Conference, 2001, pp. 151–155. [21] S. Samaranayake, E. Gizdarski, N. Sitchinava, F. Neuveux, R. Kapur, T.W. Williams, A reconfigurable shared scan-in architecture, in: Proceedings of the IEEE VLSI Test Symposium, 2003, pp. 9–14. [22] I. Bayraktaroglu, A. Orailoglu, Decompression hardware determination for test volume and time reduction through unified test pattern compaction and compression, in: Proceedings of the IEEE VLSI Test Symposium, 2003, pp. 113–118. [23] J. Rajski, J. Tyszer, M. Kassab, N. Mukherjee, Embedded deterministic test, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 23 (5) (2004) 776–792. [24] B. Koenemann, C. Banhart, B. Keller, T. Snethen, P. Farnsworth, D. Wheater, A SmartBIST variant with guaranteed encoding, in: Proceedings of the Asian Test Symposium, 2001, pp. 325–330. [25] M. Chandramouli, How to implement deterministic logic built-in self-test (BIST), Compiler: a monthly magazine for technologies worldwide, Synopsys, January 2003. [26] N.A. Touba, Survey of test vector compression techniques, IEEE Design and Test of Computers 23 (4) (2006) 294–303. [27] Wenfa Zhan, Huaguo Liang, Feng Shi, et al., Test data compression scheme based on variable-to-fixed-plus-variable-length coding, Journal of Systems Architecture 53 (11) (2007) 877–887. [28] L. Lei, K. Chakrabarty, Test data compression using dictionaries with fixed-length indices, in: Proceedings of the VLSI Test Symposium, 2003, pp. 219–224. [29] A. Al-Yamani, E. McCluskey, Seed encoding for LFSRs and cellular automata, in: Proceedings of the Design Automation Conference, 2003, pp. 560–565.

[30] A. Jas, J. Ghosh-Dastidar, N.A. Touba, An efficient test vector compression scheme using selective Huffman coding, IEEE Transactions on ComputerAided Design 23 (6) (2003) 797–806. [31] A. Jas, N.A. Touba, Test vector decompression via cyclical scan chains and its application to testing core-based designs, in: Proceedings of the International Test Conference, 1998, pp. 458–464. [32] T. Paul, B. Al-Hashimi, N. Nicolici, Variable-Length input huffman coding for system-on-a-chip test, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 22 (6) (2003) 783–796. [33] A. Chandra, K. Chakrabarty, System-on-a-chip test data compression and decompression architectures based on Golomb codes, IEEE Transitions on ComputerAided Design of Integrated Circuits and System 20 (3) (2001) 355–368. [34] A. Chandra, K. Chakrabarty, Test data compression and test resource partitioning for system-on-a-chip using frequency-directed run-length (FDR) codes, IEEE Transactions on Computers 52 (8) (2003) 1076–1088. [35] A. Chandra, K. Chakrabarty, Reduction of SOC test data volume, scan power and testing time using alternating run-length codes, in: Proceedings of the IEEE/ACM Design Automation Conference, 2002, pp. 673–678. [36] A. Wuertenberger, C.S. Tautermann, S. Hellebrand, A hybrid coding strategy for optimized test data compression, in: Proceedings of the International Test Conference, 2003, pp. 451–459. [37] Aiman El-Maleh, Test data compression for system-on-a-chip using extended frequency-directed run-length code, IET Computers and Digital Techniques 2 (3) (2008) 155–163. [38] M. Tehranipoor, M. Nourani, K. Chakrabarty, Nine-coded compression technique for testing embedded cores in socs, IEEE Transactions on VLSI Systems 13 (6) (2005) 719–731. [39] Maoxiang Yi, Huaguo Liang, Lei Zhang, Wenfa Zhan, A novel x-ploiting strategy for improving performance of test data compression, IEEE Transactions on Very Large Scale Integration Systems 18 (2) (2010) 324–329. [40] N. Basturkmen, S. Rddy, I. Pomeranz, A low power pseudo-random BIST technique, in: Proceedings of the IEEE International Conference on On-line Testing Workshop, 2002, pp. 140–144. [41] A. Chandra, K. Chakrabarty, Test data compression for system-on-a-chip using Golomb codes, in: Proceedings of the VLSI Test Symposium, 2000, pp. 113–120. [42] P. Rosinger, P. Gonciari, B. Al-Hashimi, N. Nicolici, Simultaneous reduction in volume of test data and power dissipation for systems-on-a-chip, Electronics Letters 37 (24) (2001) 1434–1436. [43] S. Wang, S. Gupta, DS-LFSR: a new BIST TPG for low heat dissipation, in: Proceedings of the International Test Conference, 1997, pp. 848–857. [44] S. Gerstendorfer, H.J. Wunderlich, Minimized power consumption for scan-based BIST, in: Proceedings of the International Test Conference, 1999, pp. 77–84. [45] R. Sankaralingam, N.A. Touba, Inserting test points to control peak power during scan testing, in: Proceedings of the IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, 2002, pp. 138–146. [46] P. Rosinger, B. Al-Hashimi, N. Nicolici, Scan architecture with mutually exclusive scan segment activation for shift and capture power reduction, IEEE Transactions on Computer-Aided Design Integration Circuits System 23 (7) (2004) 1142–1153. [47] O. Sinanoglu, A. Orailoglu, A novel scan architecture for power-efficient, rapid test, in: Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2002, pp. 299–303. [48] L. 
Whetsel, Adapting scan architecture for low power operation, in: Proceedings of the International Test Conference, 2000, pp. 863–872.

[49] S. Kajihara, K. Ishida, K. Miyase, Test vector modification for power reduction during scan testing, in: Proceedings of the VLSI Test Symposium, 2002, pp. 160–165. [50] R. Sankaralingam, R. Oruganti, N.A. Touba, Static compaction techniques to control scan vector power dissipation, in: Proceedings of the VLSI Test Symposium, 2000, pp. 35–40. [51] P. Gonciari, B. Al-Hashimi, N. Nicolici, Synchronization overhead in SoC compressed test, IEEE Transactions on VLSI Systems 13 (1) (2005) 140–153. [52] F. Brglez, Combinational profiles of sequential benchmark circuits, in: Proceedings of the International Symposium on Circuits and Systems, 1989, pp. 1929–1934.

Wenfa Zhan received his Ph.D. from the School of Computer and Information Science of Hefei University of Technology, Anhui Province, in 2009, and his B.S. in VLSI from the School of Electric and Automation of Hefei University of Technology, Anhui Province, in 2004. He is an Associate Professor in the Department of Educational Technology, Anqing Normal College, Anhui Province. His research interests include test data compression, ATPG algorithms, etc. He has published over 50 papers in refereed journals and conference proceedings and holds five Chinese patents.

Dr. Aiman El-Maleh is an Associate Professor in the Computer Engineering Department at King Fahd University of Petroleum & Minerals. He holds a Ph.D. in Electrical Engineering, with dean's honor list, from McGill University, Canada, obtained in 1995. He was a member of scientific staff with Mentor Graphics Corp., a leader in design automation, from 1995 to 1998. Dr. El-Maleh's research interests are in the areas of synthesis, testing, and verification of digital systems. In addition, he has research interests in defect-tolerant design, VLSI design, design automation, and error-correcting codes. He has great interest in e-learning and in taking advantage of instructional technology to enhance the learning experience of students. He has published over 60 papers in refereed journals and conference proceedings and holds one US patent. Dr. El-Maleh is the winner of the best paper award for the most outstanding contribution in the field of test for 1995 at the European Design & Test Conference. His paper presented at the 1995 Design Automation Conference was also nominated for the best paper award. Dr. El-Maleh was a member of the program committee of the Design Automation and Test in Europe Conference (DATE 1998, 2009, 2010). He is currently the editor of the Computer Science and Engineering subject of the Arabian Journal for Science & Engineering and serves on the editorial board of IET Computers & Digital Techniques and the editorial advisory board of Recent Patents on Electrical Engineering, Bentham Science Publishers.