Area-efficient high-coverage LBIST


MICPRO 2140

No. of Pages 7, Model 5G

16 May 2014 Microprocessors and Microsystems xxx (2014) xxx–xxx 1

Contents lists available at ScienceDirect

Microprocessors and Microsystems journal homepage: www.elsevier.com/locate/micpro


Nan Li ⇑, Elena Dubrova Royal Institute of Technology, Stockholm, Sweden

Article info

Article history: Available online xxxx

Keywords: LBIST; LFSR; Top-off test patterns; In-field testing; Test compression

Abstract

Logic Built-In Self Test (LBIST) is a popular technique for applications requiring in-field testing of digital circuits. LBIST incorporates test generation and response capture on-chip, so it requires no interaction with a large, expensive tester. LBIST reduces test time through at-speed test pattern application, makes test data reusable at many levels, and enables test-ready IP. However, traditional pseudo-random pattern-based LBIST often has low test coverage. This paper presents a new method for on-chip generation of deterministic test patterns based on registers with non-linear update. Our experimental results on 7 real designs show that the presented approach can achieve a higher stuck-at coverage than test point insertion with less area overhead. We also show that registers with non-linear update are asymptotically smaller than the memories required to store the same test patterns in a compressed form.

© 2014 Published by Elsevier B.V.


1. Introduction


Large test data volume is widely recognized as a major contributor to the testing cost of integrated circuits [1]. The test data volume in 2017 is expected to be 10 times larger than in 2012 [2]. In contrast, the size of the Automatic Test Equipment (ATE) memory is expected to only double [2].

A number of efficient on-chip test compression techniques have been proposed as a solution for reducing ATE memory requirements, including [1,3–6]. A test set for the circuit under test is compressed to a smaller set, which is stored in ATE memory. An on-chip decoder is used to generate the original test set from the compressed one during test application. Test compression has already established itself as a mainstream design-for-test methodology for manufacturing testing [6]. However, it cannot be used for in-field testing, where ATE is not available [7].

For in-field testing of digital circuits, Logic Built-In Self Test (LBIST), in which test generation and response capture are embedded on-chip, is used. In-field testing is particularly important for safety-critical applications such as medical, automotive, and military systems, which are tested repeatedly to ensure their reliability. For example, many cars contain several electronic units that control the engine, brakes, steering, airbags, etc. Every time a car is turned on, all units are self-tested. LBIST offers test cost reduction in terms of using smaller and cheaper ATE, test data volume reduction due to on-chip test


⇑ Corresponding author. E-mail addresses: [email protected] (N. Li), [email protected] (E. Dubrova).

pattern generation, and test time reduction due to at-speed test pattern application. It makes test data reusable at many levels and enables test-ready Intellectual Property (IP).

However, the main drawback of LBIST is low test coverage. The test coverage achieved with pseudo-random patterns generated by a Linear Feedback Shift Register (LFSR) can be as low as 65% [8]. Several methods for increasing LBIST test coverage have been proposed, including modification of the circuit under test by inserting test points [9–11], modification of the LFSR to generate a sequence with a different distribution of 0s and 1s [12], embedding of deterministic test patterns into the LFSR's patterns by LFSR re-seeding [13], bit-flipping [14] or bit-fixing [15], or storing them in an on-chip memory [16].

The idea of complementing pseudo-random patterns with deterministic patterns is particularly attractive because the deterministic patterns can also solve the problem of transition or delay faults, which are not handled efficiently by pseudo-random patterns. However, the area required to store deterministic test patterns within a system can be prohibitively high. For example, the memory required to store them may exceed 30% of the memory used in a conventional ATPG-based approach [17].

Non-Linear Feedback Shift Registers (NLFSRs) can also be used in pseudo-random pattern generation [18]. However, such an approach does not apply to scan designs, and is not scalable, since the order of test patterns has to be carefully selected so that they can be generated by a simple enough NLFSR.

In this paper, we propose a new method for on-chip generation of deterministic test patterns suitable for complementing pseudo-random pattern-based LBIST techniques. We generate deterministic test patterns using Registers with Non-Linear Update (RNLUs)

http://dx.doi.org/10.1016/j.micpro.2014.05.002
0141-9331/© 2014 Published by Elsevier B.V.

Please cite this article in press as: N. Li, E. Dubrova, Area-efficient high-coverage LBIST, Microprocess. Microsyst. (2014), http://dx.doi.org/10.1016/j.micpro.2014.05.002


N. Li, E. Dubrova / Microprocessors and Microsystems xxx (2014) xxx–xxx

[19]. RNLUs can be considered a more general type of Non-Linear Feedback Shift Registers (NLFSRs) [20]. Although NLFSRs can also be used to generate test patterns [18], the scalability of such methods is limited. The general structure of a k-stage RNLU with degree of parallelization p is shown in Fig. 1. Unlike NLFSRs, no chain connections are required among the stages (therefore no "shifts"). Every stage is updated by its own feedback function. RNLUs are typically smaller and faster than NLFSRs generating the same sequence. For example, consider the 4-stage NLFSR with the feedback function

$$f(x_0, x_1, x_2, x_3) = x_0 \oplus x_1 \oplus x_1 \cdot x_2 \oplus x_2 \cdot x_3,$$

where "$\oplus$" is the exclusive-OR, "$\cdot$" is the AND, and $x_i$ is the state variable representing the value of stage $i$, $i \in \{0, 1, 2, 3\}$. If this NLFSR is initialized to the state $(x_3 x_2 x_1 x_0) = (0001)$, it generates the following periodic output sequence:

$$(1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1).$$

The same sequence can be generated by a 4-stage RNLU with the feedback functions

f3 ðx0 ; x2 ; x3 Þ ¼ x0  x2  x3 f2 ðx3 Þ ¼ x3 f1 ðx1 ; x2 Þ ¼ x1  x2 114

f0 ðx1 Þ ¼ x1 :


We can see that the RNLU uses 3 binary operations, while the NLFSR uses 5 binary operations. Furthermore, the depth of the feedback functions of the RNLU is smaller than the depth of the feedback function of the NLFSR. Thus, the RNLU has a smaller propagation delay than the NLFSR.

While RNLUs can potentially be smaller and faster than NLFSRs, the search space for finding a best RNLU generating a given sequence is considerably larger than the corresponding one for NLFSRs. Algorithms for constructing RNLUs were presented in [19,21,22]. It was shown in [19] that their algorithm constructs RNLUs which are asymptotically smaller than the RNLUs constructed by the algorithms of [21,22] and than the LFSRs and NLFSRs generating the same sequence.

In this paper, we use the algorithm of [19] for constructing RNLUs for test patterns. The original version of the algorithm was developed for completely specified binary sequences. Since deterministic test patterns usually contain many don't cares (up to 99%), we extend the algorithm of [19] to handle incompletely specified binary sequences. Our experimental results on 7 real designs show that the presented approach can achieve a higher stuck-at coverage than test point insertion with less area overhead. We also derive an expression for the expected size of RNLUs constructed by the presented algorithm and show that it is asymptotically smaller than the size of a memory required to store the same test data in a compressed form. For data volumes larger than 1 Gbit, the difference in size between the two representations is one order of magnitude.

The rest of the paper is organized as follows. Section 2 gives an introduction to RNLUs. Section 3 presents related work on embedding deterministic test patterns. Section 4 describes the new


Fig. 1. The general structure of a k-stage RNLU with degree of parallelization p.

method. Section 5 analyzes the expected size of the constructed RNLUs. Section 6 shows the experimental results. Section 7 concludes the paper and discusses open problems.
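The 4-stage NLFSR example from the Introduction can be checked with a few lines of code. The sketch below is illustrative only and is not taken from the paper: it assumes a Fibonacci-style register in which stage 3 receives the feedback value x0 XOR x1 XOR (x1 AND x2) XOR (x2 AND x3), every other stage takes the value of its left neighbour, and stage 0 produces the output. The function name `nlfsr_output` is hypothetical.

```python
def nlfsr_output(state, steps):
    """Simulate a 4-stage NLFSR (assumed Fibonacci configuration).

    state is the tuple (x3, x2, x1, x0); the output bit is x0.
    """
    x3, x2, x1, x0 = state
    out = []
    for _ in range(steps):
        out.append(x0)  # rightmost stage drives the output
        # feedback f(x0, x1, x2, x3) = x0 ^ x1 ^ (x1 & x2) ^ (x2 & x3)
        fb = x0 ^ x1 ^ (x1 & x2) ^ (x2 & x3)
        # shift: stage 3 takes the feedback, stage i takes stage i+1
        x3, x2, x1, x0 = fb, x3, x2, x1
    return out


sequence = nlfsr_output((0, 0, 0, 1), 15)
print(sequence)
```

Under these assumptions the simulation starting from state (0001) reproduces the 15-bit periodic sequence quoted in the Introduction, and the sequence repeats with period 15.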


2. Preliminaries


A $k$-stage Register with Non-Linear Update (RNLU) (also called a binary machine [21]) consists of $k$ binary storage elements, called stages [19]. Each stage $i \in \{0, 1, \ldots, k-1\}$ has an associated state variable $x_i \in \{0, 1\}$, which represents the current value of stage $i$, and a feedback function $f_i : \{0,1\}^k \to \{0,1\}$, which determines how the value of $x_i$ is updated.

A state of an RNLU is a vector of values of its state variables. At every clock cycle, the next state of an RNLU is determined from its current state by updating the values of all stages simultaneously to the values of the corresponding feedback functions. A $k$-stage RNLU has $2^k$ states corresponding to the set $\{0,1\}^k$ of all possible binary $k$-tuples.

The degree of parallelization of a $k$-stage RNLU, $p$, is the number of output bits generated at each clock cycle, $1 \le p \le k$. Throughout the paper, we assume that the $p$ rightmost stages of the RNLU are used for producing its output.

The support set of a Boolean function $f : \{0,1\}^k \to \{0,1\}$, $\mathrm{sup}(f)$, is the set of variables on which $f$ depends:

$$\mathrm{sup}(f) = \{x_i \mid f|_{x_i=0} \neq f|_{x_i=1}\},$$

where $f|_{x_i=j} = f(x_0, \ldots, x_{i-1}, j, x_{i+1}, \ldots, x_{k-1})$, for $j \in \{0, 1\}$. If a variable does not belong to the support set of $f$, it is called redundant.
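The definition of the support set can be illustrated with a brute-force check over the truth table. This is a minimal sketch for small $k$, not part of the original method; the helper name `support_set` and the two sample functions are hypothetical.

```python
from itertools import product


def support_set(f, k):
    """Return the indices i of the variables x_i on which f depends.

    f takes k arguments in {0, 1}. A variable x_i is in the support set
    if the cofactors f|x_i=0 and f|x_i=1 differ on at least one input.
    """
    support = set()
    for i in range(k):
        for bits in product((0, 1), repeat=k):
            if bits[i] == 1:
                continue  # enumerate each cofactor pair once
            flipped = bits[:i] + (1,) + bits[i + 1:]
            if f(*bits) != f(*flipped):
                support.add(i)
                break
    return support


# Sample function depending on all four variables.
def f(x0, x1, x2, x3):
    return x0 ^ x1 ^ (x1 & x2) ^ (x2 & x3)


# Sample function that simplifies to x0, so x1 and x2 are redundant.
def g(x0, x1, x2):
    return (x0 & x1) | (x0 & (1 - x1))


print(support_set(f, 4), support_set(g, 3))
```

The enumeration is exponential in $k$, so it is only meant to make the definition concrete for hand-sized examples.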

3. Previous work


Two major approaches for embedding deterministic test patterns are LFSR reseeding and pattern mapping [23].

In LFSR reseeding schemes, deterministic test patterns are encoded into seeds, the state vectors of the LFSR. The encoding is accomplished by solving a system of linear equations. Successful encoding of a pattern into a seed is not guaranteed. However, Könemann proved that by selecting an LFSR of size $S_{max} + 20$, where $S_{max}$ is the maximum number of specified bits in a test pattern, the probability of encoding failure can be reduced to $1/10^6$ [24]. The encoding efficiency can be increased by using variable-length seeds and multiple polynomials [25,26], or through a partial dynamic reseeding technique [27]. The seeds can be stored in an on-chip Read-Only Memory (ROM). An alternative approach is to generate the seeds dynamically using a reseeding circuit [13]. The order of the seeds might affect the size of the reseeding circuit, or even the number of seeds. A seed ordering technique that minimizes the hardware overhead is presented in [28].

In pattern mapping approaches, a mapping function is placed between the LFSR and the circuit under test to transform the pseudo-random patterns into deterministic patterns. In [29], the authors suggest the use of a counter combined with an XOR network to generate deterministic patterns on-chip. The on-chip test pattern generator presented in [30] is a combination of an LFSR, an OR gate network, and a set of multiplexers. The pattern mapping technique presented in [31] uses Generalized LFSRs (GLFSRs) as the random pattern generators, and the mapping function for each output is synthesized separately. GLFSRs are a generalization of LFSRs in which the Boolean XOR and AND operations are extended to addition and multiplication over $GF(2^d)$ [32].

Another class of pattern mapping techniques includes bit-flipping [14] and bit-fixing [15] schemes. They exploit the fact that very few bits need to be altered in a carefully selected random pattern to make it deterministic. In the bit-flipping approach, a Boolean function is implemented so that it evaluates to 1 whenever a flip of a bit is needed [14]. A bit-fixing logic generates signals indicating whether a bit should be fixed to 0, fixed to 1, or left unchanged [15]. The random


pattern generated by an LFSR is then altered according to the output of the bit-flipping or bit-fixing function to form the deterministic pattern. Dictionary-based methods exploit the repetition in deterministic test patterns and store such information in an on-chip ROM to achieve a high coding efficiency [33].

All of the above methods take advantage of the fact that only a very small portion of the bits in the deterministic patterns are specified. The deterministic patterns use the LFSR-generated pseudo-random patterns as "templates". The large search space for a suitable "template" might limit the efficiency of such algorithms. Additionally, the distinction between the LFSR (the "generator") and the reseeding or mapping circuit (the "modifier") potentially increases the area overhead compared to integrated solutions, such as RNLUs.


4. Proposed method


The problem of finding a best RNLU for a given sequence can be divided into three sub-problems:


1. Selecting an optimal degree of parallelization for a given binary sequence.
2. Choosing an optimal state assignment for a given degree of parallelization.
3. Finding a best circuit for feedback functions for a given state assignment.


4.1. Optimal degree of parallelization


The degree of parallelization determines how many output bits are generated per clock cycle. The size of RNLUs may differ substantially for different parallelization degrees. The degree of parallelization is optimal if it minimizes the size of the resulting RNLU.

In order to construct an RNLU with the degree of parallelization $p$, we map a binary sequence into a $2^p$-ary sequence by partitioning the binary sequence into vectors of length $p$. The resulting vectors are treated as binary expansions of elements of a $2^p$-ary sequence. The same approach was used in [22].

Let us denote by $N_i$ the number of occurrences of a digit $i$ in the $2^p$-ary sequence, $0 \le i < 2^p$. Let $N_{max}$ be the largest of the $N_i$. In [22], it was shown that the minimum number of stages in an RNLU generating a given binary sequence with the degree of parallelization $p$ is equal to


$$k = \lceil \log_2 N_{max} \rceil + p. \qquad (1)$$
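Equation (1) is easy to evaluate for a concrete sequence. The following sketch is illustrative and rests on two assumptions: the sequence is completely specified, and its length is a multiple of $p$. The function name `min_stages` is hypothetical; the sample data is the 15-bit sequence from the Introduction.

```python
from collections import Counter


def min_stages(bits, p):
    """Evaluate k = ceil(log2(N_max)) + p for a binary sequence.

    bits is a list of 0/1 values whose length is a multiple of p.
    """
    if len(bits) % p != 0:
        raise ValueError("sequence length must be a multiple of p")
    # Partition into p-tuples, i.e. digits of the 2^p-ary sequence.
    digits = [tuple(bits[i:i + p]) for i in range(0, len(bits), p)]
    n_max = max(Counter(digits).values())
    # (n_max - 1).bit_length() equals ceil(log2(n_max)) for n_max >= 1.
    return (n_max - 1).bit_length() + p


seq = [1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1]
for p in (1, 3, 5):
    print(p, min_stages(seq, p))
```

For $p = 1$ the digit 1 occurs 8 times, so the bound gives $k = 4$, matching the 4-stage registers of the Introduction. For $p = 3$ and $p = 5$ all tuples are distinct, so $N_{max} = 1$ and $k = p$.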

From (1) we can see that if $N_{max} = 1$, then $k = p$. Such a case is called full parallelization. On the basis of our experimental results, we hypothesize that the optimal degree of parallelization is in the interval

$$1 \le p_{opt} \le \lceil \log_2 n \rceil, \qquad (2)$$

where $n$ is the sequence length. Note that for some applications, including testing, the degree of parallelization is specified by the user. For example, for test-per-scan testing, it is equal to the number of scan chains.

4.2. Optimal state assignment

A state assignment determines a sequence of states which an RNLU follows. Different sequences of states give rise to different current-to-next state mappings and, thus, to different feedback functions. The state assignment is optimal if it minimizes the size of the resulting RNLU.

Since an RNLU is a deterministic finite state automaton, any of its states must have a unique next state. For a given $2^p$-ary encoding, the minimal number of bits which have to be added to the $p$-tuples to make the current-to-next state mapping unique is $\lceil \log_2 N_{max} \rceil$. The minimal number of stages in the resulting RNLU is given by (1).

The strategy for state assignment presented in [19] has two major differences from the one in [22]. First, in [19] a non-minimal number of stages is used, namely

$$k = \left\lceil \log_2 \frac{n}{p} \right\rceil + p, \qquad (3)$$

where $n$ is the sequence length. Second, states are assigned so that the feedback functions implementing the current-to-next state mapping depend on the minimum number of state variables. It is known that "most" Boolean functions of $k$ variables require a circuit of size $O(2^k/k)$ gates to be computed (Shannon–Lupanov bound) [34]. Feedback functions of RNLUs are random functions. For random functions, the actual size of their circuits is very close to the Shannon–Lupanov bound. Therefore, every additional variable nearly doubles the size of the circuit computing the function. It was shown in [19] that such a strategy for state assignment leads to RNLUs whose expected size is asymptotically smaller than the size of RNLUs constructed using previous algorithms.

We assign states using the same strategy as in [19]. The original algorithm was developed for completely specified binary sequences. We extend it to handle incompletely specified binary sequences. The pseudocode is shown as Algorithm 1.

The input of the algorithm is an incompletely specified binary sequence $A = (a_0, a_1, \ldots, a_{n-1})$, where $a_i \in \{0, 1, -\}$ for $i \in \{0, 1, \ldots, n-1\}$, and the desired degree of parallelization $p$. The output is a sequence $S = (s_0, s_1, \ldots, s_{m-1})$ of binary vectors $s_i = (s_{i,0}, s_{i,1}, \ldots, s_{i,p+r-1}) \in \{0,1\}^{p+r}$, where $m = \lceil n/p \rceil$ and $r = \lceil \log_2 m \rceil$, corresponding to the states of a $(p+r)$-stage RNLU which generates a completely specified version of $A$ with the degree of parallelization $p$.

The algorithm partitions $A$ into $p$-tuples and appends at the beginning of each $i$th $p$-tuple $r$ extra bits. These extra bits correspond to the binary expansion of the $i$th element of the permutation vector $P$. Next, we define a mapping $s_i \mapsto s_{i+1}$, for all $i \in \{0, 1, \ldots, m-2\}$. Since $P$ is a permutation, each state in the resulting sequence of states has a unique next state, so the mapping is well-defined.

The last state $s_{m-1}$ and each of the $2^{p+r} - m$ remaining states of the resulting binary $(p+r)$-stage machine are mapped to don't-care values. This gives us the possibility to specify the functions $f_0, f_1, \ldots, f_{p+r-1}$ implementing the current-to-next state mapping in a way which minimizes their size. Since $m \le 2^r$, we can treat them as functions depending only on the $r$ state variables that hold the extra bits. This is very important because, as mentioned above, for random functions the size nearly doubles with each additional variable. Since, by construction, the $p$ bits $s_{i,0}, \ldots, s_{i,p-1}$ of each state $s_i$ in $S = (s_0, s_1, \ldots, s_{m-1})$ correspond to the $i$th $p$-tuple of $A$, the resulting RNLU generates a completely specified version of $A$ with the degree of parallelization $p$.

As an example, let us construct an RNLU which generates the following 20-bit incompletely specified binary sequence with the degree of parallelization 2 (here X denotes a don't-care bit):

$$A = (1, X, X, 1, X, X, 1, 1, X, 0, 1, 1, 0, X, 1, X, 0, 1, X, 0).$$

Since $n = 20$ and $p = 2$, we get $m = 10$ and $r = 4$. Suppose we use the following permutation of $(0, 1, \ldots, 15)$:

$$P = (1, 8, 4, 2, 9, 12, 6, 11, 5, 10, 13, 14, 15, 7, 3, 0),$$

which is selected according to the sequence of states of an LFSR with the generator polynomial $1 + x + x^4$. It is important to use permutations $P$ which have a low-cost implementation. Examples of such permutations are sequences of states generated by counters, LFSRs, or NLFSRs with simple feedback functions [35]. Then, we get the following sequence of states:

$$S = (00011X, 1000X1, 0100XX, 001011, 1001X0, 110011, 01100X, 10111X, 010101, 1010X0).$$

Algorithm 1. Constructs an RNLU with the degree of parallelization $p$ for an incompletely specified binary sequence $A = (a_0, a_1, \ldots, a_{n-1})$, $a_i \in \{0, 1, -\}$, for $i \in \{0, 1, \ldots, n-1\}$.

    $m := \lceil n/p \rceil$
    $r := \lceil \log_2 m \rceil$
    $P := (p_0, p_1, \ldots, p_{2^r-1})$ is a permutation of $(0, 1, \ldots, 2^r - 1)$
    Let $p_{i,j}$ be the $j$th element of the binary expansion of $p_i$, $j \in \{0, \ldots, r-1\}$
    for every $i$ from 0 to $m - 1$ do
        for every $j$ from 0 to $p - 1$ do
            $s_{i,j} := a_{ip+j}$
        end for
        for every $k$ from 0 to $r - 1$ do
            $s_{i,p+k} := p_{i,k}$
        end for
        $s_i := (s_{i,0}, s_{i,1}, \ldots, s_{i,p+r-1})$
    end for
    Return $S = (s_0, s_1, \ldots, s_{m-1})$
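Algorithm 1 can be sketched in a few lines of Python and checked against the 20-bit example of Section 4.2. This is an illustrative sketch rather than the authors' implementation. It assumes that each state is printed as the $r$ permutation bits (most significant bit first) followed by the $p$ output symbols, which is the notation used in the example, and that "X" marks a don't-care bit. The names `lfsr_permutation` and `assign_states` are hypothetical.

```python
def lfsr_permutation():
    """State sequence of a 4-bit LFSR with polynomial 1 + x + x^4.

    Assumed update: the new leftmost bit is the XOR of the two rightmost
    bits and the register shifts right. The all-zero state is appended
    at the end to complete a permutation of 0..15.
    """
    state, perm = 1, []
    for _ in range(15):
        perm.append(state)
        new_msb = (state ^ (state >> 1)) & 1
        state = (new_msb << 3) | (state >> 1)
    perm.append(0)
    return perm


def assign_states(a, p, perm):
    """Build the state sequence S for sequence a ('0', '1' or 'X')."""
    n = len(a)
    m = -(-n // p)              # ceil(n / p)
    r = (m - 1).bit_length()    # ceil(log2(m))
    if sorted(perm) != list(range(2 ** r)):
        raise ValueError("perm must be a permutation of 0 .. 2^r - 1")
    # Pad the last p-tuple with don't cares if n is not a multiple of p.
    padded = list(a) + ["X"] * (m * p - n)
    states = []
    for i in range(m):
        extra = format(perm[i], "b").zfill(r) if r else ""
        states.append(extra + "".join(padded[i * p:(i + 1) * p]))
    return states


A = list("1XX1XX11X0110X1X01X0")
S = assign_states(A, 2, lfsr_permutation())
print(S)
```

With these assumptions the sketch reproduces both the permutation $P$ and the ten states of $S$ from the worked example. Choosing the permutation from an LFSR keeps the updating functions of the extra bits as simple as the LFSR itself.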

The functions implementing the resulting current-to-next state mapping have the following defining table:

    x5 x4 x3 x2 | f5 f4 f3 f2 f1 f0
     0  0  0  1 |  1  0  0  0  –  1
     1  0  0  0 |  0  1  0  0  –  –
     0  1  0  0 |  0  0  1  0  1  1
     0  0  1  0 |  1  0  0  1  –  0
     1  0  0  1 |  1  1  0  0  1  1
     1  1  0  0 |  0  1  1  0  0  –
     0  1  1  0 |  1  0  1  1  1  –
     1  0  1  1 |  0  1  0  1  0  1
     0  1  0  1 |  1  0  1  0  –  0
     1  0  1  0 |  –  –  –  –  –  –

where "–" stands for a don't-care value. Recall that the functions depend on the four variables $x_5, x_4, x_3, x_2$ only. The remaining 6 input assignments are mapped to don't cares. We can implement the above functions as:

$$f_5 = x_2 \oplus x_3$$
$$f_4 = x_5$$
$$f_3 = x_4$$
$$f_2 = x_3$$
$$f_1 = ((x_3 + x_4) \cdot x_5)'$$
$$f_0 = x_2' x_4 + x_3' x_4' + x_5$$

where "+" stands for a Boolean OR, and "$'$" stands for a Boolean complement. Note that the functions $f_5, f_4, f_3$, and $f_2$ have such a simple form because they are the updating functions of the LFSR that generates the permutation $P$.

4.3. Best circuit for feedback functions

The problem of finding a best circuit for a given Boolean function is notoriously hard. Exact solutions are known only for functions of up to five variables [36]. However, there are many powerful heuristic algorithms for multi-level circuit optimization which are capable of finding good circuits for larger functions [37]. In our experiments, we optimize feedback functions using the ABC logic synthesis tool [38]. We observe that, even for random functions, ABC is capable of reducing the size of the original, non-optimized circuit by about 30% on average.

5. Expected size analysis

It was shown in [19] that, if their algorithm is used to construct an RNLU generating a given completely specified binary sequence, then, for sufficiently large random sequences, the expected size of the RNLU is $O(n/\log_2(n/p))$. This bound is smaller than the lower bound $O(n)$ on the expected size of the RNLU constructed by the algorithm of [22], as well as the bounds $O(pn)$ and $O(pn^2/\log_2 n)$ on the expected size of the LFSR and NLFSR generating the same sequence, respectively. In this section, we extend the expected size analysis to incompletely specified sequences.

In 1969, Sholomov [39] derived a lower bound on the circuit size of incompletely specified Boolean functions. He showed that, if the number of specified rows in the truth table of a function $f : \{0,1\}^k \to \{0, 1, -\}$ is at least

$$N_k = k \log_2^{1+d} k \qquad (4)$$

for some positive constant $d$, then most Boolean functions with a support set of size $k$ require a circuit with at least

$$C_{k,N_k} \approx q \frac{N_k}{\log_2 N_k} \qquad (5)$$

gates to be computed, where $q$ is a constant depending on the types of gates allowed in the circuit. If only gates with at most two inputs are allowed, then $q = 1$. In the sequel, we consider this case and hence omit $q$.

Bowen [40] derived an upper bound on the circuit size of incompletely specified Boolean functions and showed that it matches the lower bound (5). Therefore, we can conclude that most incompletely specified Boolean functions with a support set of size $k$ require a circuit of size $C_{k,N_k}$ to be computed, where $C_{k,N_k}$ is given by (5).

Let $0 \le s \le 1$ be the fraction of specified bit positions in the truth table of a Boolean function with a support set of size $k$. Then $N_k = s \cdot 2^k$ and Eq. (5) can be rewritten as

$$C_{k,s} \approx \frac{s 2^k}{k + \log_2 s}. \qquad (6)$$

We use this bound in the analysis below. We assume that one storage element counts as $b$ gates. Since our analysis is asymptotic, without loss of precision we use $\log_2 n$ instead of $\lceil \log_2 n \rceil$.

Let $A$ be an incompletely specified binary sequence of length $n$ which has $sn$ specified bits, $0 \le s \le 1$. Suppose that the position of every specified bit is selected independently and uniformly at random from the $n$ possible bit positions and the value of every specified bit is selected equiprobably from $\{0, 1\}$. Throughout this section, we call such a sequence an incompletely specified random sequence.

Suppose that Algorithm 1 is used to construct an RNLU generating $A$ with the degree of parallelization $p$. Let $m = \lceil n/p \rceil$. Then this RNLU has:

- $p$ stages for the output bits,
- $\log_2 m$ stages for the extra bits,
- $\log_2 m$ updating functions of the extra bits,
- $p$ updating functions of the output bits.

The updating functions of the extra bits can be computed by a circuit of size $O(\log_2 m)$ by using a $(\log_2 m)$-stage LFSR or a $(\log_2 m)$-bit counter. The following lemma estimates the expected size of the support set of the updating functions of the output bits.

Lemma 1. Let $f : \{0,1\}^k \to \{0, 1, -\}$ be an incompletely specified Boolean function such that each row in its truth table is specified independently with probability $s$ and the value of every specified row is selected equiprobably from $\{0, 1\}$. Then, the probability that the support set of $f$ is of size $\tilde{k}$, for $0 \le \tilde{k} \le k$, is

$$P(|\mathrm{sup}(f)| = \tilde{k}) = \binom{k}{\tilde{k}} \big(P(x_i \in \mathrm{sup}(f))\big)^{\tilde{k}} \big(1 - P(x_i \in \mathrm{sup}(f))\big)^{k - \tilde{k}},$$

where

$$P(x_i \in \mathrm{sup}(f)) = 1 - \left(1 - \frac{1}{2} s^2\right)^{2^{k-1}}.$$

Proof. Any row in the truth table may have one of the values in $\{0, 1, -\}$ with the following probabilities:

$$\begin{cases} 0: & s/2 \\ 1: & s/2 \\ -: & 1 - s \end{cases}$$

We say that two rows of the truth table are incompatible if one of them is 0 and the other is 1. Otherwise they are compatible. Then, the probability that any two rows in the truth table are incompatible is $s^2/2$. A variable $x_i$ is redundant if and only if each row of the sub-function $f|_{x_i=0}$ is compatible with the corresponding row of $f|_{x_i=1}$. Since there are $2^{k-1}$ rows in $f|_{x_i=0}$ and in $f|_{x_i=1}$, the probability that $x_i$ is redundant is

$$P(x_i \notin \mathrm{sup}(f)) = \left(1 - \frac{s^2}{2}\right)^{2^{k-1}}.$$

Then, the probability that a variable is in the support set of $f$ can be calculated as

$$P(x_i \in \mathrm{sup}(f)) = 1 - \left(1 - \frac{s^2}{2}\right)^{2^{k-1}}.$$

Since variables are independent, the size of the support set of $f$, $|\mathrm{sup}(f)|$, follows the binomial distribution with parameters $k$ and $P(x_i \in \mathrm{sup}(f))$:

$$|\mathrm{sup}(f)| \sim B(k, P(x_i \in \mathrm{sup}(f))).$$

Therefore, the probability that the support set size of $f$ is $\tilde{k}$ is

$$P(|\mathrm{sup}(f)| = \tilde{k}) = \binom{k}{\tilde{k}} \big(P(x_i \in \mathrm{sup}(f))\big)^{\tilde{k}} \big(1 - P(x_i \in \mathrm{sup}(f))\big)^{k - \tilde{k}}. \qquad \square$$

Lemma 1 shows that, for any fixed $s$, the probability that the updating functions do not have redundant variables quickly approaches 1 as $k$ grows. So, for sufficiently large random sequences, the expected size of the support set of each of the $p$ updating functions of the output bits is $\log_2 m$. By (6), a $(\log_2 m)$-variable function requires a circuit of size $ms/\log_2(ms)$ to be computed, given that the condition (4) holds, i.e. the fraction of specified bits is

$$s \ge \frac{\log_2 m \cdot \log_2^{1+d}(\log_2 m)}{m}.$$

Under these conditions, the expected size of the RNLU constructed by the presented algorithm is

$$E_{n,s,p} \approx b(p + \log_2 m) + \frac{pms}{\log_2(ms)} + O(\log_2 m) \approx O\!\left(\frac{ns}{\log_2(ns/p)}\right) \approx O\!\left(\frac{ns}{\log_2 n + \log_2 s - \log_2 p}\right). \qquad (7)$$

It is interesting to compare the size of RNLUs to the size of a memory required to store compressed test data. It is believed that the size of compressed test data cannot be smaller than the number of specified bits in the data [41]. So, even if a test compression algorithm achieves optimal results, the number of bits in the compressed test data is not smaller than $sn$, where $n$ is the size of the original test data and $s$ is the fraction of specified bits. Thus, we need an $sn$-bit memory plus decompression logic to generate the original test data. By comparing to (7), we can conclude that the memory is larger than the RNLU by a factor of $\log_2(ns/p)$ (i.e. asymptotically). For example, for $s = 0.01$, $n > 1$ Gbit and $p < 10{,}000$, the number of gates in the RNLU is an order of magnitude smaller than the number of cells in the memory.

6. Experimental results

We compared the presented algorithms to the Test Point Insertion (TPI) technique in a commercial synthesis tool.¹

¹ We cannot reveal the names of the commercial tools due to a non-disclosure agreement.

Table 1. Comparison of the presented technique (RNLU) to the Test Point Insertion technique (TPI). All designs are configured in 128 scan chains. Cov. = test coverage (%); Area = area overhead (%).

    Benchmark     # Gates  # Scan cells   LBIST only     LBIST + TPI    LBIST + RNLU   LBIST + RNLU-S (a)
                                          Cov.   Area    Cov.   Area    Cov.   Area    Cov.   Area
    bio_a          34,113          2511   94.43  0.67    95.32   4.47   99.98  13.30   99.31  4.82
    bio_b          22,104          1726   91.62  1.03    93.56  11.01   99.98  20.72   98.50  4.92
    bio_bk         21,984          1727   91.87  1.04    93.12  10.82   99.98  20.79   98.44  4.94
    bio_c          39,277          3022   96.29  0.58    96.94   3.71   99.99   8.44   99.68  4.99
    bio_d          31,727          2690   93.28  0.72    94.32   7.51   99.98  14.88   99.09  4.97
    leon3_8core   841,347        50,159   90.83  0.03    92.52   0.84   99.67  11.69   95.88  4.74
    netcard       145,249        10,962   92.67  0.16    95.61   1.18   98.86  11.52   97.19  4.87
    Average                               93.00  0.60    94.48   5.65   99.78  14.48   98.30  4.89

(a) RNLU-S refers to an RNLU constructed for a subset of top-off patterns.


We applied both methods to 7 full-scan real designs of size 21–841 K gates and 1.7–50 K scan cells (see columns 1–3 of Table 1 for parameters). The first 5 designs are 8-channel signal processor circuits [42]; leon3_8core and netcard are recompiled versions of publicly available benchmarks [43].

First, we applied 7000 pseudo-random patterns to each design and computed the resulting test coverage² for stuck-at faults. The number of LFSR patterns applied is roughly the critical point where each additional pseudo-random pattern adds on average about $10^{-6}$ of stuck-at test coverage for these designs. Columns 4 and 5 show that the test coverage is 93.00% on average. The resulting area overhead is 0.60% on average.

Second, we inserted test points and applied the same number of pseudo-random patterns to the TPI-inserted designs. Columns 6 and 7 show that, on average, the test coverage is increased to 94.48% with 5.65% area overhead.

Then, we used a commercial ATPG tool to generate the deterministic top-off patterns required to reach the maximum achievable test coverage for stuck-at faults. We applied the presented algorithm to construct an RNLU for these test patterns. As we can see from columns 8 and 9, on average, a test coverage of 99.78% can be achieved with 14.48% area overhead.

Finally, we investigated how much we can increase test coverage if the area overhead is bounded by 5%. We constructed an RNLU for a selected subset of the top-off patterns. The subset was selected using a simple greedy algorithm which ranks patterns based on their number of don't-care bits and the number of faults they cover. Column 10 shows that the resulting test coverage is 98.30% on average.

As we can see, the presented RNLU-based method appears more suitable than TPI for complementing pseudo-random patterns. It can achieve a higher test coverage than TPI (98.30% vs 94.48%) with less area overhead (4.89% vs 5.65%).
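The averages quoted above can be recomputed from the per-design rows of Table 1. The snippet below is a small arithmetic check, not part of the original experiments; the dictionary keys are hypothetical labels for the eight data columns.

```python
from statistics import mean

# Per-design values from Table 1, in the order bio_a, bio_b, bio_bk,
# bio_c, bio_d, leon3_8core, netcard.
TABLE1 = {
    "LBIST only, coverage": [94.43, 91.62, 91.87, 96.29, 93.28, 90.83, 92.67],
    "LBIST only, area": [0.67, 1.03, 1.04, 0.58, 0.72, 0.03, 0.16],
    "LBIST + TPI, coverage": [95.32, 93.56, 93.12, 96.94, 94.32, 92.52, 95.61],
    "LBIST + TPI, area": [4.47, 11.01, 10.82, 3.71, 7.51, 0.84, 1.18],
    "LBIST + RNLU, coverage": [99.98, 99.98, 99.98, 99.99, 99.98, 99.67, 98.86],
    "LBIST + RNLU, area": [13.30, 20.72, 20.79, 8.44, 14.88, 11.69, 11.52],
    "LBIST + RNLU-S, coverage": [99.31, 98.50, 98.44, 99.68, 99.09, 95.88, 97.19],
    "LBIST + RNLU-S, area": [4.82, 4.92, 4.94, 4.99, 4.97, 4.74, 4.87],
}

# Averages as reported in the last row of Table 1.
REPORTED = [93.00, 0.60, 94.48, 5.65, 99.78, 14.48, 98.30, 4.89]

averages = {name: round(mean(values), 2) for name, values in TABLE1.items()}
for name, value in averages.items():
    print(f"{name}: {value:.2f}")
```

Each recomputed mean agrees with the reported average to two decimal places.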

7. Conclusion

We presented a new method for embedding deterministic test patterns on-chip based on RNLUs. The presented approach is particularly suitable for test patterns with many don’t cares. Our experimental results on 7 real designs show that it can achieve a higher test coverage than the test point insertion method with less area overhead.

We believe that RNLUs constructed by the presented approach are quite close to optimal. What can be improved in the proposed method is the strategy for selecting a subset of top-off patterns which maximizes the test coverage and minimizes the area overhead. At present, we use a simple greedy algorithm; a more sophisticated approach is likely to bring better results.

RNLUs can be used not only for generating top-off patterns, but also for generating compressed test patterns on-chip. This would eliminate the dependence of test compression on ATE memory. As we have shown, the size of an RNLU is asymptotically smaller than the size of a memory required to store the same set of compressed test patterns. For data volumes larger than 1 Gbit, the two representations differ in size by one order of magnitude. Future work includes testing the presented method on larger industrial designs.

Acknowledgments

This work was supported in part by Project No 2011-03336 from the Swedish Governmental Agency for Innovation Systems (VINNOVA) and in part by research Grant No 621-2010-4388 from the Swedish Research Council.

Footnote 2: Test coverage is defined with respect to detectable faults only, while fault coverage is defined with respect to all faults.

References

[1] Z. Wang, K. Chakrabarty, Test data compression for IP embedded cores using selective encoding of scan slices, in: Proceedings of International Test Conference (ITC’2005), 2005, pp. 581–590. http://dx.doi.org/10.1109/TEST.2005.1584019.
[2] ITRS, International Technology Roadmap for Semiconductors, 2011.
[3] B. Koenemann, C. Barnhart, B. Keller, T. Snethen, O. Farnsworth, D. Wheater, A SmartBIST variant with guaranteed encoding, in: Proceedings of Asian Test Symposium (ATS’2001), 2001, pp. 325–330. http://dx.doi.org/10.1109/ATS.2001.990304.
[4] J. Rajski, J. Tyszer, M. Kassab, N. Mukherjee, Embedded deterministic test, IEEE Trans. Comput.-Aided Des. Integr. Circ. Syst. 23 (2004) 776–792.
[5] S. Mitra, K.S. Kim, X-compact: an efficient response compaction technique, IEEE Trans. Comput.-Aided Des. Integr. Circ. Syst. 23 (2004) 421–432.
[6] D. Czysz, G. Mrugalski, N. Mukherjee, J. Rajski, P. Szczerbicki, J. Tyszer, Deterministic clustering of incompatible test cubes for higher power-aware EDT compression, IEEE Trans. Comput.-Aided Des. Integr. Circ. Syst. 30 (2011) 1225–1238.
[7] M. Majeed, D. Ahlstrom, U. Ingelsson, G. Carlsson, E. Larsson, Efficient embedding of deterministic test data, in: Proceedings of Asian Test Symposium (ATS’2010), 2010, pp. 159–162. http://dx.doi.org/10.1109/ATS.2010.36.
[8] D. Das, N. Touba, Reducing test data volume using external/LBIST hybrid test patterns, in: Proceedings of International Test Conference (ITC’2000), 2000, pp. 115–122. http://dx.doi.org/10.1109/TEST.2000.894198.
[9] E.B. Eichelberger, E. Lindbloom, Random-pattern coverage enhancement and diagnosis for LSSD logic self-test, IBM J. Res. Dev. 27 (1983) 265–272.
[10] N. Touba, E. McCluskey, Test point insertion based on path tracing, in: Proceedings of 14th VLSI Test Symposium, 1996, pp. 2–8. http://dx.doi.org/10.1109/VTEST.1996.510828.
[11] N. Tamarapalli, J. Rajski, Constructive multi-phase test point insertion for scan-based BIST, in: Proceedings, International Test Conference, 1996, pp. 649–658. http://dx.doi.org/10.1109/TEST.1996.557122.
[12] C. Chin, E.J. McCluskey, Weighted Pattern Generation For Built-In Self Test, Technical Report TR – 84-7, Stanford Center for Reliable Computing, 1984.
[13] A. Al-Yamani, E. McCluskey, Built-in reseeding for serial bist, in: Proceedings, 21st VLSI Test Symposium, 2003, pp. 63–68. http://dx.doi.org/10.1109/VTEST.2003.1197634.
[14] H.-J. Wunderlich, G. Kiefer, Bit-flipping BIST, in: Proc. of IEEE/ACM International Conference on Computer-Aided Design (ICCAD’1996), San Jose, CA, USA, 1996, pp. 337–343.
[15] N. Touba, E. McCluskey, Altering a pseudo-random bit sequence for scan-based BIST, in: Proceedings, International Test Conference, 1996, pp. 167–175. http://dx.doi.org/10.1109/TEST.1996.556959.
[16] J. Savir, G.S. Ditlow, P.H. Bardell, Random pattern testability, IEEE Trans. Comput. C-33 (1984) 79–90.
[17] G. Hetherington, T. Fryars, N. Tamarapalli, M. Kassab, A. Hassan, J. Rajski, Logic BIST for large industrial designs: real issues and case studies, in: Proceedings of International Test Conference (ITC’1999), 1999, pp. 358–367. http://dx.doi.org/10.1109/TEST.1999.805650.
[18] W. Daehn, J. Mucha, Hardware test pattern generation for built-in testing, in: Proceedings of International Test Conference (ITC’1981), 1981, pp. 110–120.
[19] N. Li, E. Dubrova, An Algorithm for Constructing a Smallest Register with Non-Linear Update Generating a Given Binary Sequence, Technical Report, 2013. http://arxiv.org/abs/1306.5596.
[20] C.J. Jansen, Investigations on Nonlinear Streamcipher Systems: Construction and Evaluation Methods, Ph.D. Thesis, Technical University of Delft, 1989.
[21] E. Dubrova, Synthesis of binary machines, IEEE Trans. Inform. Theory 57 (2011) 6890–6893.
[22] E. Dubrova, Synthesis of parallel binary machines, in: Proceedings of International Conference of Computer-Aided Design (ICCAD’2011), San Jose, CA, USA, 2011, pp. 200–206.
[23] H.-J. Wunderlich, BIST for systems-on-a-chip, Integr., VLSI J. 26 (1998) 55–78.
[24] B. Könemann, LFSR-coded test patterns for scan designs, in: Proc. IEE European Test Conference, 1991, pp. 237–242.
[25] S. Hellebrand, S. Tarnick, J. Rajski, B. Courtois, Generation of vector patterns through reseeding of multiple-polynomial linear feedback shift registers, in: Proc. of International Test Conference, 1992, pp. 120–129.
[26] J. Rajski, J. Tyszer, N. Zacharia, Test data decompression for multiple scan designs with boundary scan, IEEE Trans. Comput. 47 (1998) 1188–1200.
[27] C. Krishna, A. Jas, N. Touba, Test vector encoding using partial LFSR reseeding, in: Proceedings, International Test Conference, 2001, pp. 885–893. http://dx.doi.org/10.1109/TEST.2001.966711.
[28] A. Al-Yamani, S. Mitra, E. McCluskey, Bist reseeding with very few seeds, in: Proceedings, 21st VLSI Test Symposium, 2003, pp. 69–74. http://dx.doi.org/10.1109/VTEST.2003.1197635.
[29] S. Akers, W. Jansz, Test set embedding in a built-in self-test environment, in: Proceedings, Meeting the Tests of Time, International Test Conference, 1989, pp. 257–263. http://dx.doi.org/10.1109/TEST.1989.82306.
[30] C. Dufaza, C. Chevalier, Lfsrom: basic principle and BIST application, in: Proceedings, [4th] European Conference on Design Automation, 1993, with the European Event in ASIC Design, 1993, pp. 211–216. http://dx.doi.org/10.1109/EDAC.1993.386474.
[31] M. Chatterjee, D. Pradhan, A novel pattern generator for near-perfect fault-coverage, in: Proceedings, 13th IEEE VLSI Test Symposium, 1995, pp. 417–425. http://dx.doi.org/10.1109/VTEST.1995.512669.
[32] S. Gupta, D. Pradhan, A new framework for designing and analyzing BIST techniques: computation of exact aliasing probability, in: Proceedings, New Frontiers in Testing, International Test Conference, 1988, pp. 329–342. http://dx.doi.org/10.1109/TEST.1988.207819.
[33] A.-W. Hakmi, S. Holst, H. Wunderlich, J. Schloffel, F. Hapke, A. Glowatz, Restrict encoding for mixed-mode bist, in: VTS ’09, 27th IEEE VLSI Test Symposium, 2009, pp. 179–184. http://dx.doi.org/10.1109/VTS.2009.43.
[34] I. Wegener, The Complexity of Boolean Functions, John Wiley and Sons Ltd., Stuttgart, 1987.
[35] E. Dubrova, A List of Maximum-Period NLFSRs, 2012. Eprint.iacr.org/2012/166.
[36] D.E. Knuth, The Art of Computer Programming: Boxed Set, vols. 1–3, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1998.
[37] R.K. Brayton, C. McMullen, G. Hachtel, A. Sangiovanni-Vincentelli, Logic Minimization Algorithms For VLSI Synthesis, Kluwer Academic Publishers, 1984.
[38] Berkeley Logic Synthesis and Verification Group, ABC: A System for Sequential Synthesis and Verification, Release 70930, 2007.
[39] L.A. Sholomov, On the realization of incompletely-defined boolean functions by circuits of functional elements, Trans. Syst. Theory Res. 21 (1969) 211–223.
[40] R.S. Bowen, Minimal Circuits for Very Incompletely Specified Boolean Functions, Technical Report, Dept. of Mathematics, Harvey Mudd College, 2010.
[41] T. Williams, The limits of compression, in: Proceedings of International Test Conference (ITC’2008), 2008, pp. 1024–1025.
[42] V. Pesonen, M. Gorev, P. Annus, M. Min, P. Ellervee, Reconfigurable data acquisition unit for bioimpedance measurements, in: 12th Biennial Baltic Electronics Conference (BEC), 2010, pp. 257–260. http://dx.doi.org/10.1109/BEC.2010.5630891.
[43] Aeroflex Gaisler, GRLIB IP Library User’s Manual, Version 1.1.0 b4113, 2012.

Nan Li received the B.Sc. degree in Microelectronics from Fudan University, China, in 2009, and the M.Sc. degree in System-on-Chip Design from Royal Institute of Technology, Sweden, in 2011. Currently he is pursuing the Ph.D. degree in Electronic Systems at the School of Information and Communication Technology at Royal Institute of Technology, Sweden. His research interests include logic synthesis, binary sequence generation, and built-in self test.

Elena Dubrova received the Diploma Engineer degree in Computer Science from the Technical University of Sofia, Bulgaria, in 1993, and the Ph.D. degree in Computer Science from University of Victoria, BC, Canada, in 1997. Currently she is a professor in Electronic System Design at the School of Information and Communication Technology at Royal Institute of Technology, Stockholm, Sweden. She held visiting appointments at the University of New South Wales, Sydney, in 2002, the University of California at Berkeley in 2003, and the University of Queensland in 2005. She has authored over 100 publications in the area of electronic system design. Major contributions include new algorithmic techniques for Boolean decomposition, FPGA technology mapping, and probabilistic verification. Her work has been awarded prestigious prizes such as IBM faculty partnership award for outstanding contributions to IBM research and development. Her current research interests include logic synthesis, testing, fault-tolerant computing, formal verification, and cryptography.
