Microelectronics Reliability 43 (2003) 685–693 www.elsevier.com/locate/microrel
Introductory Invited Paper
Integrating testability with design space exploration M. Zwolinski *, M.S. Gaur Electronic System Design Group, Department of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK Received 13 January 2003
Abstract Built-in self-test (BIST) has emerged as a promising test solution for high-speed, deep sub-micron VLSI circuits. Traditionally, the testability insertion phase comes after functional logic synthesis and verification in the design cycle. This creates two separate optimisation processes: functional optimisation followed by BIST insertion and optimisation. The first deals with functional design behaviour, while the second deals with test behaviour. Considering testability at such a late stage in the design flow limits efficient design space exploration. In this paper, we consider testability as a design objective alongside area and delay. We extend the concept of design space to include testability and show how this enhanced design space can be used by a high-level synthesis tool. We demonstrate that by taking testability into account at an early stage, we can generate better designs than by leaving BIST insertion to the end of the design cycle. © 2003 Elsevier Science Ltd. All rights reserved.
* Corresponding author. Fax: +44-23-8059-2901. E-mail address: [email protected] (M. Zwolinski).
0026-2714/03/$ - see front matter © 2003 Elsevier Science Ltd. All rights reserved. doi:10.1016/S0026-2714(03)00034-9

1. Introduction

High-level synthesis tools allow the designers of digital systems to automate the steps between specification and RTL design and synthesis. In conventional design methodologies, a high-level specification (written in a language such as C) is transformed by hand into an RTL-synthesisable representation in VHDL or Verilog. This manual transformation can be very time-consuming and error-prone, so it is generally only feasible to produce one RTL implementation of the design. Once the RTL design has been produced, test structures can be introduced [1–4]. These test structures can include built-in self-test (BIST) [5–8]. The introduction of such structures is likely to degrade the performance of the design [9,10], so either this performance degradation must be accepted, or the test structures must be omitted with a resultant reduction in testability.

For any high-level design, there may be many possible RTL implementations, each with its own characteristics. For example, one might generate a large, fast implementation, or a small, slow implementation, or something in between. It is conceivable that different choices of RTL implementation will have different consequences for testability insertion. There is generally insufficient time to explore these possibilities.

Design space exploration tools allow many different RTL design possibilities to be examined. A high-level specification may be refined into an RTL implementation, subject to constraints such as area and delay. The choice of constraints and the weight given to each will influence the final implementation. Hence, changing the constraints and weighting allows a large number of possible designs to be considered in a short time.

In this paper, we consider the inclusion of testability as a further design constraint. In particular, we look at the inclusion of BIST structures. Such structures affect the area and performance of the final circuit under "normal" operating conditions. Therefore, we consider both the effect of these structures on the final design and, conversely, the effect of particular design decisions on testability.

For example, consider the data flow graph (DFG) of Fig. 1(a). Four additions are performed over three clock cycles. Because two operations are scheduled in the first clock cycle, two adders are needed. The remaining two additions may be allocated to either adder. Fig. 1(b) and (c) show two possible implementations of this DFG. In Fig. 1(b), both remaining additions are allocated to the same adder. In Fig. 1(c), each operation is allocated to a different adder. In terms of area and delay, the two implementations are very similar.

Fig. 1. Example data flow graph with two allocations. (a) Scheduled data flow graph; (b) first allocation for DFG (a); (c) second allocation for DFG (a).

For BIST, however, there are significant differences. Fig. 2(a) shows the test structures needed for the first implementation: three registers need to be configured as pseudo-random pattern generators (PRPGs), one register as a multiple input signature register (MISR), and one register as a CBILBO. In Fig. 2(b), three registers are configured as PRPGs and one as an MISR. This saves a CBILBO (which is intrinsically expensive, as registers need to be duplicated). On the other hand, the test for the circuit of Fig. 2(a) can be performed in one session, while the circuit of Fig. 2(b) requires two test sessions (and hence a more complex test controller).

This example highlights two important points. First, the choice of implementation can have a significant effect on the test structures included in a design. Second, several weighted metrics are needed to describe the test resources.

In Section 2, we describe the existing synthesis tool and the enhanced design space. In Section 3, we discuss the new design flow. Some results are presented in Section 4.
2. High-level synthesis

High-level synthesis raises the level of abstraction at which a designer works to an algorithmic level. Rather than generating just one implementation from a VHDL or Verilog description, many (possibly millions of) different implementations can be examined while searching for an optimum design. Behavioural synthesis and optimisation therefore give us a large design space to explore. In other words, a behavioural description can result in many suitable structural designs (at RTL). In this work, we have used an existing high-level synthesis tool, multiple objective optimisation in data and control path synthesis (MOODS).
2.1. Multiple objective optimisation

The MOODS system generates alternative designs by exploring the design space using a transform-based approach [11]. In the design process, the user specifies the target speed and area of the optimised design. (It is also possible to define a target power dissipation, although the optimisation methodology differs slightly and is hence outside the scope of this paper.) Although MOODS cannot always deliver designs conforming exactly to those specifications, it does allow the user to explore what is possible in terms of trade-offs.

Fig. 2. Self-test configurations for the allocations of Fig. 1. (a) Single test session for allocation of Fig. 1(b); (b) two test sessions for allocation of Fig. 1(c).

Fig. 3 gives an overview of the MOODS synthesis process. The behavioural description of the design is converted to an intermediate code (ICODE) format, and this ICODE description maps to control and data flow graphs (CDFG). The CDFG is optimised by the iterative application of graph transforms. The optimisation (or search of the design space) is therefore incremental.

Fig. 3. MOODS high level synthesis system. (Flow shown: behavioural VHDL and user optimisation objectives; source optimiser; VHDL to intermediate code compiler; ICODE file; synthesis and optimisation using function and module libraries; structural VHDL and data structure file; logic synthesis, placement and routing, or FPGA mapping, targeting FPGA or ASIC.)

Each transform is complete in that it changes the design in such a way that a correct and implementable design will result regardless of any previous transformations. In other words, a correct (but sub-optimal) RTL design may be generated at any point. Hence, each transform moves the design to a new point in design space. No transform is targeted at improving a particular optimisation criterion; any transform may, in fact, degrade some criterion. Each transform has a local effect on the design but creates a global perturbation in the cost function. Every transformation can produce an improvement or a degradation in the design.

Each transform consists of four distinct steps:
• Select a transform and the design data on which to apply it.
• Test whether the selection would preserve the overall behaviour.
• Estimate whether the transform improves the design in terms of the user's objectives.
• Perform the transform if it satisfies the above.

The selection of transforms and design data can be made according to pre-defined heuristics or randomly, using simulated annealing (SA). The advantage of SA is that objectives can be traded off against each other without any detailed knowledge of how the objectives might interact. All that is needed is a cost function of the form:

$$C_{TOTAL} = w_1 C_1 + w_2 C_2$$

where $C_1$ and $C_2$ are the costs of the design in terms of two objectives (say, area and delay) and $w_1$ and $w_2$ are the weights of the objectives as defined by the user's priorities. The cost of each transform can be calculated from the change in $C_1$ and $C_2$. SA also allows the design to escape from a local minimum, by randomly permitting a degradation in the overall cost.

As each transform is applied, the design moves to a new point in the design space. The iterative optimisation process can therefore be thought of as a trajectory through design space. By changing the weightings in the cost function, or by simply rerunning the simulated annealing with a different random number seed, a new trajectory will result. Different trajectories may end at different points.

2.2. Unified design space
If the test structures are inserted at the end of behavioural synthesis (in other words, only for the RTL structure represented by the end of the design trajectory), there are two separate optimisation processes. From analysis of the different synthesis-for-testability systems, we have come to the conclusion that testability improvement will be constrained by the quality of the design. Once the RTL design has been fixed, there is limited flexibility in choosing registers for BIST modification, and test scheduling is similarly constrained. Also, BIST insertion after the design has been completed may offset the optimisation gains achieved during the high-level design process [10].

The approaches that have been proposed for BIST insertion and optimisation after functional synthesis is complete tend either towards the minimal (extra) area overhead solution or towards the minimal test (application) time solution [12]. Hence, we need an approach that can tackle both overheads in an integrated way during behavioural synthesis. When the test time is considered, an estimate of the area overhead should also be taken into account, and vice versa. These two solutions appear as the two extremes of the test design space in Fig. 4.

The basic approach is to maximise the sharing of test registers, so that fewer registers are modified for BIST [10,13]. However, this may also increase the number of test sessions. The goal is to reduce the BIST area overhead without sacrificing the quality of test. Another approach is to maximise the test concurrency, resulting in fewer test sessions [5].

We propose adding testability as an additional objective in design space. The unified expanded search space is depicted in Fig. 4. The global quality of a design with BIST may suffer unless testability is analysed and improved before the structure of the design is finalised (i.e. unless testability is improved either before or during synthesis, the design space is not fully explored and no trade-offs between area/performance and testability are possible). Since we want testability to be taken into account when design choices are made, we will analyse and improve testability during design space exploration.
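As a rough illustration, the following sketch shows how a testability term might be folded into the weighted cost function of Section 2.1 and used in a simulated-annealing acceptance test. The three-term sum, the weight and cost values, and the function names `total_cost` and `accept` are assumptions made for this example and are not taken from MOODS; the acceptance rule is the standard Metropolis criterion.

```python
import math
import random


def total_cost(costs, weights):
    """Weighted sum over whichever objectives appear in `weights`."""
    return sum(weights[name] * costs[name] for name in weights)


def accept(delta, temperature, rng=random):
    """Metropolis rule: keep every improvement, and keep a degradation
    with probability exp(-delta / temperature)."""
    if delta < 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


# Hypothetical weights and normalised costs before and after one transform.
weights = {"area": 1.0, "delay": 1.0, "test": 0.5}
before = {"area": 100.0, "delay": 100.0, "test": 40.0}
after = {"area": 90.0, "delay": 104.0, "test": 30.0}

delta = total_cost(after, weights) - total_cost(before, weights)
print(delta, accept(delta, temperature=10.0))
```

In this made-up case the delay gets slightly worse, but the overall weighted cost falls, so the transform is accepted. Changing the weights changes which transforms are accepted and hence the trajectory through the design space.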
Fig. 4. Unified design space for behavioural synthesis. (Test design space: area overhead against test time between t_min and t_max, bounded by the minimal test time design and the minimal area overhead design. Functional design space: area against delay, showing the trajectory of the design during behavioural synthesis from the maximally serial initial design configuration to the final frozen configuration.)
3. BIST insertion

3.1. BIST methodology

Classically, the testability insertion phase comes after functional logic synthesis in the VLSI design cycle. A number of BIST techniques exist. Here we will only discuss one method [14]. A valid data path is made up of functional units (FUs). A self-testable FU can be created by reconfiguring one existing register as a test register at every input and output port of the FU, as shown in Fig. 5. For the purposes of this example, every FU in a data path is assumed to have two input ports and one output port and to be random pattern testable. The test registers may be PRPG, MISR or BILBO.

Fig. 5. Self-testable functional unit configuration. (Registers A and B configured as PRPGs at the inputs of the FU; register C configured as an MISR at its output.)

A FU can have a number of potential self-testable configurations. We call each of these configurations a BIST embedding (BE). Each BE defined for a FU is a three-tuple $\{R_{in1}, R_{in2}, R_{out}\}$ describing the potential test register set for self-test. $R_{in1}$ and $R_{in2}$ are potential input PRPGs connected to input ports 1 and 2. $R_{out}$ is a potential MISR connected to the output port of the FU. An illustration of BEs in a data path is shown in Fig. 6.

Fig. 6. BIST embedding of FUs in data path. (Registers R1 to R7 and adders FU1 to FU4; BEs shown: {R1,R2,R6}, {R1,R5,R6}, {R2,R5,R6}, {R3,R4,R7}, {R3,R5,R7}, {R4,R5,R7}.)

As Fig. 6 shows, each FU can have more than one BE. A register cannot be both a PRPG and an MISR in the same test session. Nor can one MISR be used to test two or more FUs. In other words, the set of BEs chosen for a test session must be compatible. If both PRPG and MISR functions are required, the register can be constructed as a BILBO with both modes. Self-loops are a special case. If they are unavoidable, then a register needs to be configurable as a CBILBO, but this comes at a very high price.

Hence, test insertion can be measured with three metrics: fault cover, the additional area of the test structures, and the test time (or number of test sessions). If we assume that every register can potentially be made reconfigurable, and that every PRPG can run through an exhaustive set of test vectors (which admittedly may be unrealistic for large bit widths), the fault cover can be assumed to be maximised and is therefore not considered further.

Chaining FUs within one clock cycle presents a further difficulty: we have to consider compatibilities between PRPGs. Fig. 6 shows a data path of FUs and registers and their BEs. If the test compatibility of the FUs in the data path is judged only on the basis of MISR sharing, some of the indirect connections requiring I-paths [4] may generate incompatibilities. In Fig. 6, FU3 and FU2 are compatible on that basis, as they do not share an MISR. In the BE of FU3, however, register R5 is required to hold a constant (0 in this case) to make FU4 transparent and so transport the test vectors to MISR R7. To test FU2, register R5 needs to be configured as a PRPG. R5 cannot generate a constant 0 and random patterns simultaneously, hence FU2 and FU3 are incompatible. This condition therefore also needs to be considered when creating the BIST cost vector. We call this I-path induced incompatibility.

Therefore, for a given RTL design (or for one point on the design trajectory), we can find all the BEs for each FU. We can determine one or more compatible sets of BEs, and we can calculate the "testability cost" of each set in terms of the additional test structure area and the test time:

$$C_{TEST} = w_{TA} C_{TA} + w_{TT} C_{TT}$$

3.2. BIST register allocation
The BIST resource minimisation problem is to decide which registers to modify and which modes (PRPG, MISR, or BILBO) to add, in such a way that every FU in the given data path can be tested, as shown in Fig. 6, with minimum overhead incurred by the addition or modification of registers. BIST register allocation, leading to an estimate of the test register area, consists of the following steps:

1. Determine the connectivity and the predecessor–successor relationship of the FUs of the data path at every iteration.
   • Insert additional registers to remove self-adjacency.
   • The list of potential input test registers (i.e. test registers at input $j$ of a FU $f_i$), $L^{in}_{i,j}$, is built on the assumption that, provided an I-path exists, all the input test registers of an ancestor FU are potential input test registers. ($R_t \in L^{in}_{i,j}$ if $R_t \in L^{in}_{k,l}$, $f_k$ is an ancestor of $f_i$ and an I-path exists from input port $l$ of $f_k$ to input port $j$ of $f_i$.)
   • Similarly, the list of potential output test registers (i.e. test registers at the output of a FU $f_i$), $L^{out}_i$, is built on the assumption that, provided an I-path exists, all the output test registers of a descendant FU are potential output test registers. ($R_t \in L^{out}_i$ if $R_t \in L^{out}_k$, $f_k$ is a descendant of $f_i$ and an I-path exists from $f_i$ to $f_k$.)
2. Assign a weight to each potential test register, calculated as the sum of the sequential depths of the input paths.
3. Allocate registers for each FU with the following constraints:
   • A self-adjacent register cannot be a test register, as the cost of a CBILBO is more than the cost of inserting an additional register for testing.
   • Two FUs sharing a register at an input cannot have the same output register. This takes care of the MISR compatibility requirement.
   • A FU cannot have the same test register at all its inputs, to avoid the detrimental effect of correlation on the fault coverage.
4. Check the I-path induced incompatibility, as defined in Section 3.1, among all FUs of the data path.

The test register allocation leads to a BIST area overhead component $C_{TA}$. Consider a data path comprising $m$ FUs $f_i$ ($i = 1, 2, \ldots, m$) and $r$ registers $R_t$ ($t = 1, 2, \ldots, r$). We associate with each register $R_t$ three binary variables, $R_{tp}$, $R_{tm}$ and $R_{tb}$, at most one of which may be set to indicate that $R_t$ needs to be modified to a PRPG, MISR, or BILBO, respectively. For optimisation, these requirements need to be stated as constraints. The condition that a register may be modified to no more than one type of test register may be stated as

$$R_{tp} + R_{tm} + R_{tb} \le 1.$$

The need to cover each input port, $j = 1, 2, \ldots, k$, of $f_i$ using a test register from the set $L^{in}_{i,j}$ defines the constraint

$$\sum_t (R_{tp} + R_{tb}) \ge 1, \quad \forall R_t \in L^{in}_{i,j},\ R_t \notin \bigcap_k L^{in}_{i,k},$$

and the need to cover the output port using a test register from the set $L^{out}_i$ defines the constraint

$$\sum_t (R_{tm} + R_{tb}) \ge 1, \quad \forall R_t \in L^{out}_i.$$
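The covering constraints can be checked for a candidate register assignment with a few lines of code. The sketch below is illustrative only: the function name `fu_is_covered`, the dictionary representation of register modes and the example register lists are assumptions, and it handles a single two-input FU rather than a whole data path.

```python
GENERATORS = {"PRPG", "BILBO"}   # modes that can drive an input port
ANALYSERS = {"MISR", "BILBO"}    # modes that can observe an output port


def fu_is_covered(modes, input_lists, output_list):
    """Check the input and output covering constraints for one FU.

    modes       -- dict register -> "PRPG", "MISR" or "BILBO"; unmodified
                   registers are simply absent, so each register has at
                   most one mode by construction.
    input_lists -- one iterable of candidate registers per input port.
    output_list -- candidate registers for the output port.
    """
    ports = [set(port) for port in input_lists]
    # Registers common to every input list are excluded, mirroring the
    # condition that R_t is not in the intersection of the input lists.
    common = set.intersection(*ports) if ports else set()
    inputs_ok = all(
        any(modes.get(reg) in GENERATORS for reg in port - common)
        for port in ports
    )
    output_ok = any(modes.get(reg) in ANALYSERS for reg in output_list)
    return inputs_ok and output_ok


# Hypothetical candidate lists for a single two-input FU.
input_lists = [{"R1", "R5"}, {"R2", "R5"}]
output_list = {"R6"}

print(fu_is_covered({"R1": "PRPG", "R2": "PRPG", "R6": "MISR"},
                    input_lists, output_list))
print(fu_is_covered({"R5": "PRPG", "R6": "MISR"}, input_lists, output_list))
```

The second assignment is rejected because R5 is the only pattern generator and it feeds both inputs, which is the correlated case that the allocation constraints rule out.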
A weight factor is applied for selection of the most suitable modifications. We associate a modification cost with each register, $C_{tp}$, $C_{tm}$, $C_{tb}$ (the costs of a PRPG, MISR and BILBO, respectively). The total area of the test register set, $R_T$, representing the BIST area overhead cost $C_{TA}$, is:

$$C_{TA} = \sum_{t=1}^{r_p} (C_{tp} R_{tp}) + \sum_{t=1}^{r_m} (C_{tm} R_{tm}) + \sum_{t=1}^{r_b} (C_{tb} R_{tb}), \quad \forall r \in R_T = r_p \cap r_m \cap r_b$$

where $r_p$, $r_m$, and $r_b$ are the sets of PRPGs, MISRs, and BILBOs, respectively.

3.3. BIST scheduling

$C_{TT}$ is the BIST test time component estimate. It is calculated at every iteration from test schedule estimates, by applying a sub-optimal test scheduling algorithm to the selected testable FUs. Each testable FU represents a FU with two input controllable registers and one output observable register, as illustrated in Fig. 6. The estimate is made after the selection and estimation for the BIST area overhead described in the previous subsection. Algorithm 1 shows the estimation of the test schedule and time.

Algorithm 1. Test schedule estimation
Algorithm: Transform-based suboptimal test schedule estimation
Require: Minimum cost test register mapping for the datapath.
  F_i = ith functional unit in sorted list
  T_max = maximum test time per test session
  V_i = compatibility vector of F_i
  T_i = estimated test length for F_i
  session_i = list of functional units in ith session
  n = number of test sessions
  Sort the FUs by their estimated test length, n = 1
  for i = 1 to number of functional units do
    if F_i not visited then
      session_n = F_i; time_n = T_i;
      S = V_i = set of functional units compatible with F_i;
      while S not empty do
        k = next element in S;
        if (T_k + time_n < T_max) then
          mark F_k visited, time_n = time_n + T_k, S = S ∩ V_k, session_n = session_n ∪ {F_k};
        endif
        S = S − {F_k};
      endwhile
    endif
    n = n + 1;
  endfor

3.4. Unified optimisation method

SA is used for multiple objective optimisation in the unified design space. SA permits hill-climbing to avoid getting stuck at local optima. The main advantages of SA are that it is abstract with respect to the particular problem and that it has the ability to find a global minimum. No knowledge of the trade-off mechanism is required; instead, the process relies entirely on the cost function and transform estimators to explore the design space. Algorithm 2 describes the unified algorithm. SA starts with an initial solution, and a minor modification of the initial solution creates a neighbouring solution. The initial solution is a naive data path and controller implementation with one state per register transfer and one FU per operation. As no attempt is made at this point to share any resources, the design is maximally serial. At present, this methodology has been integrated with MOODS, but it is generic enough to be integrated in any incremental design space exploration system.

Algorithm 2. Unified optimisation algorithm with BIST
Algorithm: Generate a structural BISTed design from a behavioural description
Require: User priorities for area/delay parameters and test time/test area.
  for temperature start to end with a defined step do
    for i = 0 to No_iterations do
      t = select_transformation();
      if test_transformation(t) then
        present_cost = Calculate_BISTed_cost(t)
        Perform_transformation(t)
        post_cost = Calculate_BISTed_cost(t)
        ΔE = (post_cost − present_cost)
        if (ΔE < 0 OR Rand() < exp(−ΔE/temp)) then
          accept_transformation(t)
        else
          reject_transformation(t)
        endif
      endif
    endfor
  endfor

To investigate the effect of a large number of transforms, we introduced a skip count in the SA algorithm. This is the number of transforms which are skipped for calculation and integration of the BE vector component. A large value of skip count reduces the size of the design space explored for testability.

Table 1
Results of unified test synthesis algorithm, normalised to the initial implementation

                          Without BE          With BE
                          Viterbi   Dhrc      Viterbi   Dhrc
(a) Optimisation priority: area high, delay high
Optimised area            69.51     72.57     63.47     69.18
Optimised delay           43.37     36.51     45.02     36.51
BIST area overhead        35.58     17.48     28.20     15.48
(b) Optimisation priority: area high, delay low
Optimised area            67.21     74.28     65.85     70.62
Optimised delay           68.48     53.23     58.01     53.65
BIST area overhead        35.78     16.31     27.64     14.11
(c) Optimisation priority: area low, delay high
Optimised area            68.89     68.81     63.95     68.35
Optimised delay           60.92     45.83     55.79     45.08
BIST area overhead        36.14     15.96     32.39     13.97
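Algorithm 1 can be sketched in executable form as follows. This is one reading of the pseudocode under stated assumptions, not the authors' implementation: FUs are sorted longest test first (the sort direction is not given), FUs already placed in a session are removed from the candidate set, a new session is opened only when an unvisited FU is found, and the name `estimate_schedule` and the example data are hypothetical.

```python
def estimate_schedule(test_len, compat, t_max):
    """Greedy test-session estimate.

    test_len -- dict FU -> estimated test length
    compat   -- dict FU -> set of FUs compatible with it
    t_max    -- maximum test time per session
    Returns (sessions, times): the FUs in each session and its total time.
    """
    # Assumption: longest test first; ties keep their original order.
    order = sorted(test_len, key=test_len.get, reverse=True)
    visited = set()
    sessions, times = [], []
    for fu in order:
        if fu in visited:
            continue
        visited.add(fu)
        session, elapsed = [fu], test_len[fu]
        # Assumption: FUs already scheduled are not candidates again.
        candidates = set(compat[fu]) - visited
        for k in order:
            if k not in candidates:
                continue
            if elapsed + test_len[k] < t_max:
                visited.add(k)
                session.append(k)
                elapsed += test_len[k]
                # Keep only FUs compatible with everything in the session.
                candidates &= set(compat[k])
            candidates.discard(k)
        sessions.append(session)
        times.append(elapsed)
    return sessions, times


# Hypothetical test lengths and compatibility sets for four FUs.
test_len = {"FU1": 40, "FU2": 30, "FU3": 30, "FU4": 20}
compat = {
    "FU1": {"FU2", "FU4"},
    "FU2": {"FU1", "FU4"},
    "FU3": {"FU4"},
    "FU4": {"FU1", "FU2", "FU3"},
}
sessions, times = estimate_schedule(test_len, compat, t_max=100)
print(sessions, times)
```

With these made-up numbers, lowering `t_max` forces more sessions, which is the test time side of the trade-off against the area overhead.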
4. Results

We analysed two designs, a Viterbi decoder and a dhrc, using the proposed methodology. These two designs have 215 and 135 lines of behavioural VHDL and synthesise to about 360 and 260 cell library elements, respectively. Three different targets for area and delay were used: (a) area and delay were given equal priority; (b) area was given a higher priority than delay; and (c) delay was given a higher priority than area.
Table 1 summarises the results for the experiments. A uniform optimisation schedule for our proposed method was employed. To reference the differently synthesised designs at different priorities, the initial design costs are normalised to a value of 100. The BIST area overhead is shown as a percentage of the area of the final optimised designs. In all cases, the overall area is less if BIST is included in the design space exploration, than if BIST is added at the end of the design cycle. The BIST area overhead is also reduced. In most cases, the delay is also reduced, although in two cases the delay is slightly increased.
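The comparisons stated above can be checked directly from the Table 1 figures. The short script below copies the table values and computes the change in each metric when BIST is included in the exploration; the variable names and the data layout are choices made for this example.

```python
# (optimised area, optimised delay, BIST area overhead) copied from Table 1.
results = {
    ("a", "Viterbi"): {"without": (69.51, 43.37, 35.58), "with": (63.47, 45.02, 28.20)},
    ("a", "Dhrc"):    {"without": (72.57, 36.51, 17.48), "with": (69.18, 36.51, 15.48)},
    ("b", "Viterbi"): {"without": (67.21, 68.48, 35.78), "with": (65.85, 58.01, 27.64)},
    ("b", "Dhrc"):    {"without": (74.28, 53.23, 16.31), "with": (70.62, 53.65, 14.11)},
    ("c", "Viterbi"): {"without": (68.89, 60.92, 36.14), "with": (63.95, 55.79, 32.39)},
    ("c", "Dhrc"):    {"without": (68.81, 45.83, 15.96), "with": (68.35, 45.08, 13.97)},
}


def changes(index):
    """Difference (with BE minus without BE) for one metric column."""
    return {
        key: round(row["with"][index] - row["without"][index], 2)
        for key, row in results.items()
    }


area, delay, overhead = changes(0), changes(1), changes(2)
delay_increases = sorted(key for key, diff in delay.items() if diff > 0)

print("area reduced in all cases:", all(d < 0 for d in area.values()))
print("overhead reduced in all cases:", all(d < 0 for d in overhead.values()))
print("delay increased for:", delay_increases)
```

The two delay increases are the Viterbi decoder under priority (a) and the dhrc under priority (b); the dhrc delay under priority (a) is unchanged.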
5. Conclusions The results demonstrate that improved self-testable designs can be generated if the testability is integrated into the design space. This can be explained by the fact that a priori BIST consideration within the optimisation loop makes it possible to select, share, and place test resources with lower overheads. This is achieved by an efficient exploration of available design space and actively trading-off testability. This is further manifested by the positive effects of skip count on BIST overhead. In future we plan to investigate the controller BIST implementation in a similar unified way.
References
[1] Berthelot D, Flottes ML, Rouzeyre B. A method for trading off test, area and fault coverage in datapath BIST synthesis. J Electronic Testing: Theory Appl 2001;17:331–9.
[2] Dey S, Raghunathan A, Wagner KD. Design for testability techniques at the behavioral and register-transfer levels. J Electronic Testing: Theory Appl 1998;13:79–91.
[3] Harmanani H, Papachristou C. An improved method for RTL synthesis with testability tradeoffs. In: Proc International Conference on Computer-Aided Design, 1993. p. 30–7.
[4] Harmanani H, Saliba R, Khouray M. A genetic algorithm for testable data path synthesis. In: Proc IEEE Canadian Conference on Electrical and Computer Engineering, CCECE, 2001. p. 1073–8.
[5] Harris IG, Orailoglu A. SYNCBIST: synthesis for concurrent built-in self-testability. In: Proc European Design and Test Conference, 1994. p. 101–4.
[6] Lin SP, Njinda CA, Breuer MA. Generating a family of testable designs using the BILBO methodology. J Electronic Testing: Theory Appl 1993;4:71–89.
[7] Mohamed AR, Peng Z, Eles P. BIST synthesis: an approach to resource optimization under test time constraints. In: Proc IEEE International Test Synthesis Workshop, Santa Barbara, USA, March 2001.
[8] Nicolici N, Hashimi BM, Brown AD, Williams AC. BIST hardware synthesis for RTL data paths based on test compatibility classes. IEEE Trans Comput Aided Des 2000;19:1375–85.
[9] Olcoz K, Tirado F, Mecha H. Unified data path allocation and BIST intrusion. Integration, VLSI J 1999;28:55–99.
[10] Parulkar I, Gupta SK, Breuer MA. Introducing redundant computation in RTL data paths for reducing BIST resources. ACM Trans Des Autom Electron Syst 2001;6:423–45.
[11] Williams AC, Brown AD, Zwolinski M. Simultaneous optimisation of dynamic power, area and delay in behavioural synthesis. IEEE Proc Comput Digital Tech 2000;147:383–90.
[12] Li X, Masuzawa T, Fujiwara H. Strong self-testability for data paths high-level synthesis. In: Proceedings of the Asian Test Symposium, Taipei, Taiwan, December 2000. p. 229–34.
[13] Ravi S, Jha NK, Lakshminarayana G. TAO-BIST: a framework for testability analysis and optimisation of RTL circuits for BIST. In: Proc IEEE VLSI Test Symposium, 1999. p. 398–406.
[14] Bushnell ML, Agrawal VD. Essentials of electronic testing for digital, memory and mixed-signal VLSI circuits. Boston, USA: Kluwer Academic Press; 2000.