Microelectronics Journal 35 (2004) 443–449 www.elsevier.com/locate/mejo
Low-cost, on-line self-testing of processor cores based on embedded software routines☆
Dimitris Gizopoulos*
Department of Informatics, University of Piraeus, Piraeus, Greece
Received 1 August 2003; revised 28 November 2003; accepted 29 December 2003
Abstract
On-line testing for complex system-on-chip architectures requires a synergy of concurrent and non-concurrent fault detection mechanisms. While concurrent fault detection is mainly achieved by hardware or software redundancy, such as duplication, non-concurrent fault detection, which is particularly useful for periodic testing, is usually achieved through hardware-based self-test. Software-based self-test has recently been proposed as an effective alternative to hardware-based self-test, allowing at-speed testing while eliminating area, performance and power consumption overheads. In this paper, we investigate the applicability of software-based self-test to non-concurrent on-line testing of embedded processor cores and define, for the first time, the corresponding requirements. Low-cost, in-field testing requirements, particularly small test execution time and low power consumption, guide the development of self-test routines. We show how self-test programs that are based on compact test routines and make a limited number of memory references provide an efficient low-cost on-line test strategy for a RISC processor core.
© 2004 Elsevier Ltd. All rights reserved.
Keywords: On-line testing; Processor testing; Self-testing; Software-based self-testing; Low-cost testing
1. Introduction
Manufacturing technologies with feature sizes in the deep submicron range are already common practice. This fact, along with the emergence of the system-on-chip (SoC) design paradigm, has resulted in single-chip electronic components with sophisticated functionality and unique performance. This increased functionality and performance comes at a cost: testing deeply embedded processors and the surrounding cores in a complex SoC is becoming more and more difficult at every stage of the system life cycle.
Manufacturing testing aims at detecting faults introduced during the fabrication process before the SoC is delivered to market. High manufacturing test quality for embedded cores requires at-speed testing to detect defects that manifest themselves only at the actual operating speed of the system. Hardware self-test techniques like scan-based built-in self-test (BIST) provide excellent test quality as they achieve the at-speed testing goal. However, high test quality comes with large hardware overhead, complex timing issues to be resolved and increased power consumption during testing. As processor technology strives for high performance, low area and low power consumption, it is apparent that hardware BIST cannot co-exist with these high-end processor features.
Software-based self-test techniques for embedded processors have been proposed in Refs. [1–5] as effective alternatives to hardware self-test. These techniques are non-intrusive in nature as they use the processor instruction set to perform self-testing. The key concept of software-based self-test is the generation of efficient self-test routines that lead to high structural fault coverage. The processor executes the self-test software, downloaded from a low-speed, low-cost external ATE, at its actual speed (at-speed testing), and no area, performance or power consumption penalties are induced. Therefore, software-based self-test is a very important low-cost test solution: it does not add hardware overhead, it has no impact on performance or timing, and it does not consume additional power during execution.
☆ A preliminary version of this paper was presented at the IEEE International On-line Testing Symposium (IOLTS) 2003.
* Tel.: +30-210-414-2372; fax: +30-210-414-2264. E-mail address: [email protected] (D. Gizopoulos).
0026-2692/$ - see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.mejo.2003.12.005
D. Gizopoulos / Microelectronics Journal 35 (2004) 443–449
After the chip's manufacturing correctness is verified by manufacturing testing, it is placed in its actual environment of operation, where many different types of faults may appear. Cosmic rays, alpha particles, electromagnetic interference and power glitches are some of the main causes of operational faults. Operational faults are usually classified into permanent faults that exist indefinitely, intermittent faults that appear at regular time intervals and transient faults that appear irregularly and last for a short time [6]. On-line testing aims at detecting and/or correcting these operational faults by means of concurrent and non-concurrent test techniques.
Non-concurrent on-line test strategies are particularly useful for periodic testing, which assures system reliability. These techniques are based on hardware BIST and are capable of detecting permanent faults and intermittent faults of fairly long duration (when the test is applied periodically). Non-concurrent testing cannot guarantee the detection of transient faults or of intermittent faults of short duration.
Concurrent on-line test strategies are used to detect operational faults within a small time frame (low error detection latency) while keeping the system in normal operation. These strategies utilize hardware redundancy techniques like duplication with comparison, watchdogs and self-checking design [6,7]. However, when a large increase in silicon area is not acceptable, time (or software) redundancy techniques provide an alternative for on-line testing. These techniques achieve software implemented hardware fault tolerance (SIHFT) by duplicating program statements, by executing programs repeatedly [8] or by implementing signature monitoring.
In this paper, we identify the characteristics of software-based self-test (i.e. self-testing through the execution of embedded machine code routines) that make it appropriate for on-line testing during system operation.
In particular, we show how software-based self-test can provide an effective, low-cost, non-concurrent on-line test solution for embedded systems that do not require immediate detection of errors (such as a detection latency of a single clock cycle) and cannot afford, in terms of hardware and performance overhead, the well-known hardware redundancy or time redundancy mechanisms. If immediate error detection is mandatory, then software-based self-test can be transparently combined with a concurrent test scheme to provide a comprehensive on-line test strategy.
This paper is organized as follows. Section 2 presents the on-line testing framework and elaborates on guidelines for embedded software development: (a) small, loop-based test code, (b) quick test code with a minimum of memory references. Section 3 presents a summary of software-based self-test techniques. The suitability of these techniques for on-line testing is explored in Section 4. Section 5 presents overall results for the on-line test routines developed for major components of the Plasma processor. Finally, Section 6 concludes the paper.
2. Low-cost, on-line testing requirements
On-line testing of embedded systems is performed while the processor operates in its normal operational environment, where external memory (SDRAM, ROM or flash memory) and other I/O peripherals may be present. The processor operates under the control of an operating system, as shown in Fig. 1. Test program execution may be initiated during system startup or shutdown, thus ensuring correct operation in subsequent startups. Alternatively, the operating system scheduler may identify idle cycles and issue test program execution, or the test program may be executed at regular time intervals with the aid of programmable timers found in the system.
Fig. 1. On-line software self-testing concept.
When running test software at system startup/shutdown or at idle intervals, only permanent faults (existing ones or others that became permanent) or non-permanent faults (intermittent or transient) that happen to exist during test execution will be detected. On the other hand, when running test software at regular intervals, intermittent faults will also be detected (provided that they last long enough). In both cases, transient faults are detected only if they happen to be present during test execution.
It is apparent that test software is an additional process that competes with user processes for system resources, CPU cycles and memory. Therefore, on-line test program execution is considered to be an overhead to overall system performance in terms of memory area, power consumption and execution cycles. Minimal impact of on-line embedded test routines on the system resources is the key to achieving a low-cost on-line test strategy for processors and processor-based embedded systems.
An effective low-cost software-based on-line self-test methodology should aim at the following goals:
• small memory footprint;
• small execution time;
• low power consumption.
These goals are crucial for a low-cost on-line self-test methodology because such a methodology should:
• reserve as few system resources as possible (memory words and CPU clock cycles) for on-line test execution; this can be guaranteed if the self-test routines are fast and small in both program and data size;
• detect the targeted faults with as low a fault detection latency as possible; this can be guaranteed, again, only if the self-test routines are fast and small and involve as few memory references as possible (in data memory and instruction memory).
In the following subsections we analyze these goals and elaborate on their contribution to low-cost on-line testing.
2.1. Small memory footprint
A serious problem of large on-line self-test routines is that they require large parts of the system memory to store their instructions and data. Any kind of memory, like ROM, RAM or flash memory, can be used for this purpose, but the reduced memory sizes of low-cost embedded applications set a tight limit on memory usage for testing purposes. Embedded systems usually have very small amounts of memory, carefully allocated to the operating system and the user programs.
Test programs with a big memory footprint usually take more time to run (larger code/data translates to an increased cache miss rate and thus an increased number of memory stall cycles). Additionally, a test program with a large memory footprint will force user data to be unloaded from cache memory. These data have to be moved to the system's external memory. When the user program resumes, it will experience cache misses, which will significantly affect the overall system performance.
Finally, large on-line test routines with many memory references lead to increased power consumption during self-test execution. It has been calculated that a considerable part of the system power consumption is located in the memory subsystem, and thus reducing memory references reduces the power consumed during test. This is particularly important in battery-operated systems where self-test routines are to be executed periodically.
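The cache displacement effect described in Section 2.1 can be pictured with a small host-side model. The following Python sketch is only an illustration under stated assumptions: the direct-mapped organization, the cache size of 256 lines, the working-set sizes and the function name evicted_user_lines are all hypothetical and are not taken from the Plasma system or from the experiments of this paper.

```python
def evicted_user_lines(cache_lines, user_lines, test_lines):
    """Count user lines displaced from a direct-mapped cache by a test run.

    Line numbers are memory addresses divided by the cache-line size.
    """
    cache = {}                        # cache index -> resident memory line
    for line in user_lines:           # the user program warms the cache
        cache[line % cache_lines] = line
    resident_before = set(cache.values()) & set(user_lines)
    for line in test_lines:           # the test routine runs and claims lines
        cache[line % cache_lines] = line
    resident_after = set(cache.values()) & set(user_lines)
    return len(resident_before - resident_after)


CACHE_LINES = 256                          # hypothetical cache size (lines)
user = range(0, 200)                       # hypothetical user working set
small_test = range(10_000, 10_000 + 16)    # compact test routine: 16 lines
large_test = range(10_000, 10_000 + 640)   # large test routine: 640 lines

print(evicted_user_lines(CACHE_LINES, user, small_test))   # 16
print(evicted_user_lines(CACHE_LINES, user, large_test))   # 200
```

Under these assumptions the compact routine displaces only the 16 user lines it maps onto, while the large routine displaces the entire user working set, so the user program would restart with a cold cache.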
2.2. Small execution time
On-line self-test software routines are regarded as an overhead to the execution time of user processes. To reduce this overhead, test software should run in the minimum possible number of CPU clock cycles. The ideal case is when the test software is able to execute within a single time quantum, assuming an operating system with a round-robin CPU scheduling strategy. Typical values of quantum times used in embedded applications are in the range of a few hundreds of milliseconds. Although it is possible to have test software execution broken into several quanta, this will lead to further overhead in system operation due to additional context switches. The major problem is that a large execution time of the self-test routines, or spreading them over more than one time quantum, lengthens the fault detection latency of both permanent faults and temporary faults existing during self-test program execution.
To guarantee minimum performance overhead, on-line self-test software should:
• consist of compact, loop-based routines with a limited number of iterations;
• minimize interactions with the memory system.
2.3. Low power consumption
A very common application area for on-line testing is mobile applications, where power consumption is of great importance. A study by Intel [10] has shown that 33% of a notebook system's power is consumed in the CPU with a two-level cache hierarchy. In the CPU, about 20–30% of the power is consumed in the cache system and about 30% is consumed in the clock circuitry. Considering the data transfers from external memory in the case of a cache miss, the power consumed in the overall memory system increases even further. The processor has to stall when a cache miss occurs. Extra energy has to be consumed in driving clock trees and pulling up and down the external bus between the on-chip cache and external memory.
On-line test software that takes advantage of temporal and spatial locality reduces the cache miss overhead. A test program that is loop based takes advantage of temporal locality, while a test program that references a small amount of data takes advantage of spatial locality. Test programs should be constructed in a way that takes advantage of these features.
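The benefit of temporal locality for instruction fetches can be made concrete with a small simulation. The Python sketch below is a simplified host-side model, not the author's setup: the direct-mapped instruction cache of 64 lines with 4 words per line, the 12-instruction routine body and the function name instruction_misses are assumptions chosen only for illustration; the figure of 168 patterns is the ATPG test set size reported in Section 5.

```python
def instruction_misses(fetch_trace, cache_lines, words_per_line):
    """Count misses for a word-address fetch trace in a direct-mapped cache."""
    resident = {}                     # cache index -> memory line held there
    misses = 0
    for word_addr in fetch_trace:
        line = word_addr // words_per_line
        index = line % cache_lines
        if resident.get(index) != line:
            misses += 1               # line not present: fetch it from memory
            resident[index] = line
    return misses


PATTERNS = 168                        # pattern count reported in Section 5
BODY_WORDS = 12                       # hypothetical instructions per pattern
CACHE_LINES, WORDS_PER_LINE = 64, 4   # hypothetical instruction cache

# Straight-line style: a fresh copy of the body for every pattern.
straight = range(PATTERNS * BODY_WORDS)
# Loop style: the same body is fetched again for every pattern.
looped = [w for _ in range(PATTERNS) for w in range(BODY_WORDS)]

print(instruction_misses(straight, CACHE_LINES, WORDS_PER_LINE))   # 504
print(instruction_misses(looped, CACHE_LINES, WORDS_PER_LINE))     # 3
```

Both traces fetch the same number of instruction words (2016), but in this model the straight-line code misses once per cache line it touches, whereas the loop misses only on its first pass.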
3. Software-based self-test methodologies Software-based self-test methodologies for embedded processors in SoCs have been presented as an attractive alternative to classical hardware-based self-test. They alleviate the problems caused by DFT since they move the test process to a higher level of abstraction. Therefore, area,
performance and power consumption overheads are eliminated.
Different approaches to software-based self-test have been presented in the literature. Functional software-based self-test methodologies [1] use the processor's instruction set without requiring prior knowledge of the processor structure. They apply random instruction sequences combined with random test data in order to exercise a large portion of the processor's functionality, resulting in self-test software routines that are characterized by high memory and execution time requirements.
Structural software-based self-test strategies like Ref. [4] are characterized by a component-oriented test development approach, fine-tuned to the low, gate-level details of the processor core. Pseudorandom patterns or patterns generated by an automatic test pattern generator (ATPG) are produced for each of the processor components in an iterative method that takes into consideration the constraints imposed by the processor instruction set. For the targeted components (mainly combinational components) where constrained test generation is possible, the generated tests achieve high fault coverage, and they are delivered to the component using processor instructions.
A methodology that is neither functional nor low-level structural has been proposed in Ref. [5]. Test development is applied at the high register-transfer (RT) level while targeting structural faults. Components are classified according to their testability properties into functional, control or hidden classes. This classification leads to self-test programs achieving high fault coverage at very small cost, both in terms of the engineering effort required and in terms of the size and execution time of the generated self-test programs.
In this paper, we focus on the feasibility and suitability of different software-based self-test methodologies for on-line testing of embedded systems.
We compare four different self-test code development styles, namely ATPG based using immediate instructions, ATPG based using memory loads, pseudorandom based as in Refs. [2–4] and deterministic as in Ref. [5]. The suitability of each of these methods for on-line testing is analyzed based on the appropriate criteria and requirements for low-cost, on-line testing.
4. On-line test challenges for software-based self-test: alternatives analysis To demonstrate how software-based self-test is suitable for low-cost, on-line testing we have developed test routines for a parallel multiplier, a very critical component of the processor in terms of speed and circuit area, a component that can be found in all modern processors. We present the self-test routines in a generic fashion, which can be applied to other functional components of the processor. The MIPS instruction set has been used in order to demonstrate samples of the self-test routines and a model of the MIPS I
processor named Plasma [11] has been used to provide the experimental fault simulation results.
Test program execution time can be generally described by the following equation [9]:
$$\text{CPU execution time} = \text{clock cycle time} \times (\text{CPU clock cycles} + \text{pipeline stall cycles} + \text{memory stall cycles})$$
It is apparent from the formula that pipeline stalls have a negative impact on test execution speed. Although they do not affect power consumption, they should be avoided by constructing test programs that do not cause unresolved data hazards. Control hazards can be avoided in architectures that implement branch delay slots, like MIPS, by proper instruction placement in the delay slot. However, stall cycles are unavoidable when a prediction unit is used to handle branch conditions. Moreover, the interaction of the test program with the instruction and data memories has a very serious impact on test program execution time (which in turn leads to longer detection latency, a crucial factor for on-line testing). Minimizing memory interaction (both for instruction fetching and for data reads and writes) leads to self-test programs that are more appropriate for low-cost, on-line testing, i.e. that have smaller memory requirements, smaller test execution times and smaller detection latencies.
In the following subsections we see how the above formula for program execution time evaluates for four different self-test program development styles and we compare them with respect to their suitability for on-line testing.
4.1. ATPG-based self-test routines
An ATPG tool can be used to generate N test patterns for the processor component under test (the parallel multiplier in our example), taking into consideration the constraints imposed by the instruction set. In the Plasma/MIPS model we use in our analysis and evaluation, the ATPG-generated test patterns map to two processor instructions for the multiplication operation.
These instructions are mult and multu, using the MIPS terminology for signed and unsigned multiplication, respectively. The ATPG-generated test patterns can be transformed into test routines in two ways: either by using the processor instruction set with the immediate addressing mode to generate and apply the patterns, or by loading the patterns from memory and then applying them.
A sample of a routine that uses the immediate instruction format (the load immediate (li) instruction) to generate and apply patterns is illustrated in Fig. 2. Test patterns are loaded into registers using the li pseudoinstruction, which the assembler decomposes into the instructions lui and ori. After the test patterns are applied, the test responses are compacted by the compaction code. Self-test routines developed using the immediate instruction format avoid transferring data from memory. Thus, no data cache misses will occur. On the other hand, instruction cache misses will occur frequently, as instructions are not re-used in a loop-based scheme.
Fig. 2. Generating ATPG test patterns using the immediate instruction format.
Alternatively, the ATPG-generated test patterns can be fetched from the memory system as shown in Fig. 3. In this scheme, the test patterns for testing the signed multiplication are put in the processor's memory starting at address mul_patterns. These patterns are fetched and applied to the component under test (the multiplier), and afterwards the test responses are compacted. The load word (lw) instruction requires two clock cycles in order to fetch data from the SRAM. Therefore, no instruction dependent on the loaded register should be placed immediately after the lw instruction; doing so would cause a pipeline stall, making the test routine run for more cycles. This is an example of careful self-test program development focusing on on-line testing.
Fig. 3. Loading ATPG test patterns from memory.
Although the test routine fetches test patterns from the memory system and applies them using a loop-based approach, thus minimizing the instruction cache miss rate, it fails to reduce the data cache miss rate. In particular, when the number of test patterns generated by the ATPG is large, the data cache miss rate will dramatically affect test program execution time. Therefore, in the most usual cases (where a significant number of ATPG patterns are necessary for testing the targeted component), this approach is not suitable for low-cost, on-line testing.
4.2. Pseudorandom-LFSR-based self-test routines
A pseudorandom test pattern generation approach can be used to test the parallel multiplier. A routine that applies pseudorandom test patterns to the multiplier is demonstrated in Fig. 4. This self-test routine takes advantage of temporal locality, since it is loop based. No data cache misses will occur and the number of interactions with the memory system is limited to the instruction references only.
Fig. 4. Pseudorandom test pattern generation.
Pseudorandom-based self-test routines seem to suit on-line software self-test requirements well when a random-testable processor component is considered, i.e. when high fault coverage can be obtained with a reasonably small number of random test patterns applied without many reseedings of the software pseudorandom pattern generator. Unfortunately, many of the processor components are random-pattern resistant, which means that a large number of pseudorandom test patterns must be applied to reach acceptable test quality. This fact leads to test programs with excessive test execution time, which has a very serious impact on the fault detection latency of the applied on-line testing methodology.
4.3. Deterministic self-test routines
The parallel multiplier is a very critical-to-test component of the group of functional components that possess an inherent regularity (arithmetic components, logic arrays, register files), and thus a high RT-level, deterministic test development approach [5] can be applied. Software-based self-test routines generated according to Ref. [5] lead to compact code where deterministic (non-ATPG-based) test patterns are generated, applied and compacted in a loop-based manner without data memory interaction. A sample of a deterministic self-test routine is presented in Fig. 5.
Fig. 5. Deterministic test pattern generation.
Although a few more test patterns are usually necessary when compared to the ATPG approach, the routine of Fig. 5 better meets the requirements of on-line testing, as it keeps the instruction and data cache miss rates at very low levels.
Deterministic self-test can also be applied to any functional component with inherent regularity. Such regular modules occupy large silicon areas in the processor, and thus very high fault coverage can be obtained with small test sets applied by small and fast test routines. Therefore, both low-cost on-line test requirements, small memory footprint and small detection latency, are completely satisfied.
4.4. Suitability for on-line testing
Evaluating the presented self-test routines for on-line testing is not a trivial task. A test routine that executes for many cycles, like the LFSR routine, might seem inappropriate for on-line testing. However, if we consider an SoC built around a high-speed processor with very small memory or a slow memory like SDRAM, it is preferable to let the CPU execute for more cycles using a deterministic or pseudorandom-based approach. On the other hand, if a low-speed processor is used with a high-speed memory like an SRAM, it might be preferable to use an ATPG-based approach to avoid excessive CPU cycles and have the test patterns come from the fast memory, provided that the number of patterns is very low.
Table 1 presents test routine characteristics, taking into consideration instruction cache miss rate, data cache miss rate and number of test patterns.

Table 1
Test routine characteristics

Test routine     Instruction cache miss rate   Data cache miss rate   Number of test patterns
ATPG_IMM         High                          Low                    Low
ATPG_LOOP        Low                           High                   Low
LFSR             Low                           Low                    High
Deterministic    Low                           Low                    Low

The ATPG approach leads either to a high instruction cache miss rate or to a high data cache miss rate. Test routine construction is affected by the irregularity of the ATPG-generated test patterns. The pseudorandom approach keeps the cache miss rate at a low level, but usually requires a large number of test patterns to guarantee high test quality. Therefore, overall test program execution is lengthened. On the other hand, the deterministic approach achieves high test quality, wherever it can be applied, by generating a small number of non-ATPG test patterns without exercising the memory system too much. As a result, low-cost on-line test goals are achieved to a higher degree.
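As a rough, host-side illustration of the pseudorandom style of Section 4.2, the Python sketch below models what such a loop-based routine does: a software LFSR produces the operands, the multiplier is exercised, and the two 32-bit halves of each product are folded into a signature by a software MISR. This is a minimal sketch and not the routine of Fig. 4: the feedback mask 0x80200003, the seeds, the compaction scheme and all function names (lfsr_next, misr_update, run_lfsr_test, good_multiplier) are assumptions made for illustration only.

```python
MASK32 = 0xFFFFFFFF
POLY = 0x80200003    # assumed feedback mask (taps 32, 22, 2, 1)


def lfsr_next(state):
    """Advance a 32-bit Galois LFSR by one step."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= POLY
    return state


def misr_update(signature, response):
    """Fold one 32-bit response word into the running signature."""
    return lfsr_next(signature) ^ (response & MASK32)


def run_lfsr_test(multiply, seed_a, seed_b, n_patterns):
    """Apply n_patterns pseudorandom operand pairs and return the signature."""
    a, b, signature = seed_a, seed_b, 0
    for _ in range(n_patterns):
        product = multiply(a, b) & 0xFFFFFFFFFFFFFFFF
        signature = misr_update(signature, product & MASK32)   # low word
        signature = misr_update(signature, product >> 32)      # high word
        a, b = lfsr_next(a), lfsr_next(b)                      # next pattern
    return signature


def good_multiplier(a, b):
    """Fault-free reference model of an unsigned 32x32 multiplier."""
    return a * b


golden = run_lfsr_test(good_multiplier, 0x12345678, 0x9ABCDEF1, 400)
print(hex(golden))
```

In an actual on-line test, the signature computed in the field would be compared against a golden value obtained from a fault-free simulation; a mismatch indicates a detected fault. The whole loop touches no data memory apart from the final signature, which is the property that makes this style attractive in Table 1.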
5. Experimental results
A 32-bit MIPS-like core named Plasma with a three-stage pipeline is used as a vehicle to demonstrate the characteristics of the test routines. It supports interrupts and all MIPS-I user mode instructions, except unaligned load and store (which are patented) [11]. The Plasma core was enhanced with a parallel multiplier with the following characteristics: Booth recoding, Wallace trees for partial product summation and fast carry-lookahead addition at the final stage [12]. The Mentor Graphics suite was used for VHDL simulation (ModelSim) and for the generation of a test set of 168 test patterns (FlexTest), taking into consideration the constraints imposed by the instruction set. Two software LFSR programs were developed to generate 400 and 1200 test patterns for the multiplier, respectively.
We developed self-test routines for the parallel multiplier according to the approaches presented earlier. Test responses are compacted in all cases using a software MISR routine and a final signature is stored in memory for further analysis.
Table 2 presents statistics for the test programs developed for the parallel multiplier. Column 2 presents the memory requirements of each test program in four-byte words. Column 3 presents the data memory references (loads and stores) the test programs are expected to make. Columns 4 and 5 present the clock cycles and the test coverage for stuck-at faults achieved by each test strategy.

Table 2
Multiplier test program statistics

Test program        Size (words)   Data references   Clock cycles   Test coverage (%)
TP_ATPG_LOOP        397            337               4240           99.5
TP_ATPG_IMM         2539           1                 2890           99.5
TP_ATPG_IMM_JAL     1207           1                 3562           99.5
TP_LFSR(400)        69             1                 9837           97.6
TP_LFSR(1200)       69             1                 29,437         97.9
TP_DETERMINISTIC    57             1                 4050           99.0

The ATPG-based test programs cause significant problems when placed in the on-line test environment because of their quite large memory footprint. The ATPG_LOOP test program requires 336 data memory references in order to load 168 patterns (each pattern is 64 bits wide, while the lw instruction loads 32 bits of data) and 1 memory reference in order to store the signature. While the ATPG_IMM test program achieves the smallest number of execution cycles, its huge memory footprint makes it prohibitive for the on-line test environment. Optimizing ATPG_IMM by removing the compaction code after each multiplication and jumping to a compaction routine instead (ATPG_IMM_JAL) reduces the memory footprint by about 52% (from 2539 to 1207 words). However, the memory footprint of the ATPG_IMM_JAL routine remains large and fails to meet the on-line test requirements. We should note that the ATPG-based routines are favored here, as they reside in SRAM. Applying the same test routines on a multi-level cache system without SRAM will degrade their performance, as explained earlier in Section 4.1.
On the other hand, the LFSR and deterministic routines are much better suited to the in-field test environment. Both have a very small memory footprint and thus do not affect user space. Achieving very high test quality with pseudorandom patterns is a real challenge, as already mentioned in Section 4.2. Table 2 shows that the pseudorandom-based routine has a great impact on test execution time. Increased test execution time means that fault detection latency is also increased.
On-line test requirements are met best by the deterministic routines. Not only do they have a small memory footprint, but they also achieve high test quality while keeping test application time relatively low. The deterministic routines can also be applied to a system without on-chip SRAM. Because of their loop-based nature, as shown in Section 4.3, it is expected that after a small number of instruction cache misses they will fully reside in the cache.
We have applied the deterministic test development strategy to the Plasma CPU core. Self-test program statistics (memory requirements and CPU clock cycles), along with the achieved test coverage, are presented in Table 3. The attractive features of deterministic routines, namely small memory footprint and small test application time, lead to low power consumption without compromising test quality; thus, the low-cost on-line objectives are met.

Table 3
Self-test program statistics for the entire Plasma RISC processor

Statistic                    Value
Test program size (words)    930
Clock cycles                 13,077
Test coverage                94.6%

6. Conclusions
We have shown that the recently proposed software-based self-test of processor cores in SoCs can be a very effective, low-cost strategy for on-line testing. Faults can be detected with relatively small embedded test code that executes for short time intervals. Software-based self-test for on-line testing can be applied to improve the reliability of low-cost embedded systems where hardware redundancy and software redundancy cannot be applied due to their excessive cost in terms of silicon area and execution time, respectively. The experimental results presented for a RISC processor core of a classical architecture used in embedded systems show that a deterministic self-test routine development strategy leads to high-quality routines of small size, small execution time and low power consumption.
References
[1] J. Shen, J. Abraham, Native mode functional test generation for processors with applications to self-test and design validation, Proceedings of IEEE ITC, 1998, pp. 990–999.
[2] F. Corno, G. Cumani, M. Sonza Reorda, G. Squillero, Fully automatic test program generation for microprocessor cores, IEEE Design Automation and Test in Europe Conference (DATE), 2003, pp. 1006–1011.
[3] W. Zhao, C. Papachristou, Testing DSP cores based on self-test programs, Proceedings of the IEEE Design Automation and Test in Europe Conference (DATE), 1998, pp. 166–172.
[4] L. Chen, S. Dey, Software-based self-testing methodology for processor cores, IEEE Trans. CAD Integ. Circuits Sys. 20 (3) (2001) 369–380.
[5] N. Kranitis, G. Xenoulis, A. Paschalis, D. Gizopoulos, Y. Zorian, Low-cost software-based self-testing of RISC processor cores, Proceedings of the IEEE Design Automation and Test in Europe Conference (DATE), 2003, pp. 714–719.
[6] H. Al-Assad, B.T. Murray, J.P. Hayes, Online BIST for embedded systems, IEEE Des. Test Comput. 15 (4) (Oct.–Dec. 1998) 17–24.
[7] M. Nicolaidis, Y. Zorian, On-line testing for VLSI: a compendium of approaches, J. Electron. Testing: Theory Appl. 12 (1–2) (1998) 7–20.
[8] N. Oh, E.J. McCluskey, Error detection by selective procedure call duplication for low energy consumption, IEEE Trans. Reliability 51 (4) (2002) 392–402.
[9] J. Hennessy, D. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers, 1996.
[10] Intel Corporation, Mobile Power Guidelines 2000, Dec. 11, 1998.
[11] Plasma CPU Model, http://www.opencores.org/projects/mips.
[12] J. Phil, E. Sand, Arithmetic Module Generator for High Performance VLSI Designs, http://www.fysel.ntnu.no/modgen/.