Journal of Systems Architecture 57 (2011) 749–760
A new approach to evaluating internal Xilinx FPGA resources

Ignacio Bravo*, Alfredo Gardel, Beatriz Pérez, José Luis Lázaro, Jorge García, David Salido

Electronics Department, University of Alcala, Escuela Politécnica Superior, Campus Universitario, Ctra. Madrid-Barcelona km. 33.6, 28871 Alcala de Henares, Madrid, Spain
Article history: Received 26 May 2010; Received in revised form 30 May 2011; Accepted 30 May 2011

Keywords: FPGA; Internal faults; FPGA tests; Automatic multi-load bitstream
Abstract

In this paper, a new approach to testing internal Xilinx FPGA (field programmable gate array) resources using a multi-load bitstream system is presented. Basically, the new system comprises an algorithmic part, running on a PC (the software aspect), and an ad hoc hardware architecture. The bitstreams necessary for testing FPGA internal resources are automatically generated on a PC using a sequential algorithm, which varies according to the FPGA chip to be evaluated, and are subsequently downloaded onto the hardware architecture. Next, a customized application, also run on a PC, downloads the previously generated bitstreams consecutively, using the Xilinx Impact tool. The hardware architecture comprises two boards based on FPGAs. The first, called the Mother Board (MB), is used to implement the design which is responsible for sending and receiving the tests to and from the second board, called the FUT (FPGA under test) Board, where the FPGA to be tested is located and where the evaluation test is conducted. Thus, in order to ensure correct transmission of the test/results patterns, a communication bus between both boards is required. The two FPGAs are configured using the JTAG protocol, and reconfiguration of both is carried out via a multi-load algorithm which, once each resource unit has been tested, downloads a new bitstream onto the FUT. The present proposal enables the resources of an FPGA to be tested and provides an exhaustive, complete report on the status of the FPGA's different internal resources, with a view to reusing the FPGA for another application.

© 2011 Elsevier B.V. All rights reserved.
* Corresponding author. E-mail address: [email protected] (I. Bravo). URL: http://www.depeca.uah.es (I. Bravo). doi:10.1016/j.sysarc.2011.05.003

1. Introduction

Due to the broad range of applications presently offered by FPGAs (field programmable gate arrays), their use has become increasingly widespread in different areas and sectors. Depending on the field in which the FPGA is to be used, and on the application implemented, the cost of an FPGA can range from $10 to $5000, and it is in the latter case, where the FPGAs are most expensive, that this research is most relevant. It was due to the high price of some of these reconfigurable devices that the idea arose of bringing down application costs through reuse of the device. Such a possibility may become an important feature for consideration when designing hardware.

The internal status of an FPGA could become a determining factor in the final stages of design verification. In principle, this is not an issue where the circuits have been handled correctly and acquired recently from an official distributor. However, the case may arise where circuits from another, previously manufactured product are reused, or where use is made of FPGAs which have been held in stock for a long time and may have been damaged by environmental factors. In these circumstances, greater doubt exists as to whether the FPGA in question is functioning correctly. When using such a device, it is necessary to be absolutely sure that its function is not affected by the poor internal status of the FPGA; testing the internal resources enables the designer to confirm whether the status of an FPGA's internal resources is satisfactory for use.

The system presented here for testing the internal resources of an FPGA arose in response to these uncertainties concerning actual function status. With this FPGA evaluation, a report is obtained detailing the status of the principal internal resources comprising the FPGA, thus indicating those areas of the FPGA where correct function cannot be guaranteed. This system could be of great service in the industrial sector in terms of reducing costs, given that there are presently many powerful, expensive FPGAs in existence which are considered defective due to faults which could turn out to be insignificant if only the true extent of the device's internal fault were known. The report obtained from testing these FPGAs provides information on the extent of the fault, and on the possibility of reusing the device rather than discarding it, thus generating considerable savings for the design company. The proposal presented in this paper constitutes a reliable and versatile system, capable of analyzing the internal function
of an FPGA and which can easily be adapted to different FPGA families.

Following this introduction, the paper is structured as follows: Section 2 describes the research conducted to date on testing FPGA internal resources; Section 3 analyzes the characteristics of the new design proposal; the results obtained are given in Section 4, and the conclusions in Section 5.

2. Related research

Most of the research related to FPGA testing can be divided into two groups according to the two different methodologies used to test the internal resources of an FPGA. The first approach has focused on looking for faults in a specific design, which covers less than 100% of the FPGA. This methodology is known as application-dependent testing (dependent on the design being tested). With the second methodology, testing is focused on looking for faults in a more global design, and seeks to identify possible faults in occupation areas rather than specific faults in design function. In contrast to the first methodology, this approach covers close to 100% of FPGA occupation, and is known as application-independent testing, as the test is not run on a specific design [1]. The methodologies are complementary, since the application-dependent test covers specific faults in the internal resources of an FPGA used for a particular design, whilst the application-independent test covers almost 100% of FPGA occupation using a broad design which occupies a high percentage of the FPGA's capacity. A hybrid of the two applications described above provides a more accurate picture of an FPGA's internal function, since the search for specific FPGA resource faults is applied to 100% of the available space.
Before expanding on research related to the different approaches to the test, it is pertinent to give a brief description of the composition and function of the BIST (built-in self test) technique, which is the most commonly employed technique for testing FPGA internal resources using the applications described previously. The BIST technique consists of a system for generating patterns, which are then inserted into the CUT (circuit under test). Results from the CUT are analyzed, together with the patterns generated, by the ORA (output response analyzer) system. In [2–5], detailed descriptions are given of this technique, as applied to different configurations. The technique consists of three large, interconnected blocks (see Fig. 1) on which the test process is focused. The first block is the TPG (test pattern generator), which generates the test patterns essential for the entire design evaluation process. The second block corresponds to the digital CUT implemented on the device, onto which the test patterns are loaded in order to test either internal function or the interconnections made (WUT: wire under test) between the different digital circuits implemented on the FPGA; in the latter case, the time taken for data to travel along the interconnections is also tested. The CUT is specific if the aim is to apply the application-dependent technique, or more general if the aim is to apply the application-independent test. The test patterns previously generated by the TPG block are applied to the CUT or WUT, obtaining results which are then sent to the third block. The third and last block of the BIST system is the ORA block. The vectors obtained from the digital circuits are stored in this block and compared with the test patterns previously inserted; thus, the final results of the test process are obtained. The ORA normally comprises an array of XOR gates which compare the results of the actual test with the expected results.

Fig. 1. Block diagram of BIST test.

When an application-dependent test is applied, the resources analyzed are limited to the field covered by a specific design and do not contribute any information concerning the status of the device as regards a different configuration or the remaining unused internal resources. If the circuits under test are varied when testing the FPGA, the results will not provide a 100% valid picture of FPGA status, since the new circuit could occupy new areas within the FPGA or could use new resources which have not been tested. Therefore, although this technique is valid for specific designs, it presents evident deficiencies, in terms of the results obtained, for designs which may be modified or extended over time.

Many studies have been carried out using the application-dependent technique. For example, in [6], the BIST technique was applied to a digital circuit which covered fixed configurations on the FPGA. The use of two different digital circuit configurations on the FPGA was proposed, in order to provide nearly 100% fault coverage. The first configuration is based on mapping LUTs (look-up tables) onto CLBs (configurable logic blocks) implementing AND logic. The second configuration is based on constructing LUTs for CLBs implementing OR logic.
For both configurations, Flip-Flops are used in the flow of test patterns through the CLB. In both cases, the patterns are inserted at the LUT input and pass through the logic implemented (AND/OR), with the results fed back to the LUT input via Flip-Flops. A more exhaustive evaluation of I/O resources, once again based on use of the BIST technique, is found in [7]. Here, a description is given of a test technique for two types of I/O resources: IOBs (input-output blocks) and I/O interconnections. Since these resources have limited controllability, test coverage of the internal resources (logic and routing) must be optimal; it should be possible to test each resource only once, thus reducing the number of FPGA reconfiguration phases. An application-dependent test scheme is presented in [8], based on a design configuration using LUTs, logic gates and Flip-Flops (FF). Two alternative test configurations are presented, where the FFs are connected directly or indirectly to the IOBs and the LUTs are configured as XOR gates. This design simplifies the process of obtaining test results. Other possible application-dependent test configurations are described in [9–11], all of which employ different test models to test resource function using techniques similar to BIST. In [9], the authors focused on testing delays in the transition time of resource units in the LUTs, by applying the BIST technique to WUTs.

The test design applied in an application-independent test provides data about FPGA status for any configuration covering the area tested using this technique. In [1] and [12], an interesting description is given of the use of this technique. Broad test designs were chosen which were very similar to those selected for the application-dependent test, but which covered a greater area of the FPGA. These tests were applied using the BIST technique. Since this is very similar to the dependent technique, it is not necessary to elaborate further.
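Before moving on, the TPG-CUT-ORA loop described above can be illustrated with a small software model. The Python sketch below applies exhaustive patterns to a CUT configured as AND or OR logic (as in the LUT configurations of [6]) and XOR-compares responses in an ORA; the function names and the stuck-at-zero fault in the final line are illustrative assumptions, not details of the cited implementations.

```python
from itertools import product

def tpg(n_inputs):
    """Test pattern generator: exhaustive patterns for a small CUT."""
    return product([0, 1], repeat=n_inputs)

def cut_and(bits):
    # CUT with its LUT mapped to AND logic (first configuration of [6]).
    return int(all(bits))

def cut_or(bits):
    # CUT with its LUT mapped to OR logic (second configuration of [6]).
    return int(any(bits))

def ora(expected, actual):
    """Output response analyzer: XOR of expected and actual; 1 marks a fault."""
    return expected ^ actual

def run_bist(cut, reference, n_inputs=4):
    """Count the patterns for which the CUT disagrees with fault-free logic."""
    return sum(ora(reference(bits), cut(bits)) for bits in tpg(n_inputs))

# A fault-free CUT compared against its own reference logic reports no faults;
# a hypothetical stuck-at-zero CUT is caught on the all-ones pattern.
print(run_bist(cut_and, cut_and))            # -> 0
print(run_bist(lambda bits: 0, cut_and, 4))  # -> 1
```

In hardware, of course, the reference responses are not recomputed: the ORA compares the vectors returned by the CUT against the patterns and expected values it already holds.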
The design presented in [2] refers to a test for programmable logic devices which has been extended to include FPGAs, and with which the authors achieved 100% fault coverage of the device's programmable area. The test conducted covered faults produced in interconnections between internal resources, and those produced in internal resource function. In terms of the global intention of the design, this is the design most similar to that presented in this study, but various fundamental differences exist in test implementation: although both use the BIST technique, it is applied differently. In [13], the blocks necessary for the BIST technique are implemented on the device to be tested, and two test phases are run, during which the blocks exchange functions. In the first phase, the TPG, CUT and ORA blocks are located at fixed positions on the device, the CUT is tested and the results captured in the ORA are collected. In the second phase, the TPG and ORA blocks are exchanged with the CUT and tested, and the results captured in the ORA collected. Once both phases of the test are complete, test results are obtained for all the internal resources of the device, since the blocks which were initially used as TPG and ORA are analyzed in the second phase as CUTs. The present article presents a different application of the BIST technique, since the TPG and ORA blocks are not located on the FPGA to be tested but rather on an adjoining FPGA responsible for running the test. This arrangement avoids the need to carry out the test in two phases, saving time in FPGA reconfiguration, and presents the additional advantage of ensuring the reliability of the initial results obtained, since no resource of the FPGA under test is used as an active part of the test.
The only limitation of the design presented corresponds to physical implementation: if the communication bus between the two FPGAs required by the design is limited, it becomes necessary to carry out more than one reconfiguration of the FPGA tested in order to obtain 100% resource fault coverage. In the design presented in this article, a hybrid technique is employed whereby the specific test circuits used in the application-dependent test are combined with the 100% FPGA coverage employed in the application-independent test. Of all the FPGA internal resources, the present proposal evaluates the IOBs, CLBs, BRAMs (Block RAM) and the embedded Multipliers. These are the resources which occupy most of an FPGA, and thus by testing them, 90% of FPGA function is covered. Furthermore, the choice of these resources rather than others, such as the DCMs (digital clock management), is based on their importance when mapping designs onto an FPGA: they are mainly located in the internal structure of FPGAs and, regardless of the design implemented, are used more than any other resource. From the point of view of the internal FPGA resources tested, the majority of related work can be split into two categories: evaluation of routing resources, or evaluation of logic cell resources. The former evaluates connection delays and possible faults [14]; the latter usually evaluates only logic elements [13]. The current proposal falls into the second category, although other internal resources such as BRAMs, IOBs and Multipliers are also analyzed.
3. Characteristics of the new design proposal

The design proposed generates a multi-load bitstream system on the FUT capable of verifying almost 100% of the FUT resource units. An ad hoc hardware architecture is used to implement the design, details of which are given in Section 3.1, together with the specific software application, which is run on a PC (Section 3.2). An FPGA is composed of different configurable internal resources which can be used for different functions. Thus, the system designed must test the functionality of each resource and therefore requires multiple digital circuit designs (equivalent to multiple internal resource reprogramming bitstream files), which are loaded automatically using an application described later in the paper. The creation of each of the digital designs is also automatic, using an iterative algorithm run on a PC which captures the necessary data from a database generated for this purpose. This database is specific to each FUT being evaluated, given that internal resources vary in typology, location and number depending on encapsulation and family. Thus, the test process is controlled from a PC, which is responsible for administering the multi-load bitstream system using a graphical interface which provides a visual display of test run status.

The hardware architecture proposed (see Fig. 2) comprises two printed circuit boards joined by a communication bus, one of which is connected to a PC via USB and JTAG ports. The first board contains an FPGA called the FMother (FPGA on Mother Board), which sends and receives the test patterns to and from the second board, which contains the FUT to be evaluated. There is a parallel communication bus between both boards with a set number of lines for transmitting and receiving the test patterns and their corresponding results, among other information. The greatest restriction when designing the different digital test circuits is the size of this bus, given that the number of bitstreams produced may be increased or reduced depending on bus width. This restriction is explained by the fact that each bitstream file is associated with a set number of digital test designs which reconfigure and use a fixed number of the FUT's internal resources.

If a relatively large bus is available, the number of bitstream loads can be reduced to one; that is, with a single bitstream it will be possible to integrate all the digital designs necessary for testing the CLB, BRAM, Multiplier or IOB resources by unit and in parallel. However, in order to achieve this with a FUT which has a large number of internal resources, the communication bus would need to be very large indeed, and therefore the FMother would need to have many pins, normally only available in mid- to top-of-the-range FPGAs. The FMother's function is extremely simple, and therefore an FPGA from the lower end of the range would be more than sufficient. Thus, as often happens, a compromise must be sought, in this case between a large communication bus (which would imply fewer bitstreams) and a small FPGA which would save on costs and would cover the operational needs of the design implemented on the Mother Board. In short, a large communication bus requires a large FMother. To illustrate this, Fig. 3 displays the number of bitstreams necessary for some Xilinx Virtex 4LX devices vs. communication bus width. These devices have been used as an example because the internal resources of the Virtex 4 are very similar to those of the Virtex 2 Pro family, which was chosen for the authors' proposal.

Fig. 2. Picture of the development boards designed for this study.
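The bus-width trade-off discussed above, and plotted in Fig. 3, can be approximated with a simple counting model: each resource unit consumes a fixed number of bus lines, so a wider bus tests more units per configuration and needs fewer bitstreams. The sketch below is a hypothetical model; the per-unit line counts follow Section 3, but the resource totals are placeholders, not actual Virtex 4LX figures.

```python
from math import ceil

# Per-unit bus line usage, following Section 3: IOB: 1 in + 1 out; CLB: 1 in
# + 2 out; 16x16 BRAM: 16 in + 16 out (control lines ignored here); 18x18
# multiplier: 36 result lines.
LINES_PER_UNIT = {"IOB": 2, "CLB": 3, "BRAM": 32, "MULT": 36}

def bitstreams_needed(resource_counts, bus_width):
    """Estimate total bitstream loads for a FUT, given the bus width."""
    total = 0
    for res, units in resource_counts.items():
        per_load = bus_width // LINES_PER_UNIT[res]  # units testable in parallel
        if per_load == 0:
            raise ValueError(f"bus too narrow to test a single {res}")
        total += ceil(units / per_load)              # loads for this resource type
    return total

# Placeholder resource totals for an imaginary FUT (not real device figures).
example_fut = {"IOB": 240, "CLB": 1500, "BRAM": 48, "MULT": 48}
for width in (50, 100, 200, 300):
    print(width, bitstreams_needed(example_fut, width))
```

As in Fig. 3, the estimated count falls steeply at first and then flattens once most units of each resource type already fit in a single load.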
Fig. 3. Number of bitstreams vs. communications bus width for Virtex 4LX family (devices XC4VLX15, XC4VLX25, XC4VLX40, XC4VLX60, XC4VLX80, XC4VLX100 and XC4VLX160).
As can be seen in Fig. 3, the number of bitstreams decreases as communication bus width increases, because a large bus allows more simultaneous tests of internal FPGA resources. From the point of view of testing time, a large bus is therefore an advantage, since the number of bitstreams decreases; however, it forces the required FMother to be larger. Development of the software associated with this study focused on algorithms for automatic bitstream generation, automatic generation of test patterns for each of the FUT internal resources, and on testing each of these resources. Test patterns are sent to and received from the FUT by the FMother. The digital circuits destined to occupy the FUT internal resources tested are located on the second board in the hardware architecture, the FUT Board. Each of these parts will now be described in more detail.
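As a rough illustration of the automatic bitstream generation just mentioned, the following Python sketch partitions a resource database (site coordinates per resource type; the schema is assumed here for illustration) into groups that fit the communication bus, emitting one set of placement constraints in Xilinx ucf LOC syntax per bitstream to be built. All instance names and site labels are hypothetical.

```python
# Sketch of a database-driven constraint generator: for each resource type,
# split the known site coordinates into groups small enough for the bus,
# and emit one ucf-style constraint set (hence one bitstream) per group.
def chunk(seq, size):
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def generate_ucf_sets(resource_db, bus_width, lines_per_unit):
    """Yield (resource, ucf_text) pairs, one per bitstream to build."""
    for res, sites in resource_db.items():
        per_load = bus_width // lines_per_unit[res]
        if per_load == 0:
            raise ValueError(f"bus too narrow to place one {res} test circuit")
        for group in chunk(sites, per_load):
            lines = [f'INST "{res.lower()}_test_{i}" LOC = "{site}";'
                     for i, site in enumerate(group)]
            yield res, "\n".join(lines)

# Hypothetical miniature database: a few CLB slice sites and one multiplier.
db = {"CLB": ["SLICE_X0Y0", "SLICE_X0Y1", "SLICE_X1Y0", "SLICE_X1Y1"],
      "MULT": ["MULT18X18_X0Y0"]}
constraint_sets = list(generate_ucf_sets(db, bus_width=36,
                                         lines_per_unit={"CLB": 3, "MULT": 36}))
```

Each emitted constraint set would then be combined with the fixed test-circuit netlist and run through the vendor implementation flow to produce one bitstream of the FPGA Bitstream Database described in Section 3.2.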
3.1. Hardware architecture

The hardware architecture required must be able to support a system capable of loading the multiple bitstreams which carry out the FUT internal resource evaluation. In order to send, test and receive test data, the best option is to use an FPGA on which a digital design capable of storing, comparing and sending data can be implemented. In order to create this architecture, two boards are necessary, containing the FPGAs required by the design. The first board (Mother Board) contains the FPGA responsible for storing, comparing and sending data. These data come from and go to the FUT contained on the second board, the FUT Board (see Fig. 2).

The internal structure necessary for each of the two boards shown in Fig. 2 is quite similar. Both need a power supply (constant current regulators and protection), an FPGA device (on the Mother Board this functions as the test controller, whilst that on the FUT Board is the device being evaluated), and a communication bus providing a connection between both boards (as explained earlier, this should ideally contain as many lines as possible); in addition, the Mother Board needs a USB communication transceiver to transfer data between the PC and the Mother Board and to configure both FPGAs. As regards this latter aspect, it should be highlighted that the lines necessary for programming both boards via JTAG are located within the communication bus. Thus, a daisy chain is established between both FPGAs, through which the USB transceiver delivers the two bitstreams they require.

The main difference between the two boards is that a Mother Board can be used to evaluate and analyze multiple FUTs, as long as the FUT Board has a communication bus which can be connected to the Mother Board. The FUT Board has a socket for connecting different FUTs, but these must share the same encapsulation; for testing FUTs with different encapsulations, a FUT Board must be created or be available for each type of encapsulation.

The choice of a Mother Board based on an FPGA rather than any other system, such as one based on a microprocessor, was influenced by the fact that the communication bus requires as many input/output lines as possible, making the FPGA an ideal device as it incorporates multiple input/output lines. This advantage is especially noticeable when the communication bus is wide, since analogous microprocessor-based solutions are then not feasible. Moreover, even when both technologies are possible, the FPGA solution has the lower cost.

3.2. Software architecture
Software development was based on the implementation of algorithms for producing the multi-load bitstream system used in the design for subsequent testing. The different parts that constitute this block can be seen in Fig. 4. Of special note is the algorithm destined to produce bitstreams automatically once an FPGA is selected as FUT ((1) in Fig. 4). For example, if an FF672 FPGA is selected as the FUT, the mean quantity of CLBs contained in FPGAs with this encapsulation is approximately 1500 CLBs, which will be tested together with the other internal resources such as the BRAMs, Multipliers and IOBs. The result is that wherever the number of bus lines connecting the design's boards is limited, a considerable number of bitstreams becomes necessary in order to complete the FUT test.

The first key component of the bitstream creation algorithm is a database containing the size and specific location of all the resource units which constitute the FUT ((4) in Fig. 4). Given this database, it is merely necessary to create an iterative algorithm which captures the data and locates them accurately in the files necessary to create a series of bitstreams, which then comprise the FPGA Bitstream Database ((5) in Fig. 4). The importance of automatic bitstream file creation resides in the system's ability to adapt to different forms of encapsulation and different families; to port this design to another FPGA merely requires changing the database for that corresponding to the new FPGA. To summarize, the elements contained in the database include:

Communication bus:
- Number of pins
- Location of pins (pin name and number)

I/O pins:
Fig. 4. Process and tool flow chart used to test FUTs.
- Number of pin banks
- Number of pins per bank
- Location of pins in each bank

CLBs:
- Number of CLBs
- Existence of areas without CLBs in the device (location)

BRAMs:
- Number of BRAMs
- Location

Multipliers:
- Number of multipliers
- Location

Once the bitstreams associated with the different test patterns required for testing each FPGA internal resource have been generated, the next important step is the comparison of the results generated by the test with the predicted test results ((2) in Fig. 4). This comparison is implemented on the FMother, with the aim of accelerating test run time and reducing the quantity of data transmitted through the USB channel, by sending only a binary stream to the PC indicating whether the test result was correct or not. It is therefore necessary to generate the bitstream files associated with the digital design which controls the test of a specific FUT resource.

In addition to comparison, the digital design of this FPGA must store and send test results, using a register bank for this purpose, as can be seen in Fig. 5. Thus the design of the FMother associated with evaluating each of the FUT's internal resources consists of a bank of registers, storage registers and a comparison block based on XOR gates which carries out the comparison between the test pattern sent and that received. Fig. 5 presents a diagram of the basic blocks contained in each of the designs incorporated into the FMother. It should be noted that in order to optimize test run time, the number of this FPGA's designs should be kept to a minimum, so that the minimum number of digital designs is employed when testing the CLBs, IOBs, BRAMs and Multipliers. In the case of CLBs, BRAMs and Multipliers, this can be simplified to a single design for each type of resource, that is, one design for CLBs, one for BRAMs and another for Multipliers. This is possible as all the resources in each of the three blocks are identical. However, differences exist in the case of the IOBs, since in addition to IOB dependence on the corresponding FPGA pin bank, it is not possible to test IOB bidirectional flow with a single test; a minimum of two tests must be implemented.

The bitstreams generated for the FUT are specific to each internal resource tested. As mentioned earlier, the resources tested are the IOBs, CLBs, BRAMs and Multipliers. The purpose of the digital
Fig. 5. Block diagram of digital design implemented on the FPGA Mother Board.
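A software model of the FMother design in Fig. 5 may clarify its role: a bank of registers holds the patterns sent, an XOR stage compares them with the vectors returned by the FUT, and only one pass/fail bit per tested unit is forwarded to the PC over USB. The class and method names below are illustrative assumptions, not names from the actual design.

```python
# Software model of Fig. 5: register bank + XOR comparison stage + one
# pass/fail bit per tested unit sent to the PC.
class FMotherModel:
    def __init__(self):
        self.bank = []                      # bank of registers: patterns sent

    def send(self, pattern):
        self.bank.append(pattern)           # pattern driven onto the bus
        return pattern

    def compare(self, results):
        """XOR comparison stage: 0 means the unit under test answered correctly."""
        return [sent ^ got for sent, got in zip(self.bank, results)]

    def report_bits(self, results):
        # One bit per tested unit, as forwarded to the PC over the USB channel.
        return [int(diff == 0) for diff in self.compare(results)]

fm = FMotherModel()
for p in (0b1010, 0b0101, 0b1111):
    fm.send(p)
print(fm.report_bits([0b1010, 0b0100, 0b1111]))  # middle unit faulty -> [1, 0, 1]
```

Keeping the comparison on the FMother means the USB channel carries only these report bits rather than full result vectors, which is exactly the data reduction argued for above.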
circuit implemented in each bitstream is to test the function of each specific resource in parallel, using the test patterns. Each bitstream contains an equal number of digital test circuits for each of the resources, although this number will vary according to the width of the communication bus between the FMother and the FUT Board, since bus width determines the number of resource units which can be tested simultaneously. Thus, for each resource unit a number of input/output lines are required in each digital circuit located in the unit, and these are used to insert the pattern data and capture the results, which are then stored and compared on the FMother:

- I/O pins connected to the communication bus: one input line and one output line required.
- All other I/O pins: one input line and one output line required.
- CLBs: one input line and two output lines required. The two slices comprising a CLB are evaluated simultaneously, inserting the same data line into the input and capturing a separate output line for each slice.
- BRAMs: depending on the primitive implemented in the space reserved for the memory modules, the input and output lines may vary. The design presented here used a 16×16 BRAM, and thus required 16 input lines and 16 output lines, in addition to the control lines necessary for memory reading and writing functions.
- Multipliers: depending on the primitive implemented in the space reserved for the Multiplier modules, the number of input and output lines may vary. With the aim of reducing the number of occupied lines in this design, 36 lines were used to evaluate an 18×18 multiplier, all of which were connected to the multiplier's result.

The location of each of the test circuits for evaluating the internal FUT resources was established by inserting the spatial coordinates of the resource into the ucf (user constraints file). Using the multi-load system, the application run on the PC downloads the corresponding bitstream onto the FUT at the exact moment of testing. The final number of bitstreams generated for the FUT depends not only on communication bus width but also on the existence of areas without resource units, for example those destined for a hardware microprocessor such as the PowerPC in certain Xilinx FPGA families. In order to evaluate each resource unit, a digital circuit was developed for testing internal characteristics effectively. The different digital circuits implemented for evaluating each type of Xilinx FPGA internal resource are detailed below:

I/O pins connected to the communication bus: The test is based on sending and subsequently receiving a digital signal containing high and low levels. If the value received by the FMother from the FUT is identical to that sent, the IOB is functioning correctly. In order to attain this functionality, all that is required is to implement an input/output buffer at each of the bus pins (see Fig. 6a). In order to validate pin function in both directions, first a design with a buffer in one direction is downloaded, and then another design with a buffer in the other direction, thus providing information about correct input/output function.

All other I/O pins: The test for all other I/O pins which do not belong to the communication bus has a similar function. In order to minimize the number of designs applied, all other FUT I/O pins are linked together, as can be seen in Fig. 6b, using the FUT Board's PCB pin-to-pin connector. This feature enables the test pattern to be sent from the FMother, via the communication bus, to a FUT pin output buffer. As a link exists between pins, this pattern exits via one pin and returns via another which is linked to a communication bus pin. In this way, the test evaluates the performance of an output pin and an input pin simultaneously. Fig. 6b illustrates the digital design implemented on a FUT for evaluating this resource.

CLBs: With the aim of evaluating the majority of CLB internal resources, the use of logic blocks which use the maximum possible number of each CLB's combinational and sequential components is proposed. This implies, for example, downloading a MUXF8 onto each CLB, which uses the CLB's LUTs, internal multiplexers and fast carry resources [15], together with the interconnection resources joined to two Flip-Flops (FF) (see Fig. 7). The test vector is composed of a high level bit and a low level bit which are transmitted through each multiplexer channel. This is then registered by the FFs and the test result is sent via the two outputs contained in this design (results of Slices Y and X). If this is captured by the FMother exactly as it was sent, this indicates that each Slice is functioning correctly (Fig. 7).

BRAMs: This part of the test is carried out after testing the CLBs; consequently, previously tested CLBs are used. For this, a 16×16 primitive is instantiated as a RAM memory. To verify whether this block is functioning correctly, a 10 bit counter is implemented on CLBs, which covers all addresses. The data to
Fig. 6. (a) Digital test to evaluate IOBs associated to communications bus. (b) Digital test to evaluate all other IOBs.
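The UCF placement step described above (pinning each test circuit to the spatial coordinates of the resource under evaluation) can be sketched as a small generator. This is an illustrative sketch: the instance names (`tc_<x>_<y>`) are hypothetical, while the `INST ... LOC = "SLICE_XnYm";` syntax is the standard Xilinx UCF form for slice placement.

```python
# Sketch: emit UCF LOC constraints that pin each test circuit to the
# slice coordinates of the resource unit it must evaluate.
# Instance names (tc_*) are hypothetical; the LOC syntax is standard Xilinx UCF.

def ucf_loc_lines(slice_coords):
    """slice_coords: iterable of (x, y) slice positions to test."""
    lines = []
    for x, y in slice_coords:
        inst = f"tc_{x}_{y}"  # hypothetical test-circuit instance name
        lines.append(f'INST "{inst}" LOC = "SLICE_X{x}Y{y}";')
    return lines

# Constraints for three of the CLB test circuits placed in one bitstream:
for line in ucf_loc_lines([(0, 0), (0, 2), (0, 4)]):
    print(line)
```

One such constraint file is produced per generated bitstream, which is how the sequential algorithm sweeps the test circuits across the whole device.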
Fig. 7. Digital test to evaluate one CLB.
Fig. 9. Digital test to evaluate one embedded multiplier module.
be written on the memory is sent from the FMother (see Fig. 8). The data are then read back, and the results analyzed on the FMother. Multipliers: a primitive is instantiated on an 18×18 multiplier and two fixed data items are inserted as input constants (K in Fig. 9). This multiplication option was chosen with the aim of optimizing the communication bus as far as possible, since the result occupies 36 bits; this result is therefore the only data which travel through the communication bus. The FMother monitors the multiplication pattern operating in the FUT, checks whether the result received is correct and sends the consequent evaluation report to the PC. Fig. 9 shows a diagram of the digital design blocks implemented for the internal multiplier test. The multi-load bitstream application is controlled by a graphical interface ((3) in Fig. 4) which is run on a PC (Fig. 10). The PC sends and receives the test patterns and the corresponding results via a USB channel. The interface provides the end user with the possibility of choosing which module to test, and after running the test the user can check the functional status of the module in the error report generated. Different LED-type indicators provide information about test progress and module status following the test. The sequential operation of the graphical interface, combining the different processes illustrated in Fig. 4, is shown in the flow chart depicted in Fig. 11. As can be seen from this figure, the first
Fig. 8. Digital test to evaluate one BRAM module.
step is for the communication bus to detect the FUT (step 1, Fig. 11). This detection process enables automatic identification of the FUT, since each FUT Board sends two hardware identifiers which are read at the Mother Board and subsequently sent to the PC. One identifier serves to identify the kind of encapsulation used and, consequently, the type of FPGA. If the bitstreams for this FUT have already been generated, these files can now be downloaded (step 2, Fig. 11). If this is not the case, the bitstream generation process is activated in order to create the corresponding files using the values supplied by the database (step 3, Fig. 11). If downloading takes place correctly, the USB channel is enabled to establish communication between the PC and the Mother Board, in order to receive the test results (step 4, Fig. 11). Once all test results have been received, the different test process reports are generated (step 5, Fig. 11), and the FPGA test is complete. It should be noted that if the communication bus test fails when the test patterns are sent, the test is terminated and the FPGA evaluated is classified as not apt, since the pins and/or IOBs associated with the communication bus are defective and it is therefore not possible to conduct the test.
4. Results
In order to validate the proposal presented here, two evaluation boards were taken to serve as the hardware architecture, in accordance with the requirements presented in Section 3.1. The board chosen for the Mother Board was a Digilent Nexys2 evaluation board [16], whilst that for the FUT Board was a Xilinx HW-AFX-FF672-300 evaluation board [17]. These boards were chosen over other commercial solutions due to the features that both offer. For example, the HW-AFX-FF672-300 board has a socket for inserting different FPGAs, as long as they come in the same package (in this case, the FF672). This feature provides the adaptability and versatility sought for the FUT Board.
It also has the JTAG communication necessary for FPGA configuration. Board characteristics are described in detail in [17]. However, in order to link the two boards, it was necessary to design an auxiliary board to serve as the communication bus between them. Fig. 12 shows the complete FPGA testing system, based on the two commercial evaluation boards described above, together with the board connecting them (the communication bus). The use of these boards for validating the design gave rise to a width restriction on the communication bus, since the Nexys2 board has a 40 I/O pin expansion bus. Consequently, generation of the different bitstreams for both the FMother and the FUT Board was affected by this restriction. A limitation was also encountered
Fig. 10. Main window for controlling the FPGA test.
in the FUT's PowerPC area, which was solved by restricting design conditions in the CLB test modules (Table 1). The number of bitstreams which must be generated and downloaded is determined by the number of pins comprising the communication bus. The following expression gives the total number of bitstream files to be downloaded onto a FUT, T_FUTBITSTREAM (1).
T_FUTBITSTREAM = BITSTR_BUS + BITSTR_IOBs + BITSTR_CLBs + BITSTR_BRAMs + BITSTR_MULTs    (1)
where BITSTR_X is the number of bitstream files necessary in order to evaluate each internal resource X. Expression (2) describes how to obtain the number of bitstream files for each internal resource to be evaluated.
BITSTR_BUS   = W_CB / J_BUS
BITSTR_IOBs  = (n_IOBs − W_CB) / J_IOBs
BITSTR_CLBs  = n_CLBs / J_CLBs
BITSTR_BRAMs = n_BRAMs / J_BRAMs
BITSTR_MULTs = n_Multipliers / J_MULTs    (2)
where W_CB is the total number of lines comprising the communication bus needed to carry out the entire test process, and J_X is an index which refers to the number of communication bus lines used to conduct the test of each resource X.
The width of the communication bus (W_CB) available to test the internal resources of a FUT XC2VP7 is 40 lines. This value is determined by the width of the Nexys2 board expansion bus. The total number of XC2VP7 internal resources to evaluate is given in Table 2. CLBs in the Virtex-2 Pro family are composed of 4 slices, and each slice is made up of 2 LUTs, 2 flip-flops, multiplexers and fast carry connections. Given the features of the internal Virtex-2 Pro resources, 4 test lines are necessary to analyze a CLB (W_test_lines_for_CLBs = 4), 2 for an IOB (one entry line and one exit line, W_test_lines_for_IOBs = 2), 37 for a BRAM (W_test_lines_for_BRAMs = 37) and 36 for a multiplier (the width of the multiplier exit bus, W_test_lines_for_Multipliers = 36). In all cases, the aim was to minimize the number of downloads onto the FPGAs, and thus to achieve maximum reductions in test run time. This is especially notable in the case of the CLBs, where every effort was made to conduct the maximum number of simultaneous tests on resources of this type; for this reason the CLB Reset evaluation was suppressed in order to save a communication bus line. For the FMother, it was sufficient to download a single digital design in order to evaluate each FUT internal resource, with the exception of the IOBs where, due to the difference in pins in the different FUT banks, it was impossible to employ only one bitstream for the FMother. With regard to the total test run time (T_TEST_TOTAL), this is a function of the number of bitstream files downloaded and run on the FUT and FMother, the time taken to communicate data between the PC and the FMother via the USB bus, the time taken to write reports on the PC and the time taken to boot Impact. All these times are given in (4).
J_X = W_CB / W_test_lines_for_X_resource    (3)

If expressions (1)–(3) are applied to the platform in Fig. 12, the total number of bitstreams necessary for the evaluation platform used is obtained; the result is presented in Table 1.

T_TEST_TOTAL = T_FUT + T_FMOTHER + T_TX_TEST + T_SETUP_IMPACT + T_WR_REPORT    (4)
where T_FUT and T_FMOTHER are the times taken to configure and run a test associated with a bitstream on the FUT and FMother, respectively (5).
Table 1. Number of bitstreams necessary to evaluate an XC2VP7.

                                      Number of files for FUT    Number of files for FMother
BITSTR_BUS                            2                          1
BITSTR_IOBs                           17                         3
BITSTR_CLBs                           493                        1
BITSTR_BRAMs                          44                         1
BITSTR_MULTIPLIERs                    44                         1
T_FUTBITSTREAM / T_FMOTHERBITSTREAM   600                        7
Table 2. Internal resources of XC2VP7-FF672 FPGA.

CLBs                    1232
IOBs                    371
BRAMs                   44
Embedded multipliers    44

Fig. 11. Flow chart of algorithm running in PC for testing FPGA internal resources.
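The PC-side sequence of steps 1–5 (detect, load or generate, download, collect, report) can be sketched as a control loop. This is an illustrative outline only: every callable is a hypothetical placeholder for the application's own routines, and in practice the download step would wrap the Xilinx Impact tool invoked in batch mode with a command file.

```python
# Sketch of the PC-side control loop for steps 1-5 of Fig. 11.
# All callables are hypothetical placeholders; 'download' would wrap
# something like 'impact -batch <cmd file>' in the real application.

def run_fpga_test(db, detect_fut, generate_bitstreams, download,
                  read_results, write_reports):
    fpga_id = detect_fut()                         # step 1: read the two hardware identifiers
    if fpga_id is None:
        raise RuntimeError("No FUT connected")     # test cannot proceed
    bitstreams = db.get(fpga_id)                   # step 2: reuse stored bitstreams...
    if bitstreams is None:
        bitstreams = generate_bitstreams(fpga_id)  # step 3: ...or generate them
    for bs in bitstreams:
        download(bs)                               # multi-load: one download per bitstream
    results = read_results()                       # step 4: collect results over USB
    write_reports(results)                         # step 5: generate the test reports
    return results

# Dry run with stand-in callables:
downloaded = []
run_fpga_test(
    db={"XC2VP7": ["bus.bit", "iob_0.bit"]},
    detect_fut=lambda: "XC2VP7",
    generate_bitstreams=lambda fid: [],
    download=downloaded.append,
    read_results=lambda: {"bus": "ok"},
    write_reports=lambda r: None,
)
print(downloaded)
```

Injecting the callables keeps the loop testable without hardware, mirroring how the database lets the same flow serve different FUT families.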
The T_TX_TEST in (4) is the time taken to transmit the test result from the FMother to the PC. T_WR_REPORT refers to the time taken to generate the test results report. Lastly, T_SETUP_IMPACT is the time taken by the PC from receiving the order to download a bitstream from the software application to beginning to transmit data via the USB bus. As can be seen in Table 3, this time is the most restrictive and is limited by the Xilinx Impact application; it is the time that definitively conditions test run time. It is also important to note the difference in T_WR_REPORT (4) depending on whether or not there is an error in the tested internal resource. Should an error be detected in the test, this time increases, since the application run on the PC must open the error report in order to insert the identified error (by default, the test result report is correct unless an error is detected, since the latter eventuality is less probable). As mentioned at the beginning of this section, two commercial boards were used in order to validate the proposal presented here. Thus, the FUT evaluated was an XC2VP7-FF672, and Table 2 gives a summary of the internal resources evaluated in the different internal resource evaluation tests. With the aim of investigating whether the system was truly capable of identifying faults in resource units, synthetic errors were injected into different resources, and the corresponding times measured, as shown in Table 4. Results of the different tests are given in Fig. 13. In the light of the tests performed and the results obtained, shown in Fig. 13, the following should be noted: (a) For all tests performed, it was the analysis of the CLBs which consumed the greatest percentage of time. This is due to the fact that this is the most numerous resource within a Xilinx FPGA. As
Table 3. Summary of test times.
Time              Description                                                             Approximate value
T_SETUP_IMPACT    Time taken by the PC from receiving the order to launch Impact in
                  console mode to beginning to transmit the bitstream by USB              4 s
T_FUT_CONF        Time taken to configure the FUT from receiving the first FPGA bit       25 ms
T_FMOTHER_CONF    Time taken to configure the FMother from receiving the first FPGA bit   25 ms
T_FUT_TEST,
T_FMOTHER_TEST    Test run time                                                           50 ns
T_TX_TEST         Time taken to send the test result from the FMother to the PC           10 us
T_WR_REPORT       Time taken by the PC to open/write the test result report               50 ms

T_FUT = (T_FUT_CONF + T_FUT_TEST) × T_FUTBITSTREAM
T_FMOTHER = (T_FMOTHER_CONF + T_FMOTHER_TEST) × T_FMOTHERBITSTREAM    (5)

Fig. 12. View of complete system for testing FPGAs.
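With the Table 3 estimates and the bitstream counts of Table 1, expressions (4) and (5) can be evaluated numerically. This sketch makes one assumption not stated explicitly in the text: the Impact setup time is paid once per bitstream download, which is what makes it the dominant term.

```python
# Rough evaluation of expressions (4)-(5) with the Table 3 estimates.
# Assumption made here: T_SETUP_IMPACT is incurred for every download.

T_SETUP_IMPACT = 4.0   # s, Impact launch per download (Table 3)
T_CONF = 25e-3         # s, configuration of FUT or FMother
T_TEST = 50e-9         # s, test run time
T_TX = 10e-6           # s, result transfer FMother -> PC
T_WR = 50e-3           # s, report write on the PC

N_FUT, N_FMOTHER = 600, 7                       # bitstream counts (Table 1)

t_fut = (T_CONF + T_TEST) * N_FUT               # eq. (5)
t_fmother = (T_CONF + T_TEST) * N_FMOTHER
t_setup = T_SETUP_IMPACT * (N_FUT + N_FMOTHER)  # assumed per-download
t_total = t_fut + t_fmother + t_setup + T_TX + T_WR   # eq. (4)

print(round(t_setup), round(t_total))
```

Under this assumption the Impact setup term accounts for over 99% of the total, which is consistent with the observation that T_SETUP_IMPACT is the most restrictive time.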
Table 4. Details of each test conducted on the XC2VP7-FF672 FPGA FUT (number of errors injected).

Module name (tested modules)   Test 1   Test 2   Test 3   Test 4   Test 5   Test 6
Bus_IOBs (40)                  0        2        0        5        10       10
IOBs (371)                     0        11       11       0        0        5
CLBs (5440)                    0        6        80       80       80       20
BRAMs (44)                     0        1        3        3        3        12
Multipliers (44)               0        1        3        3        3        12
37 operative lines are required for BRAM testing, and 36 in the case of the multipliers; wherever the bus no longer contains sufficient operative lines, exit data are time-multiplexed by incorporating multiplexer hardware in the FUT which sequences the data output. (d) Lastly, it should be noted that an increase in CLB errors does not influence test time, as can be seen in the results for tests 5 and 6. This is explained by the fact that the number of bitstreams to be downloaded does not vary, and the errors which really condition total test run time are those in the bus IOBs.
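The time-multiplexing described in point (c) can be modeled as follows. This is a minimal sketch (the chunk ordering and helper names are illustrative choices, not the authors' implementation): a wide result is sequenced over a bus with fewer operative lines and reassembled on the FMother side.

```python
from math import ceil

# Sketch: sequence a wide test result over a narrower operative bus.
# The FUT-side multiplexer sends ceil(width / bus_lines) words; the
# FMother reassembles them. LSB chunk first is an illustrative choice.

def sequence_result(value, width, bus_lines):
    """Split a `width`-bit result into bus-sized words, LSB chunk first."""
    n_words = ceil(width / bus_lines)
    mask = (1 << bus_lines) - 1
    return [(value >> (i * bus_lines)) & mask for i in range(n_words)]

def reassemble(words, bus_lines):
    """Rebuild the original result from the received word sequence."""
    value = 0
    for i, w in enumerate(words):
        value |= w << (i * bus_lines)
    return value

result = 0xABCDEF123                       # a 36-bit multiplier result
words = sequence_result(result, 36, 30)    # only 30 of 36 lines operative
assert reassemble(words, 30) == result     # FMother recovers the result
```

Each extra word in the sequence is one more bus transaction, which is why bus damage lengthens the BRAM and multiplier phases in Tests 2, 4, 5 and 6.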
Fig. 13. Total test run time to evaluate XC2VP7-FF672 with different internal resource errors.
previously argued, this time can be notably reduced by using greater bus width. However, the platforms employed in this research did not permit the bus used to be expanded any further. (b) In tests 5 and 6, total test time increased markedly (30% more) in comparison with the best performance, due to notable bus errors (theoretically, 25% of the bus was damaged). The consequent reduction in operative bus width meant that during CLB testing, the number of CLB tests that could be conducted simultaneously was reduced. In other words, the number of bitstreams was increased. (c) Where a bus error existed (Tests 2, 4, 5 and 6), treatment of the BRAM and Multiplier phases differed from tests where no error was present in the bus IOBs (Tests 1 and 3). Since
Fig. 14. Results of CLBs (left side) and time (right side) tested for the authors' proposal vs. the proposal in [18].
In order to evaluate the performance of the authors' proposal, Fig. 14 shows the testing time (right side) for CLBs of the authors' proposal vs. [18]. That work evaluates XC4000 FPGAs, which are obsolete and whose internal CLBs are simpler than those addressed in the present proposal. Even so, the time consumed by the present proposal is only approximately 25% higher than that of [18], while the number of CLBs evaluated is three times greater and their internal complexity is also higher. The current proposal therefore significantly improves the time taken to evaluate the internal resources of a Xilinx FPGA with respect to previous work.
5. Conclusions
This paper presents a new proposal for evaluating FPGA internal resources. This evaluation concept is of great interest as regards verifying that the internal resources of an FPGA employed in a design are functioning correctly, or when an FPGA is to be reused for a different design. The present proposal is based on a hardware architecture composed of two boards, which in turn are based on FPGAs, together with a software application which carries out FPGA internal resource testing automatically. The concept of an automatic process is very important, since it implies that the system is at once rapid and autonomous, capable of testing an FPGA without user intervention and freeing the user from the need to introduce the different configurations for testing each resource. The time taken by the multi-load system is determined by the process of loading bitstreams onto the FPGA, and this aspect thus takes considerably longer than processing test result information.
Nevertheless, the possibility of testing an FPGA automatically, independently of the user, is an indication of the effectiveness and versatility of the multi-load bitstream algorithm since, as commented previously, it can be used with different FUTs simply by adding the pertinent data to the database from which the bitstreams are automatically generated. It is therefore essential to have a process for generating the binary files loaded onto FPGAs (bitstreams), as this enables the system to be reused for future applications where FPGAs from the same family must be tested. The test run time for a FUT can be markedly reduced if a wider communication bus is used. This would enable each bitstream to test multiple internal resources of the same kind, even to the extent of using a single file for each resource type to be tested (one for CLBs, another for IOBs, another for BRAMs and another for Multipliers). Analyzing the results presented in this paper, the time required to test an FPGA can be considered remarkably high. However, it is
important to note that when the FPGA to be tested has a high cost, this time can help to save money if the analysis determines whether an FPGA can be reused or not. Most of the time consumed is due to the high setup time of the FPGA configuration tool used (Xilinx Impact). If this is substituted by another tool with a lower T_SETUP_IMPACT, the total time consumed is reduced significantly. Test coverage in the present research was not quite 100%, since some FPGA resource units were not tested, such as the DCMs or the interconnection resources. Nevertheless, testing of the other internal resources selected gives a global overview of FPGA function, as correct functioning of the area occupied by CLBs could be considered more critical than that of an area occupied by a DCM.
Acknowledgment
Work supported by the ESPIRA Project (DPI2009-10143) of the Spanish Ministry for Science and Innovation.
References
[1] M.B. Tahoori, S. Mitra, Application-independent testing of FPGA interconnects, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 24 (11) (2005) 1774–1783.
[2] C. Stroud, E. Lee, M. Abramovici, BIST-based diagnostics for FPGA logic blocks, in: Proc. IEEE Int. Test Conf., 1997, pp. 539–547.
[3] C. Stroud, S. Konala, P. Chen, M. Abramovici, Built-in self-test for programmable logic blocks in FPGAs (finally, a free lunch: BIST without overhead!), in: Proc. IEEE VLSI Test Symp., 1996, pp. 387–392.
[4] C. Stroud, S. Wijesuriya, C. Hamilton, M. Abramovici, Built-in self-test of FPGA interconnect, in: Proc. Int. Test Conf., 1998, pp. 404–411.
[5] M. Abramovici, C. Stroud, BIST-based detection and diagnosis of multiple faults in FPGAs, in: Proceedings of ITC International Test Conference, 2000, pp. 785–794.
[6] J. Quin, A brief introduction to application dependent FPGA testing, Dept. of Electrical and Computer Engineering, Auburn University.
[7] L. Zhao, D.M.H. Walker, F. Lombardi, Iddq testing of I/O resources of SRAM-based FPGAs, in: Proceedings of the 8th Asian Test Symposium, 1999, p. 375.
[8] T. Lin, J. Zhao, J. Ren, J. Feng, Y. Wang, A novel scheme for application-dependent testing of FPGAs, Dept. of Microelectronics, Peking University.
[9] A. Krasniewski, Evaluation of delay fault testability of LUTs for the enhancement of application-dependent testing of FPGAs, Journal of Systems Architecture 49 (4–6) (2003) 283–296.
[10] T. Lin, J. Feng, D. Yu, Application-dependent interconnect testing of Xilinx FPGAs based on line branches partitioning, in: Proceedings of the 9th International Conference on Solid-State and Integrated-Circuit Technology, ICSICT, 2008.
[11] M.B. Tahoori, E.J. McCluskey, M. Renovell, P. Faure, A multi-configuration strategy for an application dependent testing of FPGAs, in: Proceedings of the 22nd IEEE VLSI Test Symposium, 2004, pp. 1093–1167.
[12] A New Scheme of FPGA Global Interconnect Fault Test.
[13] M. Abramovici, C. Stroud, BIST-based test and diagnosis of FPGA logic blocks, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 9 (1) (2001) 159–172.
[14] M.Y. Niamat, A. Sahni, M.M. Jamali, A built in self test scheme for automatic interconnect fault diagnosis in multiple and single FPGA systems, in: 50th Midwest Symposium on Circuits and Systems, MWSCAS 2007, 2007, pp. 229–232.
[15] Xilinx Inc., Using Dedicated Multiplexers in Spartan-3 Generation FPGAs, XAPP466.
[16] Digilent Inc., Digilent Nexys2 Board Reference Manual.
[17] Xilinx Inc., Virtex-II Pro FF672 Prototype Board.
[18] Y.B. Liao, P. Li, A.W. Ruan, W. Li, W.C. Li, Full coverage manufacturing testing for SRAM-based FPGA, in: Proceedings of the 2009 12th International Symposium on Integrated Circuits, ISIC '09, 2009, pp. 478–481.
Ignacio Bravo-Muñoz received the B.Sc. degree in Telecommunications Engineering in 1997, the M.Sc. degree in Electronics Engineering in 2000, and the Ph.D. in Electronics in 2007, all from the University of Alcalá, Madrid, Spain. Since 2002 he has been a lecturer in the Electronics Department of the University of Alcalá, where he is currently an Associate Professor. His areas of research are reconfigurable hardware, vision architectures based on FPGAs and electronic design.
Alfredo Gardel-Vicente received degrees in Telecommunication Engineering from the Polytechnic University of Madrid (Spain) in 1999, and a Ph.D. in Telecommunication from the University of Alcalá (Spain) in 2004. Since 1997 he has been a lecturer in the Electronics Department of the University of Alcalá. His main areas of research comprise infrared and artificial vision, monocular metrology, robotics sensorial systems, and design of advanced digital systems.
Beatriz Pérez-Galán received the B.Sc. degree in Telecommunications Engineering in 2009 and the M.Sc. degree in Automation Engineering in 2010, both from the University of Alcalá, Madrid, Spain. She has been working in the Electronics Department of the University of Alcalá since 2009, where she is currently a scholarship student working on FPGA tests and cameras based on FPGAs.
José Luis Lázaro-Galilea received degrees in Electronic Engineering and Telecommunication Engineering from the Polytechnic University of Madrid (Spain) in 1985 and 1992, respectively, and a Ph.D. in Telecommunication from the University of Alcalá (Spain) in 1998. Since 1986 he has been a lecturer in the Electronics Department of the University of Alcalá, where he is currently a Professor. His areas of research are: robotic sensorial systems based on laser, optical fibers, infrared and artificial vision, motion planning, monocular metrology, and electronic systems with advanced microprocessors.
Jorge García-Castaño received the B.Sc. degree in Telecommunications Engineering in 2009 and the M.Sc. degree in Electronic Systems Engineering in 2011, both from the University of Alcalá, Madrid, Spain. He has been working in the Electronics Department of the University of Alcalá since 2009, where he is currently working toward the Ph.D. degree. His current research interests include computer vision and systems based on FPGAs.
David Salido-Monzú received the B.Sc. degree in Telecommunications Engineering and the M.Sc. degree in Electronics Engineering from the University of Alcalá, Madrid (Spain), in 2009 and 2011, respectively. He has been working at the Electronics Department of the University of Alcalá since 2009 and is currently pursuing the Ph.D. degree. His research areas are related to robotic sensorial systems and optical positioning systems.