Configuration errors analysis in SRAM-based FPGAs: Software tool and practical results

Microelectronics Reliability 47 (2007) 1836–1840, www.elsevier.com/locate/microrel


V. Maingot a,*, J.B. Ferron a, R. Leveugle a, V. Pouget b, A. Douin b

a TIMA Laboratory, 46 Avenue Félix Viallet, 38031 Grenoble Cedex, France
b IMS Laboratory, 351 Cours de la Libération, 33402 Talence Cedex, France

Received 9 July 2007. Available online 4 September 2007.

Abstract

The reconfigurability of SRAM-based FPGAs also has drawbacks, especially in systems requiring a high level of safety and/or dependability. Dealing with single-event effects is an important issue in these systems. This paper presents a software tool that analyzes a bit-stream and the functional effects of errors in it. Analysis results are presented, based on experiments in which a laser platform was used to inject faults into the circuit.

© 2007 Elsevier Ltd. All rights reserved.

* Corresponding author. E-mail address: [email protected] (V. Maingot).
0026-2714/$ - see front matter © 2007 Elsevier Ltd. All rights reserved. doi:10.1016/j.microrel.2007.07.074

1. Introduction

Due to the many advantages of the reconfigurability of SRAM-based FPGAs, their use is increasing even in systems requiring a high level of dependability (safety, availability, security, etc.). The main issue for such systems is their working conditions: they often have to operate in harsh environments, for example under ionizing radiation, or they may have to resist deliberate fault-based attacks that create similar perturbations, for example with a laser.

Single-event effects (SEE) induced by the interaction of particles with integrated circuits are a well-known threat for space systems, which are directly exposed to cosmic rays. With the shrinking of transistor sizes in modern technologies, systems are also sensitive to atmospheric particles at sea level. The most probable effect, when we consider SRAM-based FPGAs at sea level, is the single-event upset (SEU), i.e. a bit-flip in the embedded memories [1].

Faults in the configuration memory of an SRAM-based FPGA directly modify the definition of its function, dangerously impacting its ability to operate properly [2]. These errors usually last until the configuration memory is refreshed. Moreover, detecting and/or correcting these errors is costly in most cases, and the cost increases further if multiple-bit upsets (MBUs) must also be considered.

Protecting the system against faults in the configuration memory is an important issue at design time. Several design-level solutions exist to develop fault-tolerant architectures from SRAM-based FPGAs. An example using the triple modular redundancy (TMR) technique is given in [3]. In all cases, the designer has to make a compromise between cost (area, power and performance overheads) and fault tolerance.

At design time, evaluating the effects of faults in the device and choosing the best protection strategy require realistic fault models. To achieve this, the fault models must be developed from the results of actual fault injections on a test device. The better the models are, the more accurate the evaluation of the dependability will be.

An ongoing collaborative effort has allowed us to develop hardware and software tools and associated methodologies for performing pulsed laser fault injections in FPGA devices [4]. In this paper, we present the analysis of some results. We first describe our tool, Soft Error Functional Effect Analysis in Programmable Devices (SEFEAProD), developed to analyze the configuration memory of SRAM-based FPGAs and the effects of fault injections, performed by simulations or in-silico experiments. Then, we detail some results obtained during a first injection campaign.

2. Presentation of the analysis tool

A software tool for the analysis of configuration errors had previously been developed for the Xilinx Virtex I family with the JBits 2.8 software development kit (SDK) [5]. The JBits SDK is a set of Java classes, provided by Xilinx, that defines an application program interface (API) into Xilinx devices. We developed our own tool, based on the JBits 3.0 SDK, to analyze the configuration of any device from the Virtex II family and to compare erroneous configurations with golden ones. The input can be a bit-stream (.bit) file or a read-back data (.rbd) file downloaded from the board.

A graphical user interface (GUI) has been implemented to make the analysis easier, with different views of the configuration memory and of the FPGA architecture. Fig. 1 shows the hierarchy of these views:

• Matrix tile view: The configuration memory is presented as a tile array, showing the tiles used by the design and, for each one, the different configuration bits. The criticality of each configuration bit is predicted with respect to its role and to the implemented design.
• Matrix frame view: The same information is displayed, but the configuration bits are grouped by configuration frame.
• Schematic tile view: It shows the resources actually used in each CLB tile by the implemented application. The interconnections used and the configuration (mode and content) of registers and look-up tables (LUTs) are available.

Fig. 1. Hierarchical view of the tool.

Additionally, this part of the tool can load several bit-streams in .bit format to detect erroneous configurations due, for example, to pulsed laser fault injections. Visualizing the implemented architecture after each fault injection makes it possible to link the laser energy and position to architectural effects in the FPGA under test.

Finally, our software tool can also compare different bit-stream files and report statistics on the effects of fault injection campaigns [4]. A bitwise comparison and the study of the structure of the bit-stream give us the list of faulted bits and their role in the bit-stream. Text reports generated by the tool are then processed with UNIX shell scripts to produce statistical analyses.

Some work is still required to identify precisely the role of some bits. However, the configuration of most of the functional resources can already be analyzed.

3. Experimental results

We have conducted several laser fault injection campaigns on different test configurations of a Virtex II xc2v1000 FPGA. A campaign of about 50 runs was performed, with multiple laser shots between the configuration and the read-back of the device. The objective was to gather information on the largest possible number of error patterns. With these data, we obtain both a statistical analysis and a detailed view of the effects of the faults.

One of our first test configurations was designed to use all available CLBs in the FPGA (all flip-flops are connected and all LUTs implement a logic function), but no BRAM was used. The results presented below are based on this test setting.

Table 1 presents the average number of faulted configuration bits during this campaign for each type of element, and the corresponding percentages. The elements are the following: input/output blocks and their interconnections (IOB, IOI), global clock (GCLK), RAM blocks and their interconnections (BRAM, BRAMI), CLBs (CLB), and configuration bits of lateral IOB/IOI contained in CLB frames (CLBIO).
This table is divided into three parts: the first two lines give the figures for all faulted bits, while the others give the equivalent figures for bits initially at zero (lines 3–4) or one (lines 5–6). Most errors occur in CLB and BRAM frames, because these are the most important elements in the chip and because we did not focus our laser shots on the I/O pads. The numbers of faulted zeros and ones are of the same order of magnitude: 53.09% and 46.91% of the faulted bits, respectively. This must be correlated with the area exposed to the laser.

Table 1
Average distribution of faulted bits

Element type                  Total    CLB     CLBIO   GCLK   IOB    IOI    BRAM    BRAMI
Number of ‘0’, ‘1’ faulted    137.85   80.95   0.54    0      0.03   0.03   50.41   5.87
Percentage                    100.00   58.72   0.39    0      0.02   0.02   36.57   4.26
Number of ‘0’ faulted         73.18    17.10   0.54    0      0.03   0.03   50.41   5.05
Percentage                    53.09    12.40   0.39    0      0.02   0.02   36.57   3.66
Number of ‘1’ faulted         64.67    63.85   0       0      0      0      0       0.82
Percentage                    46.91    46.32   0       0      0      0      0       0.59

Table 2 shows the average number of configuration bits per CLB in both the original bitstream and the faulted one. This density allows us to compare the probabilities of flipping a bit depending on its value: the probability of flipping a one is found to be 2.5 times higher than the probability of flipping a zero. Since zero is the default value, this means, for example, that a hit is more likely to suppress an interconnection than to create a new one.

Table 2
Average number of bits per CLB

Category               All bits   Bits at ‘1’   Bits at ‘0’
Golden bitstream       1760       212.80        1547.20
Faulted bits           9.15       2.37          6.78
Bit-flip probability   0.52%      1.11%         0.44%

Table 3 presents the distribution of erroneous bits in CLB tiles. These bits are categorized in three groups: interconnection configuration bits, logic configuration bits (including LUTs and user memory bits) and currently unidentified bits. As expected, the largest contribution comes from bits configuring the interconnections. The bits categorized as unknown are those that cannot be accessed through JBits; their identification is still ongoing.

Table 3
Average distribution of faulted CLB bits

Bit type     Total   Logic   Interco.   Unknown
Number       80.95   34.49   44.15      2.31
Percentage   58.72   25.02   32.03      1.68

As previously mentioned, our software gives us the complete list of erroneous bits and their function in the configuration of the FPGA. By analyzing the original bit-stream, we can understand the effects of the laser shot on the implemented architecture. We focus here on some particular examples of error patterns.

Most bit-flips in the logic configuration modified LUTs and registers. For flip-flops, we identified the location in the bit-stream of the user memory configuration bits. The content of each flip-flop in a CLB is configured by one bit. Consequently, a bit-flip at these locations is critical and will lead to an error (and potentially a failure) at execution time. For LUTs, the truth tables are entirely included in the bit-stream, so an error will modify the logic function (as a four-input function), but the actual function may be preserved if not all input patterns are used.

Interconnections in the device are configured in a heterogeneous way: they are not all configured by the same number of bits, which varies between one and three. Most of the interconnections are defined by two bits (90.3%), while very few use three bits (0.2%). Bit-flips in these configuration bits can lead to different modification patterns, which fall into two cases. For 1-bit connections, there are two types of modification pattern: the creation of a connection or its suppression. For multiple-bit connections, there are six types of modification pattern, depending on the initial state of the interconnection and on the modification. Fig. 2 illustrates these modification patterns. A bit-flip in bits initially connecting two wires leads to four patterns; the initial link can be maintained or not, with or without the addition of extra interconnections:

• No effect: The link is maintained without any modification.
• Suppressed: The initial link is suppressed without the creation of any other connection.
• Added: The initial link is maintained with the creation of extra connections.
• Modified: The initial link is suppressed with the addition of extra connections.

For an initial state with no connection, the bit-flip may have no effect (‘No effect’ pattern) or may create new connections (‘Created’ pattern).

Table 4 shows that in 94% of the cases, the error patterns concern interconnections defined by two bits. This is consistent with the percentage of such interconnections in the architecture (90.3%). However, we did not manage to flip bits in the 0.2% of interconnections defined by three bits. Of course, the sensitivity to bit-flips highly depends on the number of bits required to determine the connection. Since several 1-bit connections can be configured by the same bit, the number of observed modification patterns is higher than the number of bit-flips in this case. Due to the complexity of the configuration scheme of multiple-bit connections, the opposite is observed for them: the number of bit-flips is higher than the number of modification patterns.

Fig. 2. Common interconnection modification patterns (multiple-bit connections). Panels: Modified, Suppressed, Added and No effect for connected wires; No effect and Created for unconnected wires.
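The six patterns of Fig. 2 can be read as a decision over two sets: the sources connected to a wire resource before the fault and after it. The following Python sketch only illustrates that reading. The function name `classify_pattern` and the representation of a connection as a set of source names are assumptions made here for illustration; they are not taken from the tool described in this paper.

```python
def classify_pattern(before, after):
    """Classify the effect of bit-flips on one wire resource.

    `before` and `after` are the sets of source names connected to the
    resource in the golden and in the faulted configuration
    (hypothetical representation, chosen for this sketch).
    """
    before, after = set(before), set(after)
    extra = after - before          # connections that did not exist initially

    if not before:                  # initially unconnected wire
        return "Created" if after else "No effect"

    if before <= after:             # the initial link is maintained
        return "Added" if extra else "No effect"

    # the initial link has been lost
    return "Modified" if extra else "Suppressed"


if __name__ == "__main__":
    print(classify_pattern({"XQ0"}, {"XQ0", "XQ1"}))  # initial link kept, one extra
    print(classify_pattern({"XQ0"}, {"XQ1"}))         # initial link replaced
    print(classify_pattern(set(), {"YQ1"}))           # new link on an unconnected wire
```

Working on sets rather than on raw configuration bits keeps the classification independent of how many bits encode a connection.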

Table 4
Distribution of faulted bits in CLB interconnections

Configuration bits      One bit   Two bits   Three bits
Number of bit-flips     18        2290       0
Modification patterns   91        1407       0
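The ratios quoted for Table 4 can be recomputed directly from its entries. The short Python sketch below is only an arithmetic check on the published values; the variable names are chosen here for illustration.

```python
# Values copied from Table 4 (the three-bit column is all zeros and is omitted).
bit_flips = {"one bit": 18, "two bits": 2290}
patterns = {"one bit": 91, "two bits": 1407}

# Share of the observed modification patterns that concern 2-bit interconnections.
share_two_bit = patterns["two bits"] / sum(patterns.values())

# Average number of bit-flips behind each observed modification pattern.
flips_per_pattern = {kind: bit_flips[kind] / patterns[kind] for kind in patterns}

print(f"patterns on 2-bit interconnections: {share_two_bit:.1%}")
for kind, ratio in flips_per_pattern.items():
    print(f"{kind}: {ratio:.2f} bit-flips per pattern")
```

The share is about 94%. The ratio is below one for 1-bit connections (more patterns than bit-flips) and about 1.6 for 2-bit connections (more bit-flips than patterns).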

Table 5 illustrates an average classification of the modification patterns for 2-bit connections over a full campaign. One important point is that, for interconnections defined by two bits, a correct connection structure is maintained in many cases in spite of the errors in the configuration data. For initially unconnected wires, 86.12% of the modifications do not create perturbations. For initially connected wires, the initial wire is maintained in 50% of the cases (column ‘Added’), and the real functional consequences of the added connections on the node depend on the global interconnection resource usage of a given design.

To study these figures further, we need to present the structure used for configuring an interconnection. For 2-bit interconnections, the link is defined between one resource and several sources. All configuration bits are associated with the resource; each of these bits defines the list of sources reachable if the bit is activated. The resource is consequently connected to the source contained in the intersection of the two activated lists. To illustrate this, Fig. 3 shows an extract of the configuration structure for the OMux9 resource; we have extracted five configuration bits defining, respectively, the following lists: {XQ0, XQ1}, {YQ0, YQ1}, {XQ0}, {XQ1, YQ0} and {YQ1}.

From an unconnected configuration, creating a connection needs at least two bit-flips. But since the intersection of two lists can be empty, two bit-flips do not always create a connection. Indeed, the ‘Created’ modification pattern needed three bit-flips on average. This also explains why most initially unconnected wires remain in their state: the average number of bit-flips per pattern is 1.6, which is below the threshold needed to have a chance of impacting the connection.

When considering an already established connection, the number of configuration bits per resource is larger than 2 (9.1 on average). Consequently, the probability of suppressing the connection (flipping one of its two bits) is smaller than the probability of adding a new connection. This trend is confirmed by the figures shown in Table 5. However, the average number of bit-flips needed is higher in the ‘Added’ case, due to the possibility of multiple creations.
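The intersection rule for 2-bit interconnections can be sketched in a few lines of Python using the five OMux9 lists quoted above. Several points are assumptions made for this illustration and are not stated in the paper: the labels B1 to B5 are taken to follow the order of the lists, the helper names `connected_sources` and `flip` are hypothetical, and when more than two bits are activated every pair of activated lists is taken to contribute its intersection.

```python
from itertools import combinations

# Source lists of the five OMux9 configuration bits quoted in the text.
# Assumption: the labels B1..B5 follow the order in which the lists are given.
OMUX9_LISTS = {
    "B1": {"XQ0", "XQ1"},
    "B2": {"YQ0", "YQ1"},
    "B3": {"XQ0"},
    "B4": {"XQ1", "YQ0"},
    "B5": {"YQ1"},
}


def connected_sources(active_bits, lists=OMUX9_LISTS):
    """Return the sources connected to the resource for a set of activated bits.

    Assumption: with more than two activated bits, each pair of activated
    lists contributes its intersection.
    """
    sources = set()
    for a, b in combinations(sorted(active_bits), 2):
        sources |= lists[a] & lists[b]
    return sources


def flip(active_bits, *bits):
    """Return the set of activated bits after flipping the given bits."""
    return set(active_bits) ^ set(bits)


if __name__ == "__main__":
    golden = {"B1", "B3"}                                 # connects XQ0
    print(connected_sources(golden))
    print(connected_sources(flip(golden, "B4")))          # one flip adds a connection
    print(connected_sources(flip(set(), "B1", "B2")))     # two flips, empty intersection
```

Under these assumptions, a single flip on an unconnected resource never creates a connection, and two flips create one only when the two activated lists share a source, as the text describes.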


In the ‘Modified’ case, the connection must be both suppressed and replaced by another one. In the optimal case, this needs two bit-flips (one to disconnect and one to re-associate the resource with another source). This situation, as a combination of the two previous ones, is less frequent and requires more bit-flips.

The ‘No effect’ case happens when the flipped bit corresponds to no source present in the union of the lists of the two activated bits. Considering the multiplicity of faults in bits configuring a resource, it is very improbable that the original state of the interconnection is maintained.

Furthermore, we studied the evolution of the average number of bit-flips per pattern as a function of the number of created connections in the ‘Modified’, ‘Added’, and ‘Created’ situations. Results are shown in Fig. 4. To add one connection, one bit-flip is usually enough in the ‘Added’ situation because there is already a connection in the initial state. On the contrary, the ‘Modified’ situation needs two bit-flips on average (one to destroy the initial link and one to create a new connection). The ‘Created’ situation is similar: in the initial state no configuration bit is activated, so two bit-flips are needed to make one connection. Our experiments have shown that, from this situation, one additional bit-flip on a resource tends to create a new connection. When the number of created connections increases, the number of bit-flips necessary to add a new connection seems to decrease, which is linked to the increasing probability of finding a source in an already activated list.

We did not use the common interconnection error patterns introduced in [6], since our goal was to identify possible modification patterns over one and only one wire resource within a CLB. We were able to classify all possible modification patterns. The next step of our study will be to determine the probability of each modification pattern following a single laser shot and to study the criticality of faults with respect to the implementation of the design. Then, we will be able to use the nomenclature in [6] by studying not only one wire resource but also the situation of the neighbouring interconnection resources. We will then be able to establish a correspondence between, for instance, our ‘Added’ situation and the ‘short’ or ‘bridge’ patterns previously defined.

Fig. 3. Partial view of the configuration structure of a 2-bit interconnection. Elements shown: configuration bits B1 to B5, sources XQ0, XQ1, YQ0 and YQ1, resource OMUX 9.

Table 5
Classification of modification patterns in interconnections defined by two bits

Initial state                             Connected                                      Unconnected
Effect on connection                      Modified   Suppressed   Added   No effect     No effect   Created
Average number of bit-flips               16.5       30.7         49      0             1613        581
Average number of modification patterns   7.1        20.4         29      0             1163.1      187.4
Percentage                                0.5        1.5          2.1     0             82.7        13.3
Average number of bit-flips per pattern   2.3        1.5          1.7     n/a           1.4         3.1

Fig. 4. Average number of bit-flips in the ‘Modified’, ‘Added’, and ‘Created’ patterns. Axes: number of created connections (1 to 4) against average number of bit-flips per pattern (0 to 4.5).

4. A few lessons from this experiment

These analyses aim at giving detailed information on the effects of laser fault injection on an SRAM-based FPGA. These data can be used both to elaborate a fault model and to emphasize the criticality of different parts of the component.

At this point of the study, after analyzing the results obtained during campaigns based on multiple laser shots, we are able to classify the induced errors. The development of a precise fault model will require the results of the next campaign, based on single laser shots, and their analysis. Error patterns obtained in such conditions will be used in emulation- or simulation-based fault injection campaigns to evaluate the robustness of designs to configuration errors.

Another aspect of this study is to reveal the sensitivity of the different elements on the chip. With this cartography of the criticality, a designer can choose which resources inside the chip should be used in priority. From the data presented in this paper, we can draw some recommendations on the use of resources inside the CLB, in particular for LUTs and interconnections. As each error in a 4-input LUT configuration will probably generate a failure, complex logic functions may have to be distributed over several LUTs to increase their robustness. Of course, the impact on the global design will have to be evaluated in further work.
For interconnections, on which we have focused our current work, the number of configuration bits needed to create a connection has to be taken into account. 1-bit connections should be avoided when possible. Moreover, since in most cases the initial link is maintained when initially connected (56%), the addition of connections to the initial wire resource is the main source of application failure. The local density of interconnections should therefore be kept low, and the resources used should be distributed over as many CLBs as possible.

5. Conclusion and perspectives

We have briefly presented a software tool developed to analyze the configuration memory of SRAM-based FPGAs from the Virtex II family. The link with the architectural components allows us to study the effects of configuration errors from a designer's point of view. Experimental results point out sensitive elements in the FPGA and the higher probability of flipping a 1 (i.e. an activated bit) than a 0. Error patterns have also been discussed, and new directions towards robust design on SRAM-based FPGAs have been outlined.

Future work will first focus on the effect of single laser shots. Modification patterns obtained in this case will provide a good model for simulated or emulated fault injections. This should allow a better evaluation of the robustness of a design before using any laser facility. The protection techniques outlined here will also have to be implemented and evaluated.

Acknowledgements

This work is partly supported by the French Ministry of Research, through the project ACI-SI VENUS. The authors thank all the people at TIMA and IMS who contributed to the experiments whose results are analyzed in this paper.

References

[1] Alderighi M, Candelori A, Casini F, D’Angelo S, Mancini M, Paccagnella A, et al. SEU sensitivity of Virtex configuration logic. IEEE T Nucl Sci 2005;52(6):2462–7.
[2] Morgan K, Caffrey M, Graham P, Johnson E, Pratt B, Wirthlin M. SEU-induced persistent error propagation in FPGAs. IEEE T Nucl Sci 2005;52(6):2438–45.
[3] Kastensmidt FL, Sterpone L, Carro L, Reorda MS. On the optimal design of triple modular redundancy logic for SRAM-based FPGAs. In: Proceedings of design, automation and test in Europe (DATE) 2005, vol. 2; 2005. p. 1290–5.
[4] Pouget V et al. Tools and methodology development for pulsed laser fault injection in SRAM-based FPGAs. In: 8th Latin-American test workshop (LATW), March 12–14, 2007.
[5] Kinzel Filho C, Lima Kastensmidt F, Carro L. Improving reliability of SRAM-based FPGAs by inserting redundant routing. IEEE T Nucl Sci 2006;53(4):2060–8.
[6] Sonza Reorda M, Sterpone L, Violente M. Efficient estimation of SEU effects in SRAM-based FPGAs. In: Proceedings of international online testing symposium (IOLTS); 2005. p. 54–9.
