Permanent and single event transient faults reliability evaluation EDA tool


MR-12095; No of Pages 5 Microelectronics Reliability xxx (2016) xxx–xxx

Contents lists available at ScienceDirect

Microelectronics Reliability journal homepage: www.elsevier.com/locate/mr

Y.Q. de Aguiar a,b,⁎, A.L. Zimpeck b, C. Meinhardt a,b, R. Reis b

a Centro de Ciências Computacionais, Universidade Federal do Rio Grande, Rio Grande, RS, Brazil
b Instituto de Informática, PPGC/PGMicro, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil

Article info

Article history: Received 1 July 2016; Accepted 8 July 2016; Available online xxxx

Keywords: Stuck-On; Stuck-Open; Single Event Transient; EDA

Abstract

In the nanotechnology domain, reliability is a fundamental concern in the design and manufacturing process of VLSI circuits. This paper presents a tool developed to evaluate the reliability of logic cells in order to provide a set of information to improve design robustness. The tool is able to evaluate logic cells under Single Event Transient (SET) faults and also under permanent faults such as Stuck-On (SOnF) and Stuck-Open (SOF). The information produced by this tool helps designers to choose the most reliable cells to adopt in their designs. © 2016 Elsevier Ltd. All rights reserved.

1. Nanotechnology reliability

Technology scaling has introduced new challenges in circuit design due to the tiny dimensions and process variability. Besides the shrinking dimensions, the supply voltage is reduced to meet design requirements such as lower dynamic and static power consumption. In nanotechnologies, leakage currents increase, which raises the static power consumption of logic gates [1]. Furthermore, the scaling process has a direct and negative impact on reliability [2]. At each new technology node, there is a significant increase in the number of possible faults, resulting in high device failure rates and low yield [3].

Advances in microelectronics have led to technology scaling, reduced threshold voltages and increased operating frequencies. However, these trends increase the susceptibility of circuits to environmental noise and, in particular, to radiation particle strikes [4]. Even low-energy particles that reach the Earth's surface, previously overlooked, are now able to interfere with the operation of a circuit. Thus, one of the biggest challenges in the semiconductor industry is to ensure the reliability of circuits under the impact of ionizing particles on devices. For a long time, the Single Event Transient (SET) was considered irrelevant due to the intrinsic capacity of combinational cells to mask its effect. However, with each new technology generation, the masking effects have been reduced, increasing the need to study and develop SET mitigation techniques [5].

⁎ Corresponding author at: Centro de Ciências Computacionais, Universidade Federal do Rio Grande, Rio Grande, RS, Brazil. E-mail address: [email protected] (Y.Q. Aguiar).

Due to random dopant fluctuation and sub-wavelength lithography, the manufacturing process of VLSI circuits has to deal with a certain degree of uncertainty [2]. This uncertainty is the main cause of variability and of the possible occurrence of permanent faults, such as Stuck-On (SOnF) and Stuck-Open (SOF) faults, which happen at transistor level. A transistor with a SOnF conducts permanently, regardless of the voltage applied to its gate terminal. Conversely, a transistor with a SOF behaves permanently as an open switch, no matter what signal is applied to its gate terminal [6]. Given these features, avoiding these faults has become extremely important in the design process because they are directly related to the increase in dynamic and static power consumption in current technologies [1].

In order to design reliable systems out of unreliable devices, it is imperative to perform fault modeling and to adopt EDA tools that allow the reliability of circuits to be assessed. In this work, a tool was developed to evaluate the reliability of circuits under the three main types of faults found in combinational cells in nanotechnologies: SET, SOnF and SOF. The developed tool aims to provide a complete set of information about the behavior of combinational cells under these faults, and it helps designers to easily assess the robustness impact of different fault tolerance techniques applied to the device under evaluation.

2. Fault modeling

At circuit level, real defects are too numerous and often not analyzable. Fault models are adopted to help identify targets for testing: they model the faults most likely to occur, so that tests need to be created only for the modeled faults. A fault model makes analysis possible by associating specific defects with specific test patterns. In addition, fault models make effectiveness measurable by experiment, i.e., fault coverage can be

http://dx.doi.org/10.1016/j.microrel.2016.07.072 0026-2714/© 2016 Elsevier Ltd. All rights reserved.

Please cite this article as: Y.Q. Aguiar, et al., Permanent and single event transient faults reliability evaluation EDA tool, Microelectronics Reliability (2016), http://dx.doi.org/10.1016/j.microrel.2016.07.072


determined for specific test patterns to reflect their effectiveness. The most common fault models are single stuck-at faults, transistor open and short faults, memory faults, functional faults (affecting processors, for example), delay faults and analog faults (mainly due to parametric deviation). Fault models are analyzable approximations of defects and are essential for a test methodology. In this work, the developed tool explores the transistor open/short fault model (permanent faults) and the events caused by energetic particles in combinational circuits. The specifications and possible applications of these models are explained in more detail below.

2.1. Permanent faults

A transistor in normal operation connects two circuit nodes according to the signal applied at the gate terminal. When a permanent fault occurs in a component, it never returns to proper operation. Two main permanent faults normally occur in MOS transistors: Stuck-Open (SOF) and Stuck-On (SOnF).

If a SOF occurs, the connection between the two nodes is never established, i.e., charge is prevented from flowing through the transistor when it is activated, leading to a high-impedance output state (Z). SOFs are more complex to test because the output depends on the charge stored in the output capacitance in the previous state. Detection requires a specific two-vector pair that examines each transistor in the logic gate for an open defect in the device structure.

One important metric for measuring SOF effects in nanotechnologies is the holding time. The holding time is related to the delay induced by the SOF, i.e., the time for the output node to discharge from VDD to VDD − |Vtp|, where |Vtp| is the PMOS transistor threshold voltage. In the charging case, the charging time of the output node is measured from GND to |Vtn|, where |Vtn| is the NMOS transistor threshold voltage.

Given this definition and observing the SOF behavior in nanotechnologies shown in Fig. 1, the holding time measured as in [6] is very short compared to the time needed for the next logic stage to change its state. The methodology proposed in [7] recommends measuring the holding time as the time needed for the next logic stage to change its state after a transition in the faulty circuit. Under this methodology, the holding time is the time for the output to discharge from VDD to VTRANSITION, where VTRANSITION is the voltage at which the next logic stage interprets the output signal as an inversion of the logic level. Here, VTRANSITION is assumed to be VDD/2. With this approach the holding time increases, i.e., the output is considered to remain in its previous state for longer. This increases the time that the tester has to detect a SOF in static CMOS circuits.

If a Stuck-On fault occurs in a transistor, the connection between the two nodes is permanent, regardless of the signal

Fig. 1. Obtained result for a logic gate with SOF in different technologies [7].

applied to the gate terminal. This fault normally creates a short circuit in which the pull-up and pull-down networks conduct simultaneously. The short circuit causes excessive power consumption and can produce an unexpected output. Unlike Stuck-Open faults, SOnFs do not depend on the previous circuit state and require only the current test vector. Since this fault does not generate delays, it appears and can be analyzed regardless of the circuit clock frequency.

2.2. Transient faults

The effects of a single radiation particle ionizing the silicon are called Single Event Effects (SEE) [4]. They are classified as destructive, when the particle permanently damages the device, or non-destructive, when the device undergoes transient faults. The best-known transient faults due to single events are the Single Event Upset (SEU), when the transient pulse occurs in sequential circuits, and the Single Event Transient (SET), when it occurs in combinational circuits. The transient pulse arises when charge is generated by the strike of an ionizing particle on a PN junction of the device and the collected charge Qcoll exceeds a critical charge Qcrit, which is technology dependent [4].

As mentioned above, studying the effects of SETs is increasingly important for designing reliable circuits at the nanoscale [5]. Hence, the SET fault was modeled in the simulation tool using the analytical model proposed in [8]. This widely used model emulates the induced transient with a double exponential current source, defined by Eqs. (1) and (2), where I0 is the current related to Qcoll (the charge collected at the junction, in femtocoulombs), τα is the charge collection time constant, and τβ is the time constant for establishing the ion track.

$$I(t) = I_0 \left( e^{-t/\tau_\alpha} - e^{-t/\tau_\beta} \right) \tag{1}$$

$$Q_{coll} = 10.8 \times L \times LET \tag{2}$$
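As an illustration of Eqs. (1) and (2), the following minimal Python sketch evaluates the double exponential pulse. It is not the tool's implementation. The time constants are the typical values quoted from [9], the depth default is the typical 2 μm for bulk silicon, and the normalization I0 = Qcoll / (τα − τβ) is an assumption made here so that the pulse integrates to Qcoll; the text only states that I0 is related to Qcoll. All function names are hypothetical.

```python
import math

TAU_ALPHA = 1.64e-10  # charge collection time constant in seconds, value quoted from [9]
TAU_BETA = 5e-11      # ion-track establishment time constant in seconds, value quoted from [9]


def collected_charge_fc(let, depth_um=2.0):
    """Eq. (2): collected charge in fC for LET in MeV-cm2/mg and depth in micrometers."""
    return 10.8 * depth_um * let


def set_current(t, q_coll_fc, tau_alpha=TAU_ALPHA, tau_beta=TAU_BETA):
    """Eq. (1): double exponential current in amperes at time t (seconds)."""
    # Assumption (not stated in the text): I0 = Qcoll / (tau_alpha - tau_beta),
    # which makes the integral of I(t) from 0 to infinity equal to Qcoll.
    i0 = q_coll_fc * 1e-15 / (tau_alpha - tau_beta)
    return i0 * (math.exp(-t / tau_alpha) - math.exp(-t / tau_beta))


def integrate_charge_fc(q_coll_fc, t_end=5e-9, steps=5000):
    """Trapezoidal integral of the pulse over [0, t_end], returned in fC (sanity check)."""
    dt = t_end / steps
    total = 0.0
    for k in range(steps):
        left = set_current(k * dt, q_coll_fc)
        right = set_current((k + 1) * dt, q_coll_fc)
        total += 0.5 * (left + right) * dt
    return total * 1e15


if __name__ == "__main__":
    q = collected_charge_fc(let=1.0)  # LET = 1 MeV-cm2/mg, L = 2 um
    print("Qcoll (fC):", q)
    print("I(100 ps) (A):", set_current(100e-12, q))
    print("Integrated charge (fC):", integrate_charge_fc(q))
```

Under the stated normalization, the integrated charge should match Qcoll closely, which is a quick way to check that a chosen pair of time constants and a simulation window capture the whole pulse.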

Typical values used for simulations and experiments on silicon are τα = 1.64 × 10⁻¹⁰ s and τβ = 5 × 10⁻¹¹ s [9]. The charge collection depth L (in μm) is typically 2 μm in bulk silicon, and the LET (Linear Energy Transfer, in MeV-cm2/mg) corresponds to the amount of energy released by a particle per unit length traversed.

3. Fault analysis simulation tool

The developed tool models SOF, SOnF and SET effects in combinational logic gates at the electrical level. It provides a diverse set of features that enable the analysis of a wide range of parameters, producing a large amount of data from several simulations. The tool is developed in Java for its high portability. Circuits are described as SPICE netlists, and simulation-based fault injection is carried out by an electrical simulator. Fig. 2 summarizes the robustness evaluation flow.

To model permanent faults, a transistor affected by a SOnF is held permanently conducting. When a SOF is injected, the damaged transistor is permanently open and can never conduct. For transient faults, the SET is modeled by a double exponential current source at the location of the particle strike [8]. For this purpose, the tool performs an identification procedure that lists every node of the analyzed circuit that can be affected by a radiation-induced glitch, and the current source is then applied to each of them.

The effects of a SET, SOF or SOnF depend on several aspects, such as device technology, supply voltage, operating frequency and output load [4]. SET faults also depend on the particle strike location. To enable the exploration of all factors that influence SET, SOF or SOnF effects, the tool provides a set of configuration parameters to be set for each new simulation, such as the type of fault to be simulated,


Fig. 2. Function block diagram.

operating frequency, technology node, supply voltage, output load and the description of the device under evaluation in SPICE format. In SET experiments, it is also possible to set the amount of deposited energy by defining the LET in MeV-cm2/mg, as well as the parameters that determine the transient pulse width. To evaluate electrical masking, the user can optionally define the number of propagation levels through a chain of inverters. The tool automatically identifies all SET-sensitive nodes, considering all PN junctions of devices in the off state.

For permanent faults, the tool allows exhaustive and selective testing. In exhaustive testing, every device is considered faulty, one at a time, and all input vectors are considered. Because a SOF depends on the previous state, all pairs of vectors are determined by Eq. (3), where E is the number of inputs of the analyzed logic cell, n is the exponential term (taken here as n = 2^E, the number of possible input vectors) and p is always equal to 2, because this fault always considers a pair of test vectors.

$$A_s(n, p) = \frac{n!}{(n-p)!} \tag{3}$$
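To make Eq. (3) concrete, the short Python sketch below enumerates the ordered pairs of distinct input vectors for a cell and checks the count against the arrangement formula. It assumes that n is the number of possible input vectors, n = 2^E, which the text implies but does not state explicitly. The function names are hypothetical and do not come from the tool.

```python
from itertools import permutations, product
from math import factorial


def input_vectors(num_inputs):
    """All binary input vectors of a cell with num_inputs inputs, e.g. '00', '01', ..."""
    return ["".join(bits) for bits in product("01", repeat=num_inputs)]


def sof_vector_pairs(num_inputs):
    """Ordered (initialization, test) pairs of distinct vectors used for SOF testing."""
    return list(permutations(input_vectors(num_inputs), 2))


def arrangements(n, p=2):
    """Eq. (3): number of arrangements of n elements taken p at a time."""
    return factorial(n) // factorial(n - p)


if __name__ == "__main__":
    pairs = sof_vector_pairs(2)  # 2-input cell, so n = 2**2 = 4 vectors
    print(len(pairs), arrangements(4))
    print(pairs[:3])
```

The pairs are ordered because the first vector sets the previous output state and the second one exercises the transistor under test, so ('00', '01') and ('01', '00') are different tests.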

Moreover, the tool makes it possible to determine the fault coverage of the logic cell under evaluation, contributing to the identification of sensitive regions of circuits and helping to develop more robust and effective fault tolerance techniques. Furthermore, in nanotechnologies, the behavior of SOFs has changed due to high leakage currents and the reduction of signal node capacitance. This change can be studied precisely with the tool, which is able to determine the holding time, for example. As previously mentioned, it is highly important to study the holding time because it is a delay induced by the SOF in the switching of the output signal, which can lead to an incorrect reading of the logic level at the cell output.
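The paper does not detail how the permanent faults are written into the SPICE netlist. The following Python sketch shows one possible way to do it, purely as an assumption: a SOF is emulated by commenting out the transistor so that it can never conduct, and a SOnF is emulated by tying the gate to the rail that keeps the device on. The helper name, the node names `vdd` and `0`, and the example device lines are hypothetical; only the general MOSFET line layout (name, drain, gate, source, bulk, model) and the `*` comment marker are standard SPICE syntax.

```python
def inject_permanent_fault(mos_line, fault, is_pmos, vdd="vdd", gnd="0"):
    """Return a modified copy of one MOSFET netlist line.

    mos_line uses the usual SPICE form:
        Mname drain gate source bulk model [parameters...]
    fault is 'SOF' (stuck-open) or 'SOnF' (stuck-on).
    """
    fields = mos_line.split()
    if len(fields) < 6 or not fields[0].upper().startswith("M"):
        raise ValueError("not a MOSFET line: " + mos_line)
    if fault == "SOF":
        # Assumed modeling choice: comment the device out so it never conducts.
        return "* SOF " + mos_line
    if fault == "SOnF":
        # Assumed modeling choice: force the gate to the rail that turns the
        # device on (ground for PMOS, supply for NMOS).
        fields[2] = gnd if is_pmos else vdd
        return " ".join(fields)
    raise ValueError("unknown fault type: " + fault)


if __name__ == "__main__":
    pull_up = "M1 out a vdd vdd pmos W=0.2u L=45n"
    pull_down = "M2 out a 0 0 nmos W=0.1u L=45n"
    print(inject_permanent_fault(pull_up, "SOnF", is_pmos=True))
    print(inject_permanent_fault(pull_down, "SOF", is_pmos=False))
```

Commenting a device out also removes its parasitic capacitances from the simulation, so a more careful model might instead keep the device and force its gate to the off rail; which option matches the tool is not stated in the text.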

4. Experiments

In order to show the potential application of the developed tool, this section presents a set of experiments that evaluate SET effects in combinational circuits. Three logic cells from a 45 nm commercial standard cell library were analyzed: a half adder (HA_X1), a complex logic gate (AOI21_X1) and a 2-input NAND gate (NAND2_X1). For these experiments, the fault injection ensures the evaluation of two SETs in each simulation time slot of the circuit, considering the timing arcs of the function. The output node is the worst-case location for a particle strike, because transients induced in internal nodes suffer electrical attenuation [10]. The experiments therefore simulate a SET fault affecting the drain/source region of the devices connected to the output node of the cells. Nonetheless, the tool allows the exploration of electrical masking due to electrical attenuation at every node of the circuit. A total of 22 faults were injected in HA_X1, 26 in AOI21_X1 and 12 in NAND2_X1.

In the first analysis, the parameters considered for the SET simulation were LET = 1 MeV-cm2/mg, rise time τR = 10 ps and fall time τF = 320 ps. All cells were analyzed using the 45 nm and 16 nm predictive technology nodes for high performance (HP) [11]. The propagation of transient glitches is also related to the circuit clock frequency, so three operating frequencies were explored in the simulations: 500 MHz, 1 GHz and 2 GHz. Besides the process-related parameters, the transient pulse also depends on the amount of energy deposited by the ionizing particle, so an analysis taking into consideration the LET of the incident particle was also conducted.

The results are organized according to the parameter explored in the tool. First, the influence of operating frequency is shown. The second analysis evaluates the influence of the technology on the SET sensitivity of the circuits. The third experiment explores different particle energies by varying the LET in the simulation tool. Finally, the tool also allows the influence of the input vector on the cell sensitivity to be evaluated.

4.1. Analysis of the influence of operating frequency

Fig. 3 shows the percentage of propagated faults for each standard cell analyzed, considering LET = 1 MeV-cm2/mg and the 45 nm HP technology at different frequencies. HA_X1 is the least reliable cell at high frequencies, with 54.55% and 45.45% of propagated faults at 2 GHz and 1 GHz, respectively. The AOI21_X1 cell showed similar percentages at low frequencies, with 19.23% and 15.38% of propagated faults at 1 GHz and 500 MHz, respectively. The 2-input NAND gate, despite its simplicity, behaved differently from the other cells analyzed. At 500 MHz, no faults were propagated to the next level. The least reliable situation occurs at 1 GHz, rather than at 2 GHz as would be expected.

The analysis with an increased amount of energy deposited by the particle, LET = 3 MeV-cm2/mg, is shown in Fig. 4. As expected, the average percentage of propagated faults increased for the three architectures due to the higher deposited energy. The AOI21_X1 cell had very close percentages at all frequencies, being most reliable at 1 GHz with 61.54% and least reliable at 2 GHz with 69.23%. The 2-input NAND gate showed 66.67% of propagated faults at both 2 GHz and 1 GHz. An interesting observation is that frequency does not have a strong impact at the higher LET, resulting in the same or very close percentages of propagated faults for all three cells.

Fig. 3. Analysis of the influence of operating frequency for LET = 1 MeV-cm2/mg in the 45 nm HP technology.
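The percentages of propagated faults reported for the operating frequency analysis follow directly from the number of injected faults per cell (22 for HA_X1, 26 for AOI21_X1 and 12 for NAND2_X1). The Python sketch below reproduces a few of them. The propagated counts used here are back-calculated from the reported percentages and are therefore an assumption of this example, and the function name is hypothetical.

```python
# Number of injected faults per cell, as stated in the experiments.
INJECTED = {"HA_X1": 22, "AOI21_X1": 26, "NAND2_X1": 12}


def propagated_percentage(propagated, injected):
    """Percentage of injected faults that propagated, rounded to two decimals."""
    if injected <= 0:
        raise ValueError("injected must be positive")
    if not 0 <= propagated <= injected:
        raise ValueError("propagated must be between 0 and injected")
    return round(100.0 * propagated / injected, 2)


if __name__ == "__main__":
    # Propagated counts below are inferred from the reported percentages.
    print(propagated_percentage(12, INJECTED["HA_X1"]))     # HA_X1, 45 nm, 2 GHz, LET 1
    print(propagated_percentage(10, INJECTED["HA_X1"]))     # HA_X1, 45 nm, 1 GHz, LET 1
    print(propagated_percentage(5, INJECTED["AOI21_X1"]))   # AOI21_X1, 45 nm, 1 GHz, LET 1
    print(propagated_percentage(8, INJECTED["NAND2_X1"]))   # NAND2_X1, 45 nm, LET 3
```

Because each cell has few injected faults, a single additional propagated fault changes the result by 4.55, 3.85 or 8.33 percentage points for HA_X1, AOI21_X1 and NAND2_X1, respectively, which is worth keeping in mind when comparing cells.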


Fig. 4. Analysis of the influence of operating frequency for LET = 3 MeV-cm2/mg in the 45 nm HP technology.

4.2. Comparing the influence of different technologies

Fig. 5 shows the results obtained for the simulation with LET = 1 MeV-cm2/mg and a clock frequency of 2 GHz at the 45 nm and 16 nm HP technology nodes. The half adder HA_X1 suffered an increase of 18.18 percentage points in propagated faults when using the 16 nm HP technology. The complex gate AOI21_X1 proved to be robust to the change between the two technologies, resulting in the same percentage of propagated faults. The 2-input NAND gate, in contrast, proved to be strongly susceptible to the technology change, with 8.33% and 41.67% of propagated faults for the 45 nm and 16 nm HP technologies, respectively.

4.3. Analysis of different linear energy transfer (LET)

Each induced SET has its own characteristics, such as amplitude, transient duration and the total energy deposited in the silicon [12]. Accordingly, the user of the developed tool can explore the parameters that determine these characteristics. The LET is one of these parameters and is explored in this subsection. Fig. 6 presents the results obtained for LET = 1 MeV-cm2/mg and LET = 3 MeV-cm2/mg when the evaluated circuits operate at 2 GHz in the 16 nm HP technology. A greater amount of deposited energy led to a higher percentage of propagated faults. The half adder appeared to be the most robust to this variation, with a difference of 9.1% between the evaluated LET values. For the AOI21_X1 and

Fig. 6. Analysis of the different LET for 16 nm HP technology at 2 GHz.

NAND2_X1 cells, the differences in propagated faults were 38.46% and 25%, respectively.

4.4. Reliability analysis as a function of input vectors

The data obtained from the evaluation tool can also be used to analyze reliability considering the input vector of the circuit at the moment the particle hits. Due to the limited space in the paper, only the analysis for the half adder HA_X1 is shown, in Table 1 and Table 2. These tables summarize all results obtained for the half adder, indicating the number of injected and propagated faults for each input vector. With this information, the designer can identify the most susceptible input vector for the architecture and apply fault tolerance techniques more effectively. For the HA_X1 cell at 16 nm and 1 GHz, the most sensitive input vector is (A = 1, B = 1), with 3 of 4 injected faults propagated when LET = 1 MeV-cm2/mg and all faults propagated when LET = 3 MeV-cm2/mg. At 2 GHz, the input vector (A = 0, B = 0) is the most sensitive, together with the vector (A = 1, B = 0). The same analysis can be conducted with the data from Table 2, in which the 45 nm HP technology is evaluated.

5. Conclusions

Given the increasing concern with design reliability, reliability-centric EDA tools are becoming mandatory to analyze the vulnerability of circuits at early design stages [2,3,5]. This work presents a tool to evaluate the robustness of logic circuits under three different fault models: Stuck-On, Stuck-Open and Single Event Transient faults. The implemented tool allows an extensive exploration of the parameters of these fault models, helping designers of combinational cells to determine the reliability of the investigated cell under given conditions. The developed tool is highly valuable for analyzing the behavior of designed circuits under faults, as well as for elucidating the direct influence of design parameters on each fault separately. With the fault propagation information generated by this tool, designers of standard cells can choose the most appropriate circuit for a reliable library.

Table 1. Relation of propagated to injected faults for HA_X1 at 16 nm High Performance, considering its input vector.

Fig. 5. Analysis of the influence of technology node for LET = 1 MeV-cm2/mg and clock frequency of 2 GHz.

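| Frequency | LET (MeV-cm2/mg) | AB = 00 | AB = 01 | AB = 10 | AB = 11 |
|---|---|---|---|---|---|
| 500 MHz | 1 | 2/6 | 2/6 | 2/6 | 4/4 |
| 500 MHz | 3 | 6/6 | 2/6 | 2/6 | 4/4 |
| 1 GHz | 1 | 2/6 | 3/6 | 3/6 | 3/4 |
| 1 GHz | 3 | 6/6 | 4/6 | 4/6 | 4/4 |
| 2 GHz | 1 | 5/6 | 4/6 | 5/6 | 2/4 |
| 2 GHz | 3 | 6/6 | 4/6 | 6/6 | 2/4 |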
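The input-vector analysis of Table 1 can be automated with a few lines of code. The Python sketch below takes the propagated/injected counts of HA_X1 at 16 nm for LET = 1 MeV-cm2/mg at 1 GHz and 2 GHz and returns the most sensitive input vectors. The data structure and function names are hypothetical and are not part of the tool; only the counts come from Table 1.

```python
from fractions import Fraction

# (propagated, injected) per input vector AB for HA_X1 at 16 nm HP, LET = 1 MeV-cm2/mg.
TABLE1_LET1 = {
    "1 GHz": {"00": (2, 6), "01": (3, 6), "10": (3, 6), "11": (3, 4)},
    "2 GHz": {"00": (5, 6), "01": (4, 6), "10": (5, 6), "11": (2, 4)},
}


def most_sensitive_vectors(column):
    """Input vectors with the highest propagated/injected ratio (ties are all returned)."""
    ratios = {vec: Fraction(p, n) for vec, (p, n) in column.items()}
    worst = max(ratios.values())
    return sorted(vec for vec, ratio in ratios.items() if ratio == worst)


def totals(column):
    """Total (propagated, injected) faults over all input vectors."""
    propagated = sum(p for p, _ in column.values())
    injected = sum(n for _, n in column.values())
    return propagated, injected


if __name__ == "__main__":
    for freq, column in TABLE1_LET1.items():
        print(freq, most_sensitive_vectors(column), totals(column))
```

Exact fractions are used instead of floating-point ratios so that ties such as 5/6 versus 5/6 are detected reliably. The totals also recover the per-cell percentage: 16 of 22 faults propagate at 2 GHz, which is consistent with the 18.18 point increase over the 54.55% reported for the 45 nm node.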


Table 2. Relation of propagated to injected faults for HA_X1 at 45 nm High Performance, considering its input vector.

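| Frequency | LET (MeV-cm2/mg) | AB = 00 | AB = 01 | AB = 10 | AB = 11 |
|---|---|---|---|---|---|
| 500 MHz | 1 | 1/6 | 1/6 | 0/6 | 1/4 |
| 500 MHz | 3 | 5/6 | 2/6 | 2/6 | 4/4 |
| 1 GHz | 1 | 0/6 | 4/6 | 2/6 | 4/4 |
| 1 GHz | 3 | 6/6 | 3/6 | 3/6 | 4/4 |
| 2 GHz | 1 | 2/6 | 3/6 | 5/6 | 2/4 |
| 2 GHz | 3 | 6/6 | 4/6 | 6/6 | 2/4 |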

Moreover, this information helps designers to decide where and how to apply fault tolerance techniques. During individual cell evaluation, the tool makes it possible to test a large set of scenarios, mainly considering SET effects, and to evaluate the efficacy of design hardening methodologies [13].

Acknowledgment

This work was supported by the Brazilian National Council for Scientific and Technological Development (CNPq) and the Coordination for the Improvement of Higher Education Personnel (CAPES).


References

[1] C. Meinhardt, R. Reis, Evaluation of Process Variability on Current for Nanotechnologies Devices. http://dx.doi.org/10.1109/LASCAS.2012.6180361
[2] S. Borkar, Designing Reliable Systems from Unreliable Components: the Challenges of Transistor Variability and Degradation. http://dx.doi.org/10.1109/MM.2005.110
[3] L. Anghel and Nicolaidis, Defects Tolerant Logic Gates for Unreliable Future Nanotechnologies. http://dx.doi.org/10.1007/978-3-540-73007-1_52
[4] R.C. Baumann, Radiation-Induced Soft Errors in Advanced Semiconductor Technologies. http://dx.doi.org/10.1109/TDMR.2005.853449
[5] T. Rejimon, Reliability-Centric Probabilistic Analysis of VLSI Circuits, PhD Thesis, USF, 2006.
[6] R. Gomez, et al., A Modern Look at the CMOS Stuck-Open Fault. http://dx.doi.org/10.1109/LATW.2009.4813818
[7] A.L. Zimpeck, et al., A new methodology to evaluate the holding time in CMOS logic gates with stuck-open fault, Chip in Curitiba - SForum, 2013.
[8] G.C. Messenger, Collection of Charge on Junction Nodes from Ion Tracks. http://dx.doi.org/10.1109/TNS.1982.4336490
[9] V.A. Carreno, G. Choi, R.K. Iyer, Analog-digital simulation of transient-induced logic errors and upset susceptibility of an advanced control system, NASA Tech. Memo. 4241 (1990).
[10] K. Mohanram, Simulation of Transients Caused by Single-Event Upsets in Combinational Logic. http://dx.doi.org/10.1109/TEST.2005.1584063
[11] PTM, Predictive Technology Model. Available at http://ptm.asu.edu/
[12] F. Wang, V.D. Agrawal, Single Event Upset: an embedded tutorial. http://dx.doi.org/10.1109/VLSI.2008.28
[13] F. Faccio, Design Hardening Methodologies for ASICs. http://dx.doi.org/10.1007/978-1-4020-5646-8_7
