Microelectronics Reliability 52 (2012) 1515–1522
Microelectronics Reliability journal homepage: www.elsevier.com/locate/microrel
Analysis of intermittent timing fault vulnerability
Saurabh Kothawade, Koushik Chakraborty *, Sanghamitra Roy, Yiding Han
Electrical and Computer Engineering, Utah State University, 4120 Old Main Hill, Logan UT 84322, United States
* Corresponding author. E-mail address: [email protected] (K. Chakraborty).
0026-2714/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.microrel.2012.03.003
Article info
Article history: Received 18 November 2011; Received in revised form 13 February 2012; Accepted 8 March 2012; Available online 25 April 2012
Abstract
Continuous scaling of transistor feature size rapidly increases the effect of intermittent faults. These faults manifest as timing violations due to the combined effects of process variation, circuit wear-out, and variation in environmental conditions. In this paper, we combine all critical sources of intermittent faults in a comprehensive framework. Our experiments with the MIPS-789 processor reveal that at the 22nm technology node, the combined effect of all the factors can degrade the delay by 2.5X. Such gross delay degradation extending more than two cycles can render many recently proposed time borrowing techniques ineffective. We analyze three architectural techniques to mitigate intermittent faults and evaluate them using full system architectural simulation. © 2012 Elsevier Ltd. All rights reserved.
1. Introduction
Continuous scaling of transistor feature size has substantially increased the challenges of designing robust microprocessor systems. The prospect of a complete design overhaul is becoming a distinct possibility, impacting circuit designers and system architects alike. One of the recent challenges in this design spectrum is the growing importance of intermittent timing faults, which have historically received less attention than transient faults (soft errors) and permanent faults [1,2]. As shown in Fig. 1, intermittent timing faults represent a class of timing violations that appear sporadically. Typically, these faults occur in bursts, lasting several cycles, but tend to disappear after a while.
Several technology trends, spanning from device level characteristics to system level workload execution, conspire to cause intermittent faults. First, with the increasing manufacturing process variation, the delay of an integrated circuit becomes nondeterministic, behaving more like a statistical distribution [3,4]. Second, transistor aging mechanisms, such as Negative Bias Temperature Instability (NBTI) and Hot Carrier Injection (HCI), cause the performance to degrade over time, further exacerbating the uncertainty in circuit delay estimation [5,6]. Third, fluctuations in runtime operating conditions such as voltage and temperature, primarily induced by workload execution, introduce additional degrees of variation in the circuit performance. Combined together, these factors can cause alarmingly high levels of degradation in circuit delays.
In this paper, we propose a comprehensive framework to combine the effects of process variation, transistor aging, and thermal and voltage fluctuations to analyze the variation in circuit performance.
While several previous works study the delay degradation problem due to isolated sources, to the best of our knowledge, this is the first work to unify all known critical sources of delay degradation in an integrated circuit (e.g., some work ignores NBTI and HCI [7,8], or PVT (Process, Voltage and Temperature) variation [9]). Our experiments with the MIPS-789 processor reveal that intermittent timing violations become a critical design challenge in the forthcoming technology generations. For example, at the 22nm technology node, the combined effect of all the factors mentioned above can degrade the delay by 2.5X. Many recently proposed circuit techniques based on time borrowing become ineffective under such gross delay degradation, as the computation delay in a single stage may extend to more than two cycles [9,10].
In the light of this trend of multi-cycle delay degradation, we believe that architectural techniques must be combined with circuit level techniques to mitigate intermittent timing violations. We perform a trade-off analysis of three architectural techniques to mitigate intermittent timing violations: frequency scaling, dynamically altering pipeline stage latency, and thread migration. We find that frequency scaling is most effective against purely aging induced degradation, while thread migration demonstrates the least performance loss, but requires an idle core to run code.
Few recent works discuss intermittent faults. Pan et al. [1] classify intermittent faults in three categories (stuck-at-zero/stuck-at-one, intermittent short or open, timing), but focus their attention only on the first kind. Their work extends Architectural Vulnerability Factor (AVF) computation techniques to estimate the vulnerability to stuck-at-zero or stuck-at-one type intermittent faults. Wells et al. propose using a thin virtual-machine layer to perform thread migration to adapt to intermittent faults [11].
A recent work also characterized the impact of intermittent faults on a real system [12].
We make several contributions in the area of intermittent fault analysis in modern pipelined microprocessors:
We combine the effects of process variation, transistor aging, and temperature and voltage fluctuations in a comprehensive framework using extensive SPICE characterization coupled with statistical timing analysis (Sections 2 and 3).
We use the five-stage MIPS-789 [13] pipelined processor from opencores.org to demonstrate that the impact of intermittent timing faults is rapidly growing with technology scaling. Under certain conditions, intermittent timing faults can generate multi-cycle delay degradation in pipeline stages, rendering time borrowing techniques ineffective.
We perform a trade-off analysis of three different architectural techniques using a full system simulation infrastructure built on Virtutech SIMICS (Section 4).
Fig. 1. Hardware fault model.
2. Combining process variation, NBTI and HCI aging and runtime voltage and temperature fluctuation
In this section, we describe our comprehensive framework to unify all known critical sources of intermittent timing violations. An overview of our framework is presented in Fig. 2. Using SPICE Monte Carlo simulations, for different technology nodes, we characterize the gate delay distributions in a library accounting for: (a) NBTI degradation using the long term threshold voltage change, (b) HCI degradation using a failure equivalent circuit model, (c) process variation, and (d) temperature and voltage fluctuations. A given circuit is synthesized using the gates in our characterized library. Subsequently, we perform statistical timing analysis, using the SPICE characterized gate delay distributions, to estimate the range of intermittent timing characteristics. Our framework accurately combines the interplay of all known critical sources in intermittent timing violations. We now describe the different components of our framework in detail. In addition, we also present a brief description of other aging factors such as Time Dependent Dielectric Breakdown (TDDB) and Electromigration, and their impact on intermittent timing faults (Section 2.5).
2.1. Process variation
A key challenge in modern digital circuit design is to deal with manufacturing process variations (PV). Shrinking feature size with technology scaling greatly exacerbates this problem, as the precision demand is beginning to exceed the capabilities of modern fabrication equipment. As a direct outcome, the device characteristics (e.g., channel length, width, doping concentration, oxide thickness) can deviate from the intended design point and need to be considered as random variables. Consequently, the delay of a logic gate is a distribution over these random variables rather than a deterministic value: a fundamental change in the underlying assumption of digital circuit design [3,4,14]. Fig. 3 compares statistical timing analysis with traditional static timing analysis. For ease of comprehension, we show the timing analysis step performed in a single logic gate. Under static timing analysis, the resultant delay is given by:
$$ a_3 = \max(a_1 + D_{13},\; a_2 + D_{23}) \tag{1} $$
where $a_i$ is the arrival time of the signal at node $i$, and $D_{ij}$ is the gate delay from node $i$ to node $j$. In statistical timing analysis, each of the delay components is a distribution (typically a Gaussian) with a mean and a standard deviation. Thus, the mean and standard deviation of the output delay ($\mu_{a_3}$ and $\sigma_{a_3}$) are given by:
$$ (\mu_{a_3}, \sigma_{a_3}) = \mathrm{SMAX}\big(\mathrm{SSUM}((\mu_{a_1}, \sigma_{a_1}), (\mu_{D_{13}}, \sigma_{D_{13}})),\; \mathrm{SSUM}((\mu_{a_2}, \sigma_{a_2}), (\mu_{D_{23}}, \sigma_{D_{23}}))\big) \tag{2} $$
In Eq. (2), SSUM and SMAX are the statistical sum and max operators, respectively. Assuming that the delay of individual gates conforms to a normal distribution, the SSUM function can be expressed as:
$$ \mu_{SUM} = \mu_1 + \mu_2 \tag{3} $$
$$ \sigma_{SUM}^2 = \sigma_1^2 + \sigma_2^2 \tag{4} $$
The key challenge here is estimating the distribution characteristics of the SMAX function, as the max of two normal distributions is not a normal distribution. However, it has been shown that using an approximate normal distribution leads to negligible error [15]. The mean and variance of this approximate SMAX function are given by:
$$ \mu_{SMAX} = \mu_1 \Phi(\alpha) + \mu_2 \Phi(-\alpha) + a\,\psi(\alpha) \tag{5} $$
$$ \sigma_{SMAX}^2 = (\mu_1^2 + \sigma_1^2)\,\Phi(\alpha) + (\mu_2^2 + \sigma_2^2)\,\Phi(-\alpha) + (\mu_1 + \mu_2)\,a\,\psi(\alpha) - \mu_{SMAX}^2 \tag{6} $$
where
$$ a = \sqrt{\sigma_1^2 + \sigma_2^2}, \qquad \alpha = \frac{\mu_1 - \mu_2}{a}, \qquad \psi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad \Phi(x) = \int_{-\infty}^{x} \psi(y)\, dy $$
We use these statistical functions to propagate the delay distributions in our synthesized netlist.
Fig. 2. Framework to estimate intermittent timing violations.
Fig. 3. Static vs. statistical timing analysis.
2.2. NBTI: circuit wear-out
Negative Bias Temperature Instability (NBTI) has emerged as one of the most prominent modes of transistor aging in the current and forthcoming technology generations. When a low voltage is applied at a PMOS transistor gate, representing an input of zero, interface traps are slowly formed and the transistor enters a stress mode [16]. After a substantial build-up of these interface traps, the effective threshold voltage begins to rise slowly. On the other hand, when a high voltage is applied at the gate, the transistor enters a recovery mode, healing the wear-out effects on the transistor performance. The aging of a given PMOS transistor, which manifests as a measurable degradation in its delay, depends on the relative lengths of the stress and the recovery periods. Overall, NBTI wear-out degrades the circuit performance over time. We can quantitatively measure this performance degradation in a given circuit by estimating the effective increase in the threshold voltage of individual gates in the circuit.
In this work, we are interested in the long term degradation of the circuit performance, as evaluating the impact of individual stress and recovery periods is impractical. Given a duty cycle of transistor aging ($\alpha$), denoting the fraction of the total time spent in the stress mode, Bhardwaj et al. provide a closed form expression for the upper bound of the long term threshold voltage change ($\Delta V_t$) [5]:
$$ \Delta V_t = \left( \frac{\sqrt{K_v^2\, \alpha\, T_{clk}}}{1 - \beta_t^{1/2n}} \right)^{2n} \tag{7} $$
We consider differential degradation within the circuit due to topological differences by propagating the Zero Bias Probability (ZBP) from the input wires and using the specific logic function of each gate type.
2.3. HCI: circuit wear-out
When high energy carriers in the channel strike Si lattice atoms in the drain region of a transistor, they cause impact ionization, displacing electron–hole pairs. Some of these carriers successfully cross into the gate oxide and lead to a change in the channel mobility, threshold voltage and transconductance of the transistor [17]. This phenomenon is called Hot Carrier Injection (HCI), and is fast gaining importance due to its rapidly increasing effect at lower technology nodes. A transistor having more Switching Activity (SA) undergoes larger HCI stress. To measure the HCI impact, we use the failure equivalent circuit model proposed in [18]. This model adds a progressively increasing resistor ($\Delta R_d$) to the transistor circuit to reflect HCI induced aging:
$$ \Delta R_d = \frac{1 + \alpha\, \Delta N}{I_{ds0}}\, V_{Rd} \tag{8} $$
where $V_{Rd}$ is given as
$$ V_{Rd} = -V_{gdx} + \sqrt{ V_{gdx}^2 + \frac{2 V_{ds} \left[ \alpha \left( V_{gdx} + \frac{V_{ds}}{2} \right) \Delta N + \frac{q}{C_{ox}}\, \Delta N \right]}{1 + \alpha\, \Delta N} } \tag{9} $$
The symbols used in Eqs. (8) and (9) are described in Table 1.
Table 1
HCI parameters.
Term                              Explanation
$\alpha$                          $2.4 \times 10^{-12}\ \mathrm{cm}^2$
$\Delta N$ (charge density)       $\Delta N_{it} + \Delta N_{ox}$
$\Delta N_{it}$                   Interface trap charge
$\Delta N_{ox}$                   Oxide trap charge
$V_{Rd}$                          Voltage drop across $R_d$
$C_{ox}$                          Gate-oxide capacitance per unit area
$I_{ds0}$                         Undamaged drain current
2.4. Operating condition variation
We consider two main operating conditions that cause variation in delay:
Voltage fluctuation: Large amounts of abrupt switching activity in modern processors may cause a substantial voltage drop that leads to delay faults [19]. The voltage transient happens within very short periods, usually a few cycles [20], causing irregular and high-frequency timing faults.
Temperature fluctuation: Temperature has a significant effect on CMOS circuit speed. High temperature causes delay degradation at runtime, leading to timing violations. The temperature of a core varies at a relatively slow rate, hence the induced timing errors tend to happen in clusters for a period of time [21].
2.5. TDDB and electromigration
In addition to the factors mentioned above, TDDB and electromigration also affect the circuit delay over time, and can contribute to intermittent faults due to timing violations. We discuss these failure mechanisms, and their potential impact on circuit timing, next.
2.5.1. TDDB
With rapid technology scaling, as the gate oxide thickness became smaller, dielectric breakdown became an important failure mechanism in CMOS technologies. TDDB is caused by the gradual development of a conductive path through the gate oxide, leading to an increased leakage current through it. Eventually, the device can reach a complete failure mode, where it is no longer responsive to input stimuli. In the context of intermittent timing faults, TDDB is a minor factor compared to the other factors discussed above. First, the introduction of high-κ dielectrics has substantially reduced the oxide breakdown rate, improving TDDB reliability [22]. Second, previous studies have also shown that circuit timing is affected by TDDB only after a substantially longer time compared to the delay degradation from NBTI/HCI [23]. For these reasons, we have omitted TDDB modeling from our study.
2.5.2. Electromigration
Electromigration is the primary failure mechanism for interconnects in a microprocessor [24].
This phenomenon stems from the imperfection in metal wires due to missing atoms or other impurities. Consequently, during current flow, electrons may collide with the metal atoms and move them along the current flow, causing electromigration. Electromigration primarily affects global power wires, clock lines, and bi-directional buses between caches and on-chip cores [25]. However, unidirectional wires do not suffer from electromigration as they always charge and discharge through the same end [25]. Local wires within a combinational or sequential circuit block are unidirectional, as signals travel in a single direction: input to output. Consequently, circuit delay within a combinational or sequential circuit block is only marginally affected by electromigration. Therefore, we have not considered electromigration in our analysis of intermittent timing violations.
3. Timing analysis
In this section, we show the results of analyzing the MIPS-789 pipeline stages in our comprehensive framework to provide a quantitative estimation of intermittent timing violations.
3.1. Methodology
Our framework can effectively model a vast design spectrum. However, we focus on typical design aspects. The specific components of our experiments are described below:
Nominal temperature: We assume 70 °C as our nominal temperature; this is often seen as an average thermal profile of a microprocessor.
Process variation: We assume that the transistor length, width and oxide thickness behave as Gaussian distributions where σ/μ is 0.2, based on a conservative estimate from published data at 65nm and deductions made from previous technology generations [26,27].
NBTI and HCI aging: We assume a 5 year aging period. This is a fairly standard aging period investigated in previous aging related works [28]. Since the NBTI degradation of a gate is strongly dependent on the ZBP of the input values, we vary the ZBP of various gates in the entire circuit topology based on their location. Similarly, HCI degradation is heavily dictated by the switching activity of the gate, and certain gates have higher switching activity based on the circuit topology. Accordingly, we have varied the switching activity in various gates for HCI estimation, so as to reflect the expected logic propagation.
Temperature fluctuation: To capture the higher end of the thermal profile, we use 100 °C to model the impact of temperature fluctuation [29].
Voltage fluctuation (VF): We use 15% voltage fluctuations to inspect their impact on the delay [29].
We use three basic gates in our SPICE simulation: inverter, NAND and NOR. SPICE simulations are performed using the HSpice tool with the Predictive Technology Models (PTM) [30]. We use the high performance (HP) models incorporating the high-κ dielectric. We create gate delay distributions for every possible combination listed above, and then perform statistical analysis.
3.2. Delay degradation analysis
Fig. 4 presents an estimation of intermittent timing violations across a broad spectrum of critical factors affecting the circuit delay for the MIPS-789 processor core. MIPS-789 is a five-stage pipelined core with the following pipe stages: the instruction fetch and decode stage (IF&ID), the register fetch stage (RF), the execute stage (EXEC), the memory stage (MEM) and the write back stage (WB) [13]. In Fig. 4, we use different operating conditions to show the span of delay characteristics under intermittent faults. The chosen conditions represent a spectrum of scenarios for a typical high performance processor with manufacturing process variation coupled with a moderate degree of wear-out, during moderate to high utilization phases. All five conditions, described below, include the effect of manufacturing process variation.
C0: nominal condition without aging at 70 °C (baseline).
C1: delay under aging due to NBTI and HCI at 70 °C.
C2: delay under aging at 100 °C.
C3: delay under aging and voltage fluctuation at 70 °C.
C4: delay under aging and voltage fluctuation at 100 °C.
Fig. 4. Delay degradation of MIPS-789 pipeline stages and ISCAS85 benchmark circuits at the 22nm technology node. Delays are normalized to the condition of 70 °C temperature, nominal (0.8 V) supply voltage and no transistor aging (condition C0). We report the degradation in the average delay, while considering several sources of fluctuations.
Fig. 4 reports the degradation in average delay in the various pipeline stages under the conditions mentioned above. We can make several key observations from this figure.
First, without any fluctuations in operating conditions, the combined effect of NBTI, HCI and PV is substantial at the end of 5 years (condition C1). On average, we notice a delay of 1.5X the nominal delay, which corresponds to 50% delay degradation. Clearly, such a high guard-band cannot be provided throughout the lifetime of the circuit, but it may be possible to increase the guard-band as the circuit ages.
Second, when high temperature is combined with existing aging and PV, the degradation becomes substantially worse. On average, we notice 71% degradation across these circuits (condition C2): a substantially worse performance compared to NBTI, HCI and PV alone. Guard-banding techniques will clearly fail during periods of high temperature operation, leading to intermittent timing violations. However, circuit techniques based on time borrowing from adjacent pipeline stages may continue to recover from these errors.
Third, the largest delay degradation stems from the fluctuation in supply voltage. For example, the combination of PV + NBTI + HCI with voltage fluctuation causes an average delay of 2.1X the nominal delay (condition C3). When high temperature is combined with the above situation, the delay becomes 2.35X (condition C4). This result implies that under such conditions, computation from the combinational circuit in a pipe stage may take more than two cycles to complete.
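The average delays reported in this section come from propagating Gaussian delay distributions through the netlist with the SSUM and SMAX operators of Section 2.1 (Eqs. (3)–(6)). The following is a minimal Python sketch of those two operators, not the authors' implementation. It assumes the two inputs are statistically independent, and it takes the variance of the max as the second moment minus the squared mean. The function names `ssum`, `smax` and `gate_output`, and the example numbers, are hypothetical and chosen only for illustration.

```python
import math


def ssum(d1, d2):
    """Statistical sum of two independent Gaussian delays given as (mean, std)."""
    (mu1, s1), (mu2, s2) = d1, d2
    return mu1 + mu2, math.sqrt(s1 * s1 + s2 * s2)


def _pdf(x):
    """Standard normal density psi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def smax(d1, d2):
    """Moment-matched Gaussian approximation of max(d1, d2), inputs independent."""
    (mu1, s1), (mu2, s2) = d1, d2
    a = math.sqrt(s1 * s1 + s2 * s2)
    if a == 0.0:
        # Both inputs are deterministic, so the max is deterministic too.
        return max(mu1, mu2), 0.0
    alpha = (mu1 - mu2) / a
    mean = mu1 * _cdf(alpha) + mu2 * _cdf(-alpha) + a * _pdf(alpha)
    second_moment = ((mu1 ** 2 + s1 ** 2) * _cdf(alpha)
                     + (mu2 ** 2 + s2 ** 2) * _cdf(-alpha)
                     + (mu1 + mu2) * a * _pdf(alpha))
    variance = max(second_moment - mean * mean, 0.0)  # clamp rounding noise
    return mean, math.sqrt(variance)


def gate_output(a1, d13, a2, d23):
    """Arrival time at the output of a two-input gate, following Eq. (2)."""
    return smax(ssum(a1, d13), ssum(a2, d23))


# Made-up arrival times and gate delays in arbitrary time units.
mean_out, std_out = gate_output((100.0, 8.0), (30.0, 6.0), (110.0, 5.0), (25.0, 5.0))
print(f"output arrival: mean={mean_out:.2f}, std={std_out:.2f}")
```

A full statistical timing engine would also have to handle correlated inputs, which this sketch deliberately ignores.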
3.3. Technology trend
Given the possibility of large intermittent timing violations, it is imperative to look at the trend of these characteristics. Fig. 5 illustrates the average delay degradation among all pipe stages at the 45nm, 32nm, and 22nm technology nodes. For each technology node, we choose four different combinations of delay degradation factors, on top of the PV impact. We observe an increasing trend in the extent of intermittent timing violations. While the current technology node at 45nm shows all degradations to be limited to within 100%, forthcoming technology generations will not be able to operate under the same assumptions. In addition to the detailed results we discussed for the 22nm technology node, even at the 32nm node the worst case intermittent timing violations approach the 100% degradation point.
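As a rough illustration of how a normalized delay maps to a degradation percentage and to a cycle count, the sketch below uses the average normalized delays quoted in Section 3.2 for conditions C0 to C4. It assumes, purely for illustration, that the clock period equals the nominal (C0) stage delay, optionally stretched by a guard-band. The helper names `degradation_pct` and `cycles_needed` are hypothetical.

```python
import math


def degradation_pct(normalized_delay):
    """Delay degradation in percent relative to the nominal (C0) delay."""
    return (normalized_delay - 1.0) * 100.0


def cycles_needed(normalized_delay, guard_band=0.0):
    """Whole clock cycles a stage needs to finish.

    Assumption: the clock period is the nominal stage delay times (1 + guard_band).
    The small epsilon keeps exact multiples from rounding up because of float noise.
    """
    period = 1.0 + guard_band
    return math.ceil(normalized_delay / period - 1e-12)


# Average normalized delays quoted in Section 3.2.
conditions = {"C0": 1.0, "C1": 1.5, "C2": 1.71, "C3": 2.1, "C4": 2.35}
for name, delay in conditions.items():
    print(name, f"{degradation_pct(delay):.0f}%", cycles_needed(delay), "cycle(s)")
```

Under this assumption, any degradation above 100% pushes a stage past two clock periods, which is the regime where borrowing time from an adjacent stage no longer suffices.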
[Fig. 5 plot. Y-axis: delay degradation (%). Series: NBTI+HCI, NBTI+HCI+Temp, NBTI+HCI+VF, NBTI+HCI+Temp+VF.]
3.4. Summary
In this section, we presented a quantitative analysis of intermittent timing violations, under a broad spectrum of critical sources of delay degradation in integrated circuits. We find that in several cases, upcoming technology nodes show the possibility of intermittent timing violations that manifest as more than 100% delay degradation. Thus, combinational circuits in a pipelined microprocessor can easily take more than 2 full clock cycles to successfully complete their computation under these conditions. Consequently, many recently proposed circuit techniques based on time borrowing from adjacent pipeline stages will be unable to cope with such gross delay degradation. In the light of this result, we now present a trade-off analysis of architectural techniques to mitigate intermittent timing violations in microprocessors.
4. Architectural techniques to mitigate intermittent timing faults
In this section, we briefly describe several architectural techniques that we employ for intermittent timing fault mitigation. These techniques rely on recently proposed delay sensors to predict upcoming timing faults, discussed next.
4.1. Delay sensors
One key component in implementing the architectural techniques is the measurement of circuit delay. Different delay sensor schemes have been studied in previous works on timing-fault-resilient designs [9,31]. Unlike slow temperature and voltage sensors, the measurements of delay sensors are highly reliable and have fine-grained resolution. In this paper, we assume that such delay sensors are employed to feed back the occurrence of timing errors in our architecture.
4.2. Architecture techniques
We investigate three architectural techniques to mitigate intermittent timing violations. Their characteristics are listed below:
Variable-Latency (VL): We delay instruction issue and slow down the execution dynamically, when pipeline stages incur additional latency. We target the execution units, register file and instruction window, as these components are most susceptible to aging and PVT fluctuations.
Core Frequency Scaling (CFS): This technique scales down the frequency of the entire core, relaxing the timing requirement so that the degraded delay can be tolerated. Tuning the core frequency takes about 10 µs to complete [32]. Hence, this approach has a considerable performance trade-off if employed frequently.
Thread Migration (TM): This technique migrates the thread that encounters intermittent timing faults to an idle core. The entire register state is saved on the faulty core and restored on the idle core through the caches. In addition to the save and restore overhead, cache and other predictor overheads can also become a bottleneck if frequent migrations are performed.
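To make the trade-offs above more concrete, the sketch below shows one way a delay factor reported by a delay sensor could be translated into a CFS frequency setting or a VL latency multiplier, and how many nominal cycles one CFS transition costs. This is an illustrative model only, not the configuration procedure used in the paper. The function names and the 2 GHz clock are hypothetical; the 10 µs transition time comes from the text above, and the 1.62X delay factor is the maximum pipe stage delay quoted in Section 5.3.

```python
import math

CFS_TRANSITION_US = 10.0  # per-transition cost of core frequency scaling, from the text


def cfs_frequency_fraction(delay_factor):
    """Fraction of the nominal frequency at which a stage whose delay grew by
    delay_factor still completes within a single (longer) clock period."""
    if delay_factor <= 0.0:
        raise ValueError("delay_factor must be positive")
    return 1.0 / delay_factor


def vl_latency_multiplier(delay_factor):
    """Whole-cycle latency multiplier for a variable-latency stage at the nominal clock."""
    return math.ceil(delay_factor - 1e-12)


def cfs_transition_cycles(clock_ghz, transition_us=CFS_TRANSITION_US):
    """Nominal clock cycles that elapse during one frequency transition.

    clock_ghz is a hypothetical parameter; the paper does not state the clock rate.
    """
    return int(round(transition_us * clock_ghz * 1000.0))


delay_factor = 1.62  # maximum pipe stage delay under NBTI + HCI + PV (Section 5.3)
print(f"CFS: run at {cfs_frequency_fraction(delay_factor):.0%} of nominal frequency")
print(f"VL: stretch affected stages to {vl_latency_multiplier(delay_factor)}X latency")
print(f"CFS transition at an assumed 2 GHz: {cfs_transition_cycles(2.0)} cycles")
```

The transition cost is what makes CFS unattractive for short, frequent events, whereas VL and TM pay their overhead in different ways (extra stage latency, and state transfer plus cold caches and predictors, respectively).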
Fig. 5. Technology trend of intermittent timing violations (x-axis: 45nm, 32nm, and 22nm technology nodes).
5. Methodology
We use full-system simulation built on top of Virtutech SIMICS [33]. SIMICS provides the functional model of several popular ISAs, in sufficient detail to boot an unmodified operating system. For our experiments, we use the SPARC V9 ISA, and use our own detailed timing model to enforce timing characteristics of an out-of-order microprocessor.
5.1. Multicore system
We use a dual core system for our experiments. Each processing core has a private instruction cache and a private data cache: both of them are 4-way, 32 KB with 2-cycle latency. The system uses an 8 MB shared L2 cache, 16-way associative with a 25-cycle access latency. Each core is a 2-wide out-of-order execution engine with a 128-instruction window, an 11-cycle branch misprediction loop, an aggressive 2-level branch predictor, and a 32-entry load-store disambiguation predictor. Typically, only one core is utilized, except when performing thread migration.
5.2. Benchmarks
We use representative phases from several SPEC CPU2006 benchmarks. The two most representative phases from each of these SPEC benchmarks are extracted using the SimPoint toolset [34]. Results are reported as the weighted average of the individual results in the phases, based on their relative importance.
5.3. Trade-off analysis
Table 2 shows the configurations for the various architectural techniques we study to mitigate intermittent timing violations. These configuration parameters are derived from the observed delay degradation in Fig. 4. For example, considering only NBTI, HCI and PV, the maximum pipe stage delay is 1.62X. To alleviate this problem, the core frequency can be reduced to 60% of the original, or the latency of pipeline stages increased by 2X. Similarly, the combined impact of NBTI, HCI, PV, temperature and VF can be dealt with by an additional 20% frequency scaling (e.g., new time period = 2.5X of the original period without aging). It can also be alleviated with a 2X increase in the latency of selected pipeline stages, assuming that the pipeline frequency is already scaled down to handle NBTI and HCI. For thread migration, our simulation infrastructure saves and restores register state through the on-chip caches, and we faithfully model all associated overheads.
For simulation purposes, we assume that voltage fluctuations occur rarely, affecting 0.1% of the execution time and lasting 50 cycles in each occurrence [35]. For temperature fluctuations, we evaluate a range of possibilities (5–20%), where the execution is affected by high temperature.
Table 2
Configurations for architectural techniques.
Factors                      Techniques (%)
                             CFS      VL
NBTI + HCI                   60       2X
NBTI + HCI + TEMP            55       60 + 2X
NBTI + HCI + TEMP + VF       40       60 + 2X
6. Experimental results
In this section we discuss the simulation results from our architectural techniques, highlighting the performance impact of implementing each of them to mitigate intermittent timing faults.
6.1. NBTI and HCI aging
Fig. 6 shows the performance loss, measured as IPS (Instructions per Second) degradation, when the variable-latency (VL) and core-frequency-scaling (CFS) techniques are applied to mitigate aging due to NBTI and HCI. The results demonstrate that CFS is the technique of choice for aging mitigation, as it can be tuned to operate close to the actual delay degradation. However, CFS lags VL substantially in milc and GemsFDTD, and marginally in omnetpp and xalancbmk. In particular, for the GemsFDTD and milc benchmarks, the latency seen by the correct-path loads is more than 2X higher than in all other benchmarks. Thus, the bulk of their execution time is spent waiting for memory requests, diminishing the impact of clock frequency on their performance. Note that no results for TM are shown here, as we assume that both cores undergo similar aging degradation. Thus, no benefit can be gained through TM in this case.
Throughout the rest of this paper, we assume that aging caused by NBTI and HCI is alleviated using CFS. Therefore, subsequent results are presented after normalizing the performance with respect to CFS applied to the aged processor (Fig. 6 results).
Fig. 6. Performance loss from NBTI and HCI mitigation. (Y-axis: performance loss (%); bars: VL, CFS; x-axis: astar, bzip2, gcc, GemsFDTD, gobmk, libquantum, mcf, milc, omnetpp, perlbench, povray, sjeng, sphinx3, xalancbmk.)
6.2. High temperature and NBTI aging
Alleviating intermittent timing violations due to the combined impact of temperature, PV, NBTI and HCI alters the relative merits of the architectural techniques. Fig. 7 shows this trade-off analysis. We present results for three possible scenarios, where the application experiences high temperature for 5%, 10%, and 20% of its run time, in that order. The respective techniques are applied only during this time. We observe that regardless of the fraction of time affected by temperature, CFS yields the maximum performance loss among these three techniques. This loss primarily stems from multiple transition delays, which are necessary for voltage and clock stabilization [36]. TM shows the minimum performance loss, although it requires a spare core ready to execute code. In contrast, VL shows a modest performance loss. For example, with 20% of the execution under high temperature, VL shows a 1.2–7.8% loss in performance.
Fig. 7. Performance analysis of mitigating the combined impact of NBTI + HCI aging and PVT: (a) for high temperature duration of 5%; (b) for high temperature duration of 10%; (c) for high temperature duration of 20%. (Y-axis: performance loss (%); bars: VL, CFS, TM.)
6.3. Temperature, voltage fluctuation and NBTI aging
Finally, we show the performance loss for mitigating intermittent timing faults when all the factors are combined in Fig. 8. Here we assume that 20% of the execution is affected by high temperature, and 0.1% is affected by VF. Since increasing the pipeline latency by 2X on top of the scaled frequency for aging is sufficient to cover the average delay degradation for the combined NBTI, HCI, temperature and VF, the performance loss from VL is identical to that seen in Fig. 7c. Once again, CFS is substantially worse than the other techniques.
Fig. 8. NBTI, HCI combined temperature and VF. (Y-axis: performance loss (%); bars: VL, CFS, TM.)
7. Conclusion
We proposed a framework to combine the critical sources of delay degradation in integrated circuits. Our experiments with the MIPS-789 processor and ISCAS85 benchmarks reveal that the combined
impact of NBTI-HCI aging, PV, temperature and VF can result in delay degradation of up to 2.5X. Such gross degradation takes more than two clock cycles to complete, which defeats several recently proposed circuit techniques for error recovery. We present a trade-off analysis of alternative architectural techniques to mitigate intermittent timing faults. We observe that frequency scaling is optimal when alleviating only aging, but in the presence of voltage and temperature variation, it has high overhead. Thread migration shows the best performance, but relies on additional idle cores. Dynamically adjusting the pipeline latency incurs a modest performance loss, without requiring idle cores.
References
[1] Pan S, Hu Y, Li X. Ivf: characterizing the vulnerability of microprocessor structures to intermittent faults. In: Design automation and test in Europe (DATE); 2010. p. 238–43.
[2] Adolfsson D, Siew J, Marinissen EJ, Larsson E. On scan chain diagnosis for intermittent faults. In: Asian test symposium; 2009. p. 47–54.
[3] Chang H, Sapatnekar SS. Statistical timing analysis considering spatial correlations using a single pert-like traversal. In: IEEE international conference on computer-aided design (ICCAD); 2003. p. 621–5.
[4] Khandelwal V, Srivastava A. A general framework for accurate statistical timing analysis considering correlations. In: Design automation conference (DAC); 2005. p. 89–94.
[5] Bhardwaj S, Wang W, Vattikonda R, Cao Y, Vrudhula S. Scalable model for predicting the effect of negative bias temperature instability for reliable design. IET Circ Dev Syst 2008;2(4).
[6] Lorenz D, Georgakos G, Schlichtmann U. Aging analysis of circuit timing considering NBTI and HCI. In: 15th IEEE international on-line testing symposium, 2009 (IOLTS 2009). IEEE; 2009. p. 3–8.
[7] Lasbouygues B, Wilson R, Azemard N, Maurine P. Timing analysis in presence of supply voltage and temperature variations. In: International symposium on physical design; 2006. p. 10–6.
[8] Gupta MS, Rivers JA, Bose P, Wei G-Y, Brooks D. Tribeca: design for pvt variations with local recovery and fine-grained adaptation. In: IEEE/ACM international symposium on microarchitecture; 2009. p. 435–46.
[9] Dadgour HF, Banerjee K. Aging-resilient design of pipelined architectures using novel detection and correction circuits. In: Design automation and test in Europe (DATE); 2010. p. 244–9.
[10] Long J, Memik SO. Automated design of self-adjusting pipelines. In: Design automation conference (DAC); 2008. p. 211–6.
[11] Wells PM, Chakraborty K, Sohi GS. Adapting to intermittent faults in multicore systems. In: Architectural support for programming languages and operating systems (ASPLOS); 2008. p. 255–64.
[12] Constantinescu C. Intermittent faults and effects on reliability of integrated circuits. In: International symposium on reliability and maintainability (RAMS); 2008. p. 238–43.
[13] Mips 789 processor.
[14] Choi SH, Paul BC, Roy K. Novel sizing algorithm for yield improvement under process variation in nanometer technology. In: Design automation conference (DAC); 2004. p. 454–9.
[15] Clark CE. The greatest of a finite set of random variables. Oper Res 1961;9:85–91.
[16] Kang K, Gangwal S, Park SP, Roy K. Nbti induced performance degradation in logic and memory circuits: how effectively can we approach a reliability solution? In: Proceedings of Asia–Pacific design automation conference (ASPDAC); 2008. p. 726–31.
[17] Ning T. Hot-electron emission from silicon into silicon dioxide. Solid-State Electron 1978;21(1):273–82.
[18] Li X, Qin J, Bernstein J. Compact modeling of mosfet wearout mechanisms for circuit-reliability simulation. IEEE Trans Dev Mater Reliab 2008;8(1):98–121.
[19] Júnior DB, Rodríguez-Irago MJ, Santos MB, Teixeira IC, Vargas F, Teixeira JP. Fault modeling and simulation of power supply voltage transients in digital systems on a chip. J Electron Test: Theory Appl (JETTA) 2005;21(4):349–63.
[20] Polian I, Czutro A, Kundu S, Becker B. Power droop testing. In: ICCD; 2006. p. 243–50.
[21] Yan G, Liang X, Han Y, Li X. Leveraging the core-level complementary effects of pvt variations to reduce timing emergencies in multi-core processors. In: International symposium on computer architecture (ISCA); 2010. p. 485–96.
[22] Kim Y-H, Lee JC. Reliability characteristics of high-k dielectrics. Microelectron Reliab 2004;44(2):183–93.
[23] Sune J, Wu E. From oxide breakdown to device failure: an overview of postbreakdown phenomena in ultrathin gate oxides. In: IEEE international conference on integrated circuit design and technology; 2006. p. 1–6.
[24] Hau-Riege CS. An introduction to cu electromigration. Microelectron Reliab 2004;44(2):195–205.
[25] Abella J, Vera X. Electromigration for microarchitects. ACM Comput Surv 2010;42(2).
[26] Sarangi S, Greskamp B, Teodorescu R, Nakano J, Tiwari A, Torrellas J. Varius: a model of process variation and resulting timing errors for microarchitects. IEEE Trans Semicond Manuf 2008;21(1):3–13.
[27] Zhao W, Liu F, Agarwal K, Acharyya D, Nassif S, Nowka K, et al. Rigorous extraction of process variations for 65-nm cmos design. IEEE Trans Semicond Manuf 2009;22(1):196–203.
[28] Wang W, Yang S, Bhardwaj S, Vattikonda R, Vrudhula SBK, Liu F, Cao Y. The impact of nbti on the performance of combinational and sequential circuits. In: Design automation conference (DAC); 2007. p. 364–9.
[29] Su H, Liu F, Devgan A, Acar E, Nassif S. Full chip leakage estimation considering power supply and temperature variations. In: International symposium on low power electronic devices (ISLPED); 2003. p. 78–83.
[30] Zhao W, Cao Y. New generation of predictive technology model for sub-45 nm early design exploration. IEEE Trans Electron Dev 2006;53(11):2816–23.
[31] Henzler S, Koeppe S, Kamp W, Mulatz H, Schmitt-Landsiedel D. 90nm 4.7ps-resolution 0.7-lsb single-shot precision and 19pj-per-shot local passive interpolation time-to-digital converter with on-chip characterization. In: International solid-state circuits conference; 2008. p. 548–635.
[32] Donald J, Martonosi M. Techniques for multicore thermal management: classification and new exploration. In: International symposium on computer architecture (ISCA); 2006. p. 78–88.
[33] Magnusson PS, Christensson M, Eskilson J, Forsgren D, Hållberg G, Högberg J, et al. Simics: a full system simulation platform. IEEE Comput 2002;35:50–8.
[34] Sherwood T, Perelman E, Calder B. Basic block distribution analysis to find periodic behavior and simulation points in applications. In: IEEE parallel architectures and compilation techniques (PACT); 2001. p. 3–14.
[35] Joseph R, Brooks D, Martonosi M. Control techniques to eliminate voltage emergencies in high performance processors. In: Proceedings of high performance computer architecture (HPCA); 2003. p. 79–90.
[36] Donald J, Martonosi M. Techniques for multicore thermal management: classification and new exploration. In: International symposium on computer architecture (ISCA); 2006. p. 78–88.