Exploiting component dependency for accurate and efficient soft error analysis via Probabilistic Graphical Models

Microelectronics Reliability 55 (2015) 251–263

Contents lists available at ScienceDirect

Microelectronics Reliability journal homepage: www.elsevier.com/locate/microrel

Jiajia Jiao a,*, Da-Cheng Juan b, Diana Marculescu b, Yuzhuo Fu a

a School of Microelectronics, Shanghai Jiao Tong University, China
b Department of Electrical & Computer Engineering, Carnegie Mellon University, United States

Article info

Article history: Received 29 April 2014; Received in revised form 29 July 2014; Accepted 15 September 2014; Available online 19 October 2014.

Keywords: Soft error; ACE; AVF; Logical masking effects; Probabilistic Graphical Models

Abstract

As the technology node continues to scale, soft errors have become a major issue for reliable processor designs. In this paper, we propose a framework that accurately and efficiently estimates the Architectural Vulnerability Factor (AVF) of critical storage structures of a processor. The proposed approach exploits the masking effects between array structures (e.g., register files and Caches) and logic units (e.g., IntALU) via the unified Probabilistic Graphical Models (PGM) methodology, and provides guaranteed AVFs through two accuracy–efficiency tradeoff solutions. The experimental results confirm that, compared to current state-of-the-art approaches, the proposed framework achieves accurate and efficient estimation via two instanced solutions: (1) first-order masking effects, with up to 45.96% and on average 8.48% accuracy improvement at a 52.01× speedup; (2) high-order masking effects, with an average 87.28% accuracy improvement at a 43.87× speedup. The two accuracy–efficiency tradeoffs of the proposed MEA-PGM can be flexibly applied to different estimation scenarios (e.g., the short time-to-market of general mobile devices and the high reliability requirements of aerospace platforms).

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Soft errors, also known as transient faults or single-event upsets, are caused by external radiation or electrical noise. Due to lower supply voltages, higher integration density, and other factors, the soft error rate increases dramatically as the technology node scales down [1]. To effectively trade off design cost (e.g., area) against higher reliability, accurate and efficient estimation of soft error impacts is required at an early design stage. Such estimates are conventionally used to identify the components with high vulnerability to soft errors, so that system designers can effectively deploy mitigation strategies that minimize the impact of soft errors without introducing much design overhead. Existing work on soft error estimation often uses: (1) Fault Injection (FI) to guarantee accuracy [2–4,25]; (2) fault-free analytical models to improve speed [5–8].

This work was supported by the Innovation Program of Shanghai Municipal Education Commission, No. 14ZZ018. Corresponding author at: No. 800 Dongchuan Road, Minhang District, Shanghai, 200240, China. Tel.: +86 13671828752. E-mail addresses: [email protected] (J. Jiao), [email protected] (D.-C. Juan), [email protected] (D. Marculescu), [email protected] (Y. Fu).

http://dx.doi.org/10.1016/j.microrel.2014.09.011 0026-2714/© 2014 Elsevier Ltd. All rights reserved.

The former is too time consuming, while the latter provides only an over-pessimistic value. How to achieve both accurate and efficient estimation is still an open problem. In this paper, we propose a new framework, Masking-Effect-Aware analysis cooperated with the Probabilistic-Graphical-Models (MEA-PGM) methodology, to estimate the soft error impacts on a processor. To the best of our knowledge, the proposed MEA-PGM brings the following contributions:

• Masking-effect exploration via PGM. We explore and exploit the masking effects generated from component dependencies via PGM. The masking effects, used to significantly reduce the "false-positive" soft-error rate, are characterized by the effective PGM methodology.
• Flexible accuracy–efficiency tradeoff. Based on the masking effects discovered via the PGM concept, MEA-PGM provides guaranteed AVF estimations efficiently via two instanced PGM implementations: (1) 8.48% accuracy improvement and 52.01× speedup for the short time-to-market of general processor designs; (2) 87.28% accuracy improvement and 43.87× speedup for the high reliability requirements of aerospace applications.
• Comprehensive evaluation. To conduct a comprehensive evaluation, we compare the proposed MEA-PGM with three models commonly used by both industry and academia [6,8,9].


The experimental results confirm that the proposed MEA-PGM is cost-effective: 8.48–87.28% higher accuracy over optimal ACE methods and 43.87–52.01× speedup over the FI method.

The remainder of this paper is organized as follows. Section 2 reviews the related work. Section 3 gives the detailed problem formulation and Section 4 details the proposed MEA-PGM. Section 5 shows the implementation flow, while Section 6 provides the experimental results. Section 7 concludes the paper and points to future work. Section 8 serves as the appendix that provides the supplementary materials.

2. Related work

This section reviews the related work on soft error estimation: (1) an overview of the AVF + SoFA methodology in Section 2.1; (2) accurate fault injection versus fast fault-free analysis for AVF estimation in Section 2.2; (3) existing work on ACE methods; (4) a review of work on masking effects estimation.

2.1. AVF + SoFA methodology

The AVF + SoFA (Sum of Failure All) methodology measures soft error impacts. The metric FIT (Failure in Time, the number of errors during 10^9 h) is calculated in two steps: (1) estimate the Architectural Vulnerability Factor (AVF), which represents the probability that a single-bit upset results in a user-visible error in the final output at the architecture level [5]; (2) sum up the FIT of all components, where the FIT of the ith component, FIT_i, is the product of AVF_i and FIT_raw in Eq. (1) [33], where FIT_raw is the inherent FIT due to the joint effects of the physical environment, device, and circuit designs.

FIT_i = AVF_i × FIT_raw,i    (1)
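As a minimal sketch, Eq. (1) summed over components can be written directly; the per-component AVFs and raw FIT rates below are hypothetical illustration values, not measurements from this paper:

```python
def system_fit(avf, fit_raw):
    """Eq. (1) summed over components: FIT = sum_i AVF_i * FIT_raw_i."""
    return sum(a * f for a, f in zip(avf, fit_raw))

# Hypothetical AVFs and raw FIT rates for three storage components.
print(system_fit([0.30, 0.12, 0.05], [100.0, 250.0, 400.0]))  # about 80 FIT
```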

AVF estimation is thus critical to calculating the final FIT, and our focus is therefore accurate and fast AVF estimation.

2.2. Fault injection versus fault-free ACE analysis

Existing work addressing AVF estimation focuses on two aspects: accuracy and speed.

Accurate fault injection. Soft error estimation uses FI [2–4,25] to calculate AVF as in Eq. (2),

AVF(FI) = N_err / N_total    (2)

where N_err denotes the number of simulations with an observed fault and N_total represents the total number of simulations. The accuracy of the AVF estimated by FI is directly related to N_total: a large number of simulations is required for an accurate AVF, which is very inefficient (usually up to days), though Maniatakos et al. use a selective policy for up to about 18× speedup [25].

Fast fault-free analysis. The outstanding representative of fault-free analysis is the very popular and simple ACE (Architectural Correct Execution) analysis. The ACE method provides an alternative way to estimate AVF using only one (or at most two) simulation runs. The idea is to exploit the structure of a typical processor by analyzing the cases where some single-bit faults will not produce an error in a program's output. This method measures, in cycles, each ACE piece (a critical time period during which a single-event upset can affect the architectural state or the application output); in contrast, an un-ACE piece (Fig. 1) is not harmful. Based on this concept, the AVF of a structure with a bit width of N can be expressed as:

AVF(ACE) = (1/N) Σ_{i=0}^{N−1} (ACE cycles for bit i) / (Total cycles)    (3)
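Eq. (3) amounts to averaging per-bit ACE residency over the run; a minimal sketch with made-up cycle counts:

```python
def avf_ace(ace_cycles_per_bit, total_cycles):
    """Eq. (3): average, over the N bits, of (ACE cycles for bit i) / total cycles."""
    n = len(ace_cycles_per_bit)
    return sum(c / total_cycles for c in ace_cycles_per_bit) / n

# Hypothetical 4-bit structure observed for 1000 cycles.
print(avf_ace([800, 600, 400, 200], 1000))  # about 0.5
```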

[Fig. 1 timeline: Fill/Write → Read intervals are ACE; Read → Evict and Evict → Fill intervals are un-ACE.]

Fig. 1. ACE piece for storage structure based on access types.

Compared to FI, ACE is faster and more suitable for early design-stage exploration. In the context of complex multi-core/many-core architectures, the simulation time and the number of components that need to be considered increase with each design generation; therefore, ACE is preferred over FI in both industry and academia [7–9]. However, these ACE methods need to be conservative, which leads to pessimistic estimation: the AVF estimated by ACE is 2–3× higher than the AVF estimated by FI [2]. Note that overestimating AVF directly leads to excessive area or performance overhead due to overprotection; for example, it has been shown that merely 7.74% overestimation may cause up to 40.39% area overhead [22]. In this paper, we focus on improving the accuracy of ACE while still keeping its high estimation speed.

2.3. Related work on ACE analysis

Prior work on ACE addresses two aspects: accuracy and speed. Wang et al. [2] and George et al. [4] both pointed out that AVFs computed using the ACE analysis method overestimate the soft error rate by 2–3× in many fault injection experiments. To overcome the conservatism of the method, fine-grain ACE that adds more simulation details is used in [7,8] to reduce the gap. For example, a branch instruction does not need a destination register, and thus the destination-address field in the ROB does not include ACE bits. In a different work [9], FI is mixed with ACE to compute the AVF of the Cache structure more accurately. Other related work focuses on ACE estimation speed. The authors use machine learning approaches [10,11] or a mechanistic model [12] to simplify ACE analysis and achieve faster AVF prediction. All these methods assume the AVF of ACE analysis itself is accurate and aim at inexpensive AVF estimation. Accurate ACE analysis not only minimizes the design overhead of mitigation schemes, but also provides reliable training data for machine-learning-based prediction.
Therefore, more accurate ACE analysis is our goal in this paper.

2.4. Related work on masking effects estimation

Generally speaking, masking effects fall into three categories: (i) electrical masking; (ii) latching-window masking; and (iii) logical masking. So far, several effective methods incorporating electrical masking and latching-window masking have been proposed, such as SEAT-LA [18] and CEP [19]. However, logical masking, where an error occurs in a logic unit or a storage element but has no effect on the architectural state or application output, is usually captured only by the FI method [20]. For ACE analysis, only a limited set of methods [8,9] has considered logical masking effects in AVF estimation. Fu et al. [8] used the extreme cases of AND/OR instructions (e.g., one operand is zero) and NOP instructions. Haghdoost et al. [9] combined ACE analysis with the average masking rate computed by FI to determine a more accurate AVF for Cache structures. Besides these two system-level masking-effect characterizations, instruction-level masking effects are also estimated in [27] via an analytical model, and the metric of PVF (Program Vulnerability Factor) is defined in [28] to evaluate soft error impacts from


the instruction level. All of these masking-effect-aware estimations can be used for cost-effective reliable designs [29–31] employing duplicated registers or compiler-oriented code optimization; thus, more accurate estimation enables more effective mitigation designs. As a complementary work, our approach MEA-PGM captures logical masking effects comprehensively via the PGM methodology, and can therefore quickly achieve a nearly optimal upper bound on AVF. The main differences from the above existing works are: (1) using the PGM methodology to quantify the error masking effects converts the raw analytical modeling work into a general and unified PGM flow; (2) the PGM solutions offer flexible accuracy–efficiency tradeoffs through their different complexities.

3. Problem formulation

This section describes the target object and the detailed problem formulation.

3.1. Target object

In this paper, on one hand, we focus only on the soft error analysis of storage structures. The reasons are twofold: (1) Storage structures dominate the overall soft error impact; control logic/data path inherently has less impact on overall reliability, by a 5–20× derating factor, due to logical, electrical, and latching-window masking [12]. Furthermore, ACE analysis overestimates the AVF of storage structures by 2–3× and causes excessive design cost due to overprotection. (2) Control-logic/data-path estimation at the architecture level has been addressed well [6], while the remaining challenges for control logic/data path are at the circuit level [23,24], beyond the scope of this paper. On the other hand, we take the register file as a case study: (1) register files are frequently accessed and extremely vulnerable to soft errors [13,14].
(2) Furthermore, unlike L2 Caches, soft errors in register files are usually not corrected by expensive Error Correction Coding (ECC) [15,16] (e.g., the Intel P6 family, AMD Hammer, Intel Itanium, and IBM Power4 all use simple parity [32]). (3) The register file is the unique entry to the logical unit ALU; the accurate AVF of other storage structures, such as the L1 data Cache, also depends on the masking rate of the register file. The proposed MEA-PGM can also easily be generalized to all array data structures (e.g., the L1 data Cache). In this paper, we assume that control-related storage structures (e.g., those holding instruction code, such as the L1 instruction Cache) are safe.

3.2. Detailed formulation

Eq. (4) represents the calculation of AVF for the register file (RF) and considers the logical masking effects as follows:

AVF_RF = [ Σ_{i=0}^{N−1} Σ_{j=0}^{K_i−1} W · T_ij · (1 − M_ij) ] / (W · T_total · N)
       = [ Σ_{i=0}^{N−1} Σ_{j=0}^{K_i−1} T_ij · (1 − M_ij) ] / (T_total · N),  with 0 ≤ M_ij ≤ 1    (4)

where N is the number of registers; T_ij denotes the cycle count of the jth ACE piece of register i; K_i gives the total number of ACE pieces for register i; W represents the bit-width of a register; T_total gives the total execution time in cycles; and M_ij is the masking rate of the jth ACE piece of register i. Different methods apply different policies to compute M_ij; thus, the problem is further converted into how to find M_ij. FI is not used to quantify M_ij because it is time consuming and its result is not guaranteed. The formulation of the ACE method to compute AVF is described as follows. First, the following four inputs are given: (1) the target


architecture (e.g., an Alpha processor); (2) the workloads (e.g., SPEC2000 (INT)); (3) the component size (e.g., N registers in the register file); and (4) each ACE piece in cycles and the total number of ACE pieces. The decision variable is the masking rate for each ACE piece, which depends on the method used. The objective is to use the most accurate masking rate estimate for a tighter bound on AVF.
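Under these inputs, Eq. (4) reduces to a weighted sum over ACE pieces. A minimal sketch, assuming the per-piece (T_ij, M_ij) pairs have already been extracted from a trace (the numbers below are hypothetical):

```python
def avf_rf(ace_pieces, t_total):
    """Eq. (4): ace_pieces[i] lists (T_ij, M_ij) pairs for register i; the
    bit-width W cancels out of the ratio."""
    n = len(ace_pieces)
    total = sum(t * (1.0 - m) for reg in ace_pieces for t, m in reg)
    return total / (t_total * n)

# Two hypothetical registers observed over 1000 cycles; M_ij = 0 for every
# piece recovers the plain (conservative) ACE estimate.
pieces = [[(300, 0.5), (200, 0.0)], [(400, 1.0)]]
print(avf_rf(pieces, 1000))  # (150 + 200 + 0) / 2000 = 0.175
```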

4. Proposed MEA-PGM

In this section, we detail the framework of the proposed MEA-PGM. We first introduce the observation of the logical masking effects between different components in a processor (Section 4.1). Next, we provide the detailed design of the MEA-PGM components (Section 4.2). Then, the MEA-PGM overview is summarized in Section 4.3.

4.1. Observation and analysis: error masking and error spread

As Eq. (4) shows, typical ACE assumes the masking rate is always 0 and determines the AVF by summing up the cycles of discrete ACE pieces over the total execution cycles [5]. The potential masking effects of each ACE piece are ignored in this conservative analysis. The possible logical masking effects can be classified into three categories: (1) by opcode (e.g., AND and OR, which are inherently masking instructions); (2) by bit extension (e.g., ADDL, where only the low half of the bits participates in the instruction execution); and (3) by sanitized instructions (FDD/TDD, NOP, and Prefetch). For example, in an AND instruction, the original value 0x2031 becomes 0x3031 due to an error in reg a, but the instruction output c stays correct. Similarly, we can find that, although some bit upsets occur in registers a or b, the internal output c is correct or has no effect on the final output. Therefore, exploiting the error masking effects can mitigate the overestimation of ACE, which is confirmed through quantitative simulation results. Using FI to compute the average masking rate shows that it can be as high as 20–75% [9] and therefore should be considered. All in all, the logical masking effects are critical for accurate AVF estimation and will be quantified efficiently in the proposed MEA-PGM.

Based on the above coarse-grain observation of masking effects, two fine-grain factors affecting the accuracy of AVF are error masking and error spread.

(1) Error masking. Two values are used to quantify error masking. One is the instruction masking rate: each instruction may use the ALU to mask errors, e.g., ADDL (using only the low half bits) masks some errors, while a harmless Prefetch instruction masks all errors. The other is the error masking depth. If an error injected in reg a (0000 → 1000) is not masked by ADD in Fig. 2(a), the bit upset will show up in output reg c (1000 → 0000). Then, the derived error is masked by the OR instruction and the correct value is delivered in reg d (1000 → 1000). This case, marked by the red arrows in Fig. 2(a), is called a second-order masking effect; similarly, a higher-order case has a higher masking strength.

(2) Error spread. There can be successive ACE pieces in the same storage space, e.g., the successive ACE pieces (read operations) in the green ellipses for reg a in Fig. 2(a). An error occurring in reg a between LOAD and ADD will persist into its successive ACE pieces, because the value stays unchanged until the storage location is rewritten; we call this error spread.¹ The larger the number of successive ACE pieces, the weaker the masking effects.

¹ Error spread is actually the inherent retention of bit upsets in a storage structure, because the stored value stays unchanged unless it is rewritten. Here, we call it error spread because the error influences different ACE pieces of the same register.
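The second-order masking case described above can be reproduced on 4-bit toy registers; `flip` and `run` are illustrative helpers for this sketch, not part of the authors' toolflow:

```python
W = 4                      # toy register width
MASK = (1 << W) - 1

def flip(value, bit):
    """Inject a single-event upset: toggle one bit of a register value."""
    return value ^ (1 << bit)

def run(a):
    """The Fig. 2(a) chain on W-bit registers: c = b + a; d = a | c."""
    b = 0b1000
    c = (b + a) & MASK     # ADD: the upset in a surfaces in c (1000 -> 0000)
    d = (a | c) & MASK     # OR: the derived error is masked again in d
    return c, d

c_good, d_good = run(0b0000)
c_bad, d_bad = run(flip(0b0000, 3))      # upset reg a: 0000 -> 1000
print(c_bad != c_good, d_bad == d_good)  # True True: second-order masking at d
```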


Fig. 2. Soft error estimation issue mapped into the PGM-based framework. (a) Instance of error masking effects and error spread among continuous ACE pieces. Once the error occurs in reg a (Read by LOAD → Read by ADD), on one hand the bit upset in reg a may be masked in its outputs reg c and reg d along the red path; on the other hand, the bit upset will remain in the following Reads by the OR and STORE instructions and spread to its continuous ACE pieces. (b) Bayesian network for AVF estimation. Each node represents one ACE piece and the structure describes the error masking (in red) and error spread (in green) dependencies between nodes. Each node follows a binary Bernoulli distribution (0 or 1) and the CPTs can be learned via the fine-grain instruction-level masking rate. The mapping rules convert the specified soft-error ACE-based estimation issue into a PGM problem. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

How to capture these two effects determines the efficiency and accuracy of AVF estimation. Our MEA-PGM aims at addressing this problem via the PGM.

4.2. Complete MEA-PGM

We choose to work within the PGM framework [25] because it offers unique advantages: (1) efficiency potential, because the low complexity of PGM converts high-dimensional dependencies into a small factorization; (2) modularity, for easily integrating the PGM module into existing estimation methods; (3) a perfect match, since the large-scale ACE pieces fulfill the conditional independences. In this section, we map the AVF calculation into a PGM problem via the three main components of the PGM flow: (1) mapping the problem into the fundamental PGM representation; (2) node parameter learning to support the inference; (3) truncated inference to maintain practical efficiency.

4.2.1. General PGM concept

Before we describe the MEA-PGM, let us review the PGM background briefly. Generally, the complete PGM framework is composed of three parts: representation, learning, and inference [25].

Representation. This is the fundamental and critical factor for the proposed framework and includes directed graphical models (Bayesian networks, as in Fig. 2(b)) and undirected graphical models (Markov Random Fields). In general, a PGM defines a family of probability distributions that can be represented in terms of a graph. Nodes in the graph correspond to random variables X1, X2, ..., Xn; the graph structure translates into statistical dependencies among such variables that drive the computation of the joint, conditional, and marginal probabilities of interest. For example, in Fig. 2(b), the joint probability, or the Bayesian factorization, is described by the conditional probabilities in Eq. (5):

P(X1, X2, X3, X4, X5) = P(X1) P(X2|X1) P(X3|X1) P(X5|X2) P(X4|X2, X3)    (5)
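The factorization of Eq. (5) can be evaluated directly once the CPTs are fixed; the CPTs below use hypothetical masking rates m3 = m4 = 0.5 and collapse the three X4 parameters of Fig. 2(b) into a single m4 for brevity:

```python
m3 = m4 = 0.5   # hypothetical instruction masking rates (not from the paper)

P1 = {1: 1.0, 0: 0.0}                                # fault injected at X1
SPREAD = {1: {1: 1.0, 0: 0.0}, 0: {1: 0.0, 0: 1.0}}  # deterministic XNOR edge
P3_given_1 = {1: {1: 1.0 - m3, 0: m3}, 0: {1: 0.0, 0: 1.0}}

def P4_given_23(x2, x3):
    """Two-parent CPT, collapsing Fig. 2(b)'s m4, m4', m4'' into one m4."""
    if x2 == 0 and x3 == 0:
        return {1: 0.0, 0: 1.0}
    return {1: 1.0 - m4, 0: m4}

def joint(x1, x2, x3, x4, x5):
    """Chain-rule factorization of Eq. (5)."""
    return (P1[x1] * SPREAD[x1][x2] * P3_given_1[x1][x3]
            * SPREAD[x2][x5] * P4_given_23(x2, x3)[x4])

print(joint(1, 1, 0, 0, 1))  # 1 * 1 * m3 * 1 * m4 = 0.25
```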

Learning phase. Learning the graphical model from data is another important step for factorizing the distribution. Structure learning and parameter learning are the two most basic learning tasks:

• Structure learning. As Fig. 2(b) shows, the deterministic Bayesian graph determines the structure in the PGM. Without the structure, the joint probability cannot be expressed via the reduced conditional probabilities, since the dependencies would be unknown. Therefore, obtaining the structure accurately is critical for correct factorization.
• Parameter learning. The parameters denote the Conditional Probability Table (CPT) required for the joint probability calculation. Here, we assume the conditional probabilities all follow a Bernoulli distribution for the occurrence of soft-error-induced bit upsets. E.g., node X4 in Fig. 2(b) has two upstream nodes it depends on (X2 and X3), which means three unknown parameters (m4, m4′, m4″) determine the probability of each case.

Inference phase. Based on the available structure and node information, using a given algorithm for exact inference (e.g., Variable Elimination or conditioning), we can determine the required marginal probabilities. For example, the marginal probability can be expressed via the joint probability based on the structure: P(X4 = X5 = 0, X1 = 1) = Σ_{X2,X3} P(X1 = 1, X2, X3, X4 = 0, X5 = 0). The joint probability is then determined from the conditional probabilities given by the node parameters (Fig. 2(b)) via the aforementioned efficient inference algorithms.

4.2.2. Mapping the problem into a Bayesian network

Representation. Firstly, the register operations connected by the sequential instructions are divided into ACE pieces in Fig. 2(a). Once a soft error occurs in reg a during the ACE piece X1, the bit upset will spread and propagate to all its related registers.
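The inference-phase query above can be brute-forced on this small graph by summing the joint over the hidden variables. One caveat of this toy model (our simplification, with hypothetical m3 = m4 = 0.5): the deterministic spread edge X2 → X5 forces the full leaf query P(X4 = X5 = 0 | X1 = 1) to zero, so the sketch queries leaf X4 alone:

```python
from itertools import product

m3 = m4 = 0.5   # hypothetical masking rates

def joint(x1, x2, x3, x4, x5):
    """Minimal Eq. (5)-style factorization with deterministic spread edges."""
    p = 1.0 if x1 == 1 else 0.0                 # fault injected at the root
    p *= 1.0 if x2 == x1 else 0.0               # spread: X2 copies X1
    p *= (m3 if x3 == 0 else 1.0 - m3) if x1 == 1 else (1.0 if x3 == 0 else 0.0)
    p *= 1.0 if x5 == x2 else 0.0               # spread: X5 copies X2
    if x2 == 0 and x3 == 0:
        p *= 1.0 if x4 == 0 else 0.0            # no faulty parent: X4 stays clean
    else:
        p *= m4 if x4 == 0 else 1.0 - m4        # masked at X4 with probability m4
    return p

# P(X4 = 0 | X1 = 1): sum the joint over the hidden X2, X3 and the leaf X5.
num = sum(joint(1, x2, x3, 0, x5) for x2, x3, x5 in product((0, 1), repeat=3))
den = sum(joint(1, x2, x3, x4, x5)
          for x2, x3, x4, x5 in product((0, 1), repeat=4))
print(num / den)  # 0.5: the upset is masked at X4 with probability m4
```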
There are two kinds of dependencies between a parent ACE piece and a child ACE piece: (1) "input to output" error masking within the same instruction, e.g., reg a → reg c via the ADD instruction, where the error may be masked; (2) "input to another input" error spread for successive reads of the same register, e.g., reg a in the ADD instruction → reg a in the OR instruction, where the error merely spreads and has no masking probability. Then, we map this raw information based on the ACE pieces in Fig. 2(a) into a PGM structure as Fig. 2(b) shows, using: (1) Structure type: each ACE piece Xi, denoted as a node², either has a bit upset or not, so a Bayesian network based on a per-node Bernoulli distribution can fit

² A node in the Bayesian network corresponds to one ACE piece, and we will refer to node and ACE piece interchangeably throughout the remainder of this paper. In addition, we assume only one observed bit upset in one ACE piece per analysis, which is consistent with the fault injection case.


well; and (2) Structure edges: these represent the two dependencies, corresponding to the two factors in error estimation. Namely, each node represents one ACE piece and its children arise from error spread or error masking. Following this rule, the X1-related sub-graph can be constructed as Fig. 2(b) shows. The children of X1 are {X2, X3}; similarly, X2 and X3 have their own children. The process is repeated until all leaf nodes of the sub-graph are store-instruction-related ACE pieces (we assume store instructions are the output observation points, which is reasonable for consistency with typical fault injection experiments). Finally, once we obtain the sub-graph of the root-node ACE piece where an error is injected, the joint probability can be expressed as P(X1, X2, ..., X5) = P(X1)P(X2|X1)P(X3|X1)P(X5|X2)P(X4|X2, X3) in Fig. 2(b), while the marginal probability that all leaf nodes {X4, X5} have no bit upset, P(X4 = X5 = 0 | X1 = 1), is the masking rate of the root node. Therefore, our critical task is to compute this marginal probability for each ACE piece for accurate AVF estimation.
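The sub-graph construction just described can be sketched with a toy graph container (`AceGraph` is a hypothetical helper, not the authors' implementation); a breadth-first traversal from the fault-injected piece reaches the store-related leaves:

```python
from collections import defaultdict, deque

class AceGraph:
    """Minimal sketch of the global ACE-piece graph: nodes are ACE pieces,
    edges are 'spread'/'mask' dependencies, leaves are store-related pieces."""
    def __init__(self):
        self.children = defaultdict(list)
        self.is_store = {}

    def add_piece(self, node, is_store=False):
        self.is_store[node] = is_store

    def add_edge(self, parent, child, kind):
        self.children[parent].append((child, kind))  # kind: 'spread' or 'mask'

    def subgraph(self, root):
        """BFS from the fault-injected piece down to the store leaves."""
        seen, order, q = {root}, [], deque([root])
        while q:
            n = q.popleft()
            order.append(n)
            for c, _ in self.children[n]:
                if c not in seen:
                    seen.add(c)
                    q.append(c)
        return order

g = AceGraph()
for n, store in [('X1', False), ('X2', False), ('X3', False),
                 ('X4', True), ('X5', True)]:
    g.add_piece(n, store)
g.add_edge('X1', 'X2', 'spread'); g.add_edge('X1', 'X3', 'mask')
g.add_edge('X2', 'X4', 'mask');   g.add_edge('X2', 'X5', 'spread')
g.add_edge('X3', 'X4', 'mask')
print(g.subgraph('X1'))  # ['X1', 'X2', 'X3', 'X4', 'X5']
```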

4.2.3. Structure and node parameter learning

This section presents the two learning tasks, structure learning and node parameter learning, which support the subsequent inference.

Structure learning. As Fig. 2 shows, we use simple mapping rules to construct the Bayesian network structure. In order to build the structure efficiently: (1) We maintain a global graph to record each node and edge, based on dynamic insertion (when the latest instruction is executed) or removal (when the inference for the latest related sub-graph is finished) of every ACE piece until the simulation ends. (2) The sub-graph of each ACE piece is separated from the global graph for the marginal probability calculation via the inference algorithms; e.g., Fig. 2(b) shows the sub-graph for ACE piece (node) X1. A sub-graph is complete if all its leaf nodes are store instructions. In all, the PGM structure is derived from realistic benchmark traces and provides an exact graph representation for further inference.

Node parameter learning. Besides the graph structure, we also need the node parameters to complete the inference and calculate the required marginal probability. That means the Conditional Probability Table (CPT) of each node (ACE piece) must be determined given its parents. Firstly, as described in Section 4.1, MEA-PGM tries to exploit the three types of masking effects by modeling the masking rate for each ACE-related instruction, with details described as follows. The remaining mechanisms outside the three listed types have


no potential masking effects, and their instruction masking rate is zero. Operands like reg a and reg b are ACE-related in Table 1 (more details are available in Appendix A.1), which shows the computing policy and some typical examples.

• Opcode masking effects. The probability of masking a bit upset is determined by the fraction of ones in the other operand. For example, for an AND instruction (column 2 in Table 1), the masking rate of reg a is 1 − Nb/W, where Nb is the number of bits of reg b with value 1 and W is the total width of reg b.
• Bit-extension operations. These operations can mask some bit upsets through shift/extension logic. For example (column 3 in Table 1), the ADDL instruction uses only the lower half of the operand bits and then extends the output c to the whole width W. Therefore, the instruction-level masking rate Minst is 1/2; considering the sign bit, Minst is reduced further by 1/W. Thus, Minst is the fraction of extension bits out of the total bit-width.
• Sanitized instructions. Sanitized instructions are instructions whose generated outputs are either unused or used only by other sanitized instructions. Specifically, sanitized instructions include FDD/TDD as well as NOP/Prefetch instructions. The soft errors in all their operands are therefore masked, and Minst is set to 1. NOP/Prefetch can be identified when they are being executed, while FDD/TDD identification is based on post-commit analysis.

Therefore, once the instruction-level masking rate (i.e., the basic data for each node's CPT) is determined from the above modeling, we only need some simple processing in Fig. 2(b) to obtain the CPTs, as follows:

(i) If Xi is the child of Y (uniqueness) based on an error spread edge, then P(Xi|Y) = Xi XNOR Y (e.g., if Xi = 0, Y = 1 or Xi = 1, Y = 0 then the probability is 0; otherwise, the value is 1).
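The three masking-rate models above can be sketched as small helpers; the function names and the W = 64 example values are illustrative:

```python
def minst_and(operand_b, width):
    """Opcode masking: for AND c = a & b, an upset in a is masked wherever b
    holds a 0, so Minst(a) = 1 - Nb/W (Nb = number of 1-bits in b)."""
    return 1 - bin(operand_b).count("1") / width

def minst_addl(width):
    """Bit-extension masking for ADDL: only the low half participates, minus
    the sign bit, giving Minst = 1/2 - 1/W."""
    return 0.5 - 1 / width

def minst_sanitized():
    """Sanitized instructions (FDD/TDD, NOP, Prefetch) mask every operand upset."""
    return 1.0

print(minst_and(0b1111, 64))  # 1 - 4/64 = 0.9375
print(minst_addl(64))         # 0.484375
```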
(ii) If Xi is the child of its unique parent Y based on an error masking edge, then we set the CPT as P(Xi = 0|Y = 1) = mi, P(Xi = 1|Y = 1) = 1 − mi, P(Xi = 1|Y = 0) = 0, P(Xi = 0|Y = 0) = 1, where mi is the masking rate Minst of the ACE piece Xi.

(iii) If Xi has two parents Y1 and Y2 based on error masking edges, the case of at most one faulty parent (Y1 = 1, Y2 = 0 or Y1 = 0, Y2 = 1) is handled as in (ii). When both parents incur bit upsets, P(Xi|Y1 = 1, Y2 = 1) depends on whether Xi is the operand of a sanitized instruction or not. If yes, the CPT becomes P(Xi = 1|Y1 = 1, Y2 = 1) = 0 and P(Xi = 0|Y1 = 1, Y2 = 1) = 1. Otherwise, P(Xi = 1|Y1 = 1, Y2 = 1) = 1 and P(Xi = 0|Y1 = 1, Y2 = 1) = 0. This way, conservative analysis is guaranteed by considering the worst case of two fault-affected parents.
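Rules (i)–(iii) translate directly into CPT lookup functions; this is a sketch, with the one-faulty-parent case of rule (iii) delegated to rule (ii) as the text describes:

```python
def cpt_spread(xi, y):
    """Rule (i): an error-spread edge is deterministic, P(Xi|Y) = Xi XNOR Y."""
    return 1.0 if xi == y else 0.0

def cpt_mask(xi, y, mi):
    """Rule (ii): single-parent masking edge with instruction masking rate mi."""
    if y == 1:
        return mi if xi == 0 else 1.0 - mi
    return 1.0 if xi == 0 else 0.0

def cpt_mask2(xi, y1, y2, mi, sanitized=False):
    """Rule (iii): two masking parents; conservative (error survives) when both
    are faulty, unless the child belongs to a sanitized instruction."""
    if y1 == 1 and y2 == 1:
        clean = sanitized
        return (1.0 if xi == 0 else 0.0) if clean else (1.0 if xi == 1 else 0.0)
    if y1 == 0 and y2 == 0:
        return 1.0 if xi == 0 else 0.0
    return cpt_mask(xi, 1, mi)   # exactly one faulty parent: rule (ii)

print(cpt_spread(1, 1), cpt_mask(0, 1, 0.25), cpt_mask2(1, 1, 1, 0.25))
# 1.0 0.25 1.0
```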

4.2.4. Truncated VE inference

The number of ACE pieces for each benchmark is so large (about 40–60 M for a 50 M-instruction execution) that an acceleration policy is required. In this paper, we propose the truncated inference

Table 1. Computing model of the instruction-level masking rate for the register file.

Opcode masking:
  Computing policy: use the number of '1's in the operands.
  Example 1: AND: c = a & b; Minst(a) = 1 − Nb/W.
  Example 2: BLBC: !(a & 0x1); Minst(a) = 1 − 1/W.

Bit extension:
  Computing policy: compute the extension bits' percentage of the operand's total width.
  Example 1: ADDL: c = (a + b) extended from the low W/2 bits; Minst(a) = 1/2 − 1/W.
  Example 2: SRA: c = a >> (b % 64); Minst(a) = 1 − (b % 64 − 1)/W.

Sanitized instructions:
  Computing policy: identify sanitized instructions by dependency check.
  Example 1: ADDQ: c = a + b; Minst(a) = 1 while FDD/TDD.
  Example 2: XOR: c = a ^ b; Minst(a) = 0 while not FDD/TDD or NOP/Prefetch.

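The three computing policies of Table 1 can be sketched as small helpers. This is a minimal illustration; the function names and the fixed W = 64 register width are our assumptions.

```python
W = 64  # register width assumed for the Alpha-style examples

def minst_and_a(b: int) -> float:
    """Opcode masking: an upset in reg a of `AND c = a & b` is masked wherever
    b has a 0 bit, so Minst(a) = 1 - Nb/W with Nb = popcount(b)."""
    nb = bin(b & (2**W - 1)).count("1")
    return 1.0 - nb / W

def minst_addl() -> float:
    """Bit extension: ADDL uses only the low W/2 bits, minus one sign bit."""
    return 0.5 - 1.0 / W

def minst_sanitized() -> float:
    """Sanitized instructions (FDD/TDD, NOP, Prefetch): fully masked operands."""
    return 1.0

# b = 0xFF has Nb = 8, so Minst(a) = 1 - 8/64 = 0.875.
print(minst_and_a(0xFF), minst_addl(), minst_sanitized())
```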

based on a typical variable elimination (VE) algorithm. Standard VE handles the sub-graphs of selected ACE pieces one by one for exact inference, because it sums out variables from a list of factors; the detailed algorithm and more information, including the proof that VE is exact, are available in [26]. An optimal elimination ordering is one that results in the least complexity; finding such an ordering is NP-complete, so we use the topological order as the elimination order. The main idea of truncation is to adjust the inference complexity to achieve an accuracy–efficiency tradeoff. We use two different truncation policies: (1) first-order masking effects via vertical truncated VE inference; (2) higher-order masking effects via horizontal truncated VE inference, according to the sub-graph size and cycle count per ACE piece.

(1) First-order masking effects with low space complexity. The first-order policy considers only the first instruction masking effect for a bit upset in an operand register. For example, in Fig. 2(b), if the bit upset is injected in X1, the error propagation paths shrink to {X1 → X3, X1 → X2 → X4, X1 → X2 → X5}, neglecting the high-order masking effect {X3 → X4}. Therefore, the first-order policy makes the sub-graph smaller and simpler. Applying the VE algorithm yields Eq. (6): the masking rate of instruction i (MRi) relies on the read operations of the successive operations (i + 1, ..., n).

$MR_i = \prod_{s=i}^{n} M_{inst\_s}$    (6)

When calculating MRi for instruction i, all (n − i + 1) values of Minst_s (s = i to n) need to be recorded, resulting in the space complexity $O\left(\sum_{i=0}^{n}(n-i+1)\right) = O(n^2)$. Moreover, the value of n cannot be known in advance. In this paper, we propose a dynamic-programming approach that reduces the complexity to O(1), which can be expressed as:

$T_{ACE}(n) = \sum_{i=0}^{n} T_i \left(1 - \prod_{s=i}^{n} M_{inst\_s}\right)$    (7)

Once MRi is available, the value is combined with the cycle count of each ACE piece as in Eq. (7). Then ΔT_ACE(n + 1), the difference in ACE cycles introduced by the incoming (n + 1)-th register read operation, is computed by Eq. (8):

$\Delta T_{ACE}(n+1) = T_{ACE}(n+1) - T_{ACE}(n) = \left(1 - M_{inst\_(n+1)}\right)\left(\sum_{i=0}^{n+1} T_i - T_{ACE}(n)\right)$    (8)

From Eq. (8), it is clear that ΔT_ACE(n + 1) depends only on the previously accumulated ACE cycles T_ACE(n) and the current instruction-level masking rate Minst_(n+1). That means only two extra parameters are recorded, instead of the complex network extraction and high storage

Fig. 3. MEA-PGM algorithm for first-order truncated inference.


Fig. 4. MEA-PGM algorithm for high-order truncated inference.

overhead in Fig. 2(b). Such a vertical truncated inference has low space complexity.

(2) High-order masking effects via skipping large sub-graphs or small ACE cycle counts, with low time complexity. The main idea is to skip processing steps that have little effect on the final estimate: one policy skips ACE pieces with small ACE cycle counts; the other skips ACE pieces with large sub-graphs. Once either condition is satisfied, the masking rate is set to the default value of zero, and the complete sub-graph construction and inference for that ACE piece are skipped, so the estimation is accelerated. Such a skipping policy is supported by simulation results. For the SPEC2000 benchmark gcc, for example, (1) ACE pieces with fewer than 5 ACE cycles account for up to 67.96% of all pieces but only 9.89% of the overall AVF value; and (2) the sub-graph size exceeds 10 K with a probability of 95.7%, and can exceed 100 K. More details on these two parameters are discussed in Section 6.3.

4.3. Overall algorithm

As Section 4.2.4 shows, the MEA-PGM algorithm is determined by the truncation policies: the first-order version uses two extra parameters to achieve low space complexity, while the high-order version adopts the skipping policies to reduce time complexity. Figs. 3 and 4 provide the detailed algorithm overviews for the two versions, respectively:

(1) We use the MEA-PGM algorithm in Fig. 3 to compute the AVF value of the register file. The inputs include the latest register write time (LastWriteTime) and the current time of the register operation (Now), both in cycles, plus the instruction information needed to compute its Minst_curr as Table 1 gives. Lines 11 to 18 calculate the total ACE cycles of each register; the critical part is line 16, which applies Eq. (7) to achieve low complexity. The final output is the AVF value of the whole register file.
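The first-order accumulation of Eqs. (7) and (8) can be sketched as a streaming update and cross-checked against the direct form. The class and variable names are our assumptions, and the T_i and Minst values are random stand-ins for a real trace.

```python
import random
from math import prod

def t_ace_direct(T, M):
    """Eq. (7): direct form, needs the whole Minst history (quadratic work)."""
    n = len(T)
    return sum(T[i] * (1.0 - prod(M[i:n])) for i in range(n))

class AceAccumulator:
    """Eq. (8): keeps only two running values, T_ACE(n) and the sum of T_i."""
    def __init__(self):
        self.t_ace = 0.0   # accumulated ACE cycles T_ACE(n)
        self.sum_t = 0.0   # running sum of the T_i cycle counts

    def read_op(self, t_i, m_inst):
        # Delta T_ACE(n+1) = (1 - Minst_(n+1)) * (sum_{i<=n+1} T_i - T_ACE(n))
        self.sum_t += t_i
        self.t_ace += (1.0 - m_inst) * (self.sum_t - self.t_ace)
        return self.t_ace

random.seed(7)
T = [random.randint(1, 100) for _ in range(50)]  # cycle counts per ACE piece
M = [random.random() for _ in range(50)]         # instruction-level masking rates
acc = AceAccumulator()
for t, m in zip(T, M):
    streamed = acc.read_op(t, m)

assert abs(streamed - t_ace_direct(T, M)) < 1e-6
print("O(1) streaming update matches Eq. (7)")
```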
(2) To achieve the final accurate AVF, we combine the original ACE analysis with the masking rate calculation for high-order masking effects in the unified MEA-PGM of Fig. 4. First, the ACE piece (node) information is inserted into the global graph one by one in line 1. Then, once the sub-graph 'T_local' of the head ACE node is complete, leaving all leaf nodes as store instructions or sanitized-instruction-related ACE pieces in line 2, we handle it with a selective VE algorithm to obtain the masking rate 'MR' of the current head ACE piece in line 3; otherwise, we continue updating the global graph by adding the new node information. Line 5 removes the head node from the global graph, and steps 1–5 repeat via two loops until the global graph is empty; the new AVF is output in line 8. AVF estimation by MEA-PGM is guaranteed via: ① accurate Bayesian representation, by extracting the structure from the realistic benchmark trace as Fig. 2(b) provides (e.g., the dependencies of the reg a related ACE pieces based on error masking and spread); ② guaranteed learning, by estimating the instruction-level masking rate to assign the node CPTs as Fig. 2(b) shows (e.g., the CPT of node X4 is calculated by Table 1); ③ guaranteed truncated inference, to save analysis time for accelerated estimation (e.g., (1) first-order truncated inference obviously provides an upper bound, since it ignores some masking effects; (2) for high-order truncated inference, if the cycle count of X1 is very small (e.g., smaller than 10) or the corresponding graph size is very large (e.g., larger than 1000, rather than five as in Fig. 2(b)), the inference is canceled and the masking rate is conservatively set to zero).
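The high-order skipping test described above reduces to a simple threshold check before inference. The sketch below is a minimal illustration; the field names, threshold names, and the stand-in inference function are assumptions, not the tool's actual interface.

```python
# Thresholds from the paper's discussion: 'a' bounds the sub-graph size that is
# still worth inferring, 'b' is the minimum ACE cycle count worth inferring.
A_MAX_SUBGRAPH = 1000
B_MIN_CYCLES = 10

def masking_rate(ace_piece, run_inference):
    """Conservative high-order policy: skip inference and report 0 when a piece
    is too cheap to matter or its sub-graph is too large to handle."""
    if (ace_piece["cycles"] < B_MIN_CYCLES
            or ace_piece["subgraph_size"] > A_MAX_SUBGRAPH):
        return 0.0                      # default rate, sub-graph never built
    return run_inference(ace_piece)     # full selective VE inference

small = {"cycles": 3, "subgraph_size": 40}     # skipped: too few ACE cycles
big = {"cycles": 200, "subgraph_size": 5000}   # skipped: sub-graph too large
ok = {"cycles": 200, "subgraph_size": 40}      # inferred normally
infer = lambda piece: 0.42                     # stand-in for the VE result
print([masking_rate(p, infer) for p in (small, big, ok)])
```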

5. Implementation

In this section, we describe the implementation setup and the simulation infrastructure in detail. Fig. 5 provides an overall view of the simulation infrastructure used in this work. First, we use sim-soda [8], based on sim-alpha [21], for the basic ACE implementation (left of Fig. 5); this approach requires only one simulation to compute the AVF. FDD/TDD instruction identification is a post-commit analysis based on a 40 K-entry instruction chain. We add one pre-simulation to track the instruction numbers of all FDD or TDD instructions, and then reload the FDD/TDD list to initialize their masking rate to 1 in the main simulation. Second, the FI approach on the right of Fig. 5 is implemented on the raw Alpha platform, where the md5sum of the fault-injected case (output data and addresses of all store instructions) is compared with that of the error-free case; this is repeated 1000 times. Table 2 shows the simulation configuration, set up as in [8], and we use the SPEC2000 (INT) benchmark suite for all results. To reduce simulation time while maintaining representative program behavior, SimPoint analysis [17] is used to run each SimPoint for 50 M instructions, except 20 M instructions for gap.
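The md5sum comparison of the FI flow can be sketched as follows. This is a minimal illustration; the record format and helper name are assumptions, not sim-alpha's actual interface.

```python
import hashlib

def md5_of_outputs(store_records):
    """Hash the (address, data) stream of all store instructions, mirroring the
    comparison of golden vs fault-injected runs described in the text."""
    h = hashlib.md5()
    for addr, data in store_records:
        h.update(f"{addr:x}:{data:x}".encode())
    return h.hexdigest()

golden = [(0x1000, 0xdead), (0x1008, 0xbeef)]  # error-free run (assumed values)
faulty = [(0x1000, 0xdead), (0x1008, 0xbee0)]  # one corrupted store

# Identical output stream -> the injected fault was masked.
masked = md5_of_outputs(golden) == md5_of_outputs(golden)
# Differing stream -> silent data corruption detected by the comparison.
sdc = md5_of_outputs(golden) != md5_of_outputs(faulty)
print(masked, sdc)
```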


[Fig. 5. Simulation infrastructure: for the SPEC2000 (INT) benchmarks on an Alpha 21264 system, the ACE-based path (sim-soda: a pre-simulation builds the TDD/FDD instruction list, then one simulation performs ACE analysis with masking effects) runs alongside the FI-based path (sim-alpha: a no-fault baseline plus a loop of random fault-injection runs compared by md5sum); both report AVF and total simulation time.]

Table 2
Simulation parameters configuration.

Parameter | Value
Pipeline depth | 7
Integer ALUs/mult | 4/4
Integer ALU/mult latency | 1/7
Fetch/slot/map/issue/commit width | 4/4/4/4/11 instructions per cycle
Issue queue size | 20
Reorder buffer size | 80
Register file size | 80
Load/store queue size | 32
MSHR entries | 8/cache
Pre-fetch MSHR entries | 2/cache
Victim buffer | 8 entries, 1-cycle hit latency
Return address stack | 32-entry
L1 Cache | 64 KB instruction/64 KB data, 2-way, 64 B line, 3-cycle latency
L2 Cache | 2 MB, direct mapped, 64 B line, 7-cycle latency
TLB size | 128-entry ITLB/128-entry DTLB, fully associative
Branch predictor | Hybrid, 4 K global + 2-level 1 K local + 4 K choice
Mis-prediction penalty | 7 cycles

6. Results analysis

This section provides the experimental results on estimation accuracy and speed. On one hand, to evaluate the proposed MEA-PGM (two versions: Our_MEA-PGM-FO and Our_MEA-PGM-HO,³ representing the first-order and high-order masking-effects truncated inference of the unified MEA-PGM framework), we re-implement the related works [6,8,9]. Our work is compared with the default case without masking effects, 'ACE' [6]; 'ACE-partial' [8], which considers partial masking; and 'ACE-average' [9], which uses an average masking rate. On the other hand, we also compare the results of the two solutions with their different accuracy–efficiency tradeoffs:

³ In the results of Fig. 6 (corresponding to the text of Sections 6.1 and 6.2), we set the truncated-inference parameters, the skipping sub-graph threshold and the cycle threshold, to 1000 and 0, respectively, for Our_MEA-PGM-HO. Different parameter configurations lead to different accuracy–runtime tradeoffs; more details are in Section 6.3.

Our_MEA-PGM-FO and Our_MEA-PGM-HO of MEA-PGM, and analyze the effect of the truncation parameters on Our_MEA-PGM-HO.

6.1. Accuracy comparison

To evaluate the impact of masking effects in soft error analysis, we define the metric IAR (standing for In-Accuracy Rate):

$IAR(method) = \dfrac{AVF_{method} - AVF_{FI}}{AVF_{ACE} - AVF_{FI}}$    (8)
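A small numeric illustration of the metric (the AVF values below are hypothetical, not measurements from the paper):

```python
def iar(avf_method, avf_fi, avf_ace):
    """In-Accuracy Rate: the method's gap to fault injection, normalized by the
    plain-ACE gap, so plain ACE scores 1 and a perfect method scores 0."""
    return (avf_method - avf_fi) / (avf_ace - avf_fi)

# Hypothetical numbers: FI measures AVF = 0.10, plain ACE overestimates at 0.30,
# and an improved method reports 0.15, i.e. it closes 75% of the gap.
print(round(iar(0.30, 0.10, 0.30), 3), round(iar(0.15, 0.10, 0.30), 3))
```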

IAR captures the normalized inaccuracy gap between a typical ACE method and FI (the maximum value is 1); the metric directly reflects the accuracy improvement obtained by considering masking effects. By this metric, Fig. 6(a) shows that our estimation methods outperform the other ACE-based methods. On average, Our_MEA-PGM-FO narrows the inaccuracy gap by 8.48%, while Our_MEA-PGM-HO reduces the IAR value by 87.28%. (1) 'ACE-partial' does so by only 2.6%, because it considers only limited masking effects; (2) 'ACE-average' reduces the inaccuracy gap by 22.657% and is stable across benchmarks, but it provides no guarantee and is expensive. Compared with these existing methods, Our_MEA-PGM-FO provides a tighter upper bound on AVF by exploiting first-order masking effects thoroughly, while Our_MEA-PGM-HO achieves a nearly optimal AVF value by exploring all possible masking effects. The estimation quality of the proposed MEA-PGM varies with the benchmark: Our_MEA-PGM-FO brings an accuracy improvement of up to 45.96% (for gap), while Our_MEA-PGM-HO reaches up to 95.54% (for gcc). This variation is reasonable given the different execution behaviors of the benchmarks, which determine the dependencies between components (e.g., registers and ALUs) during error occurrence and propagation. We conclude that the proposed MEA-PGM achieves higher estimation accuracy by characterizing error masking and spread completely.

6.2. Estimation speed comparison

Fig. 6(b) shows the comparative speedup over the FI baseline (whose time cost is normalized to 1 in Eq. (9)). On average, typical ACE is 112.26× faster than FI. Our method is about 43.87–52.01× faster than FI, close to the 52.68× of 'ACE-partial', because of the one extra pre-simulation. In particular, Our_MEA-PGM-FO


[Fig. 6. Comparison of accuracy and speedup for AVF estimation across the SPEC2000 (INT) benchmarks. (a) IAR comparison: Our_MEA-PGM-FO brings the accuracy improvement to 45.96% (for gap) and is on average 8.48% better than 'ACE'; Our_MEA-PGM-HO reduces the IAR value by 95.54% (for gcc) and by 87.28% on average. (b) Normalized speedup over FI: the proposed MEA-PGM approach is fast and inherits the high speed of ACE analysis, 52.01× for Our_MEA-PGM-FO and 43.87× for Our_MEA-PGM-HO on average.]

achieves a faster estimation thanks to its low space complexity, up to 52.01× speedup, while Our_MEA-PGM-HO maintains a 43.87× speedup despite quantifying the high-order masking effects. As a hybrid of FI and ACE, 'ACE-average' requires more time to finish the repeated FI experiments and the ACE analysis; its average speedup is very low, only 1.1176×, because of the time-consuming FI. On the surface it should be more expensive than plain FI, yet its actual speedup over FI is above 1 on average. In this case, the average masking rate of ACE pieces is 26.04%, while the total masking rate is up to 88.7% (1 − AVF(FI)) on average. Simulations with wrong outputs may be interrupted early by invalid address accesses, while simulations with masked errors run until the program finishes successfully. Therefore, ACE-average incurs more simulations that stop before the end of execution, which saves some simulation time. This confirms that ACE-based methods are faster than FI-based ones.

$Speedup(method) = \dfrac{T_{FI}}{T_{method}}$    (9)

Indeed, FI can only determine the AVF of a single structure at a time, whereas one ACE analysis can determine the AVF of multiple storage structures (register file, Cache, TLB, etc.). Therefore, a highly accurate ACE-based method is attractive.

6.3. Truncated inferences comparison

In this section, we make two comparisons: one between the two instances with different truncated inferences, and one between different parameter configurations of the high-order masking-effects truncated inference.

First, we compare the two versions of the proposed MEA-PGM framework with different truncated inferences: Our_MEA-PGM-FO and Our_MEA-PGM-HO. Fig. 7 shows that the two instances yield different accuracy–efficiency tradeoffs: Our_MEA-PGM-FO gives an 8.48% accuracy improvement with a 52.01× speedup, while Our_MEA-PGM-HO achieves an 87.28% accuracy benefit with a 43.87× speedup. That is to say, the former estimates faster while maintaining a tighter upper bound on AVF over typical ACE analysis by exploiting first-order masking effects thoroughly, whereas the latter provides a more accurate estimation while keeping a rather high speed over typical ACE analysis by characterizing high-order masking effects. They suit different application scenarios: Our_MEA-PGM-FO can reduce the time to market of general processors in smartphones, tablets, and other personal devices, while Our_MEA-PGM-HO is useful in highly reliable processor designs for bank servers and aerospace platforms.

Second, Our_MEA-PGM-HO has two threshold values, for small ACE cycle counts and for sub-graph size, which influence the estimation results through the different skipping conditions. In Fig. 7, 'a' represents the maximum sub-graph size, while 'b' is the minimum ACE cycle count. For 'a' equal to {100, 1000, 10,000}, the larger 'a' is set, the higher the accuracy of the AVF estimation: the loss in accuracy is 61.86%, 27.93% and 17.01%, respectively. The runtime also increases with larger 'a': the three cases achieve {48.99×, 43.879×, 22.29×} speedup, respectively. This comparison suggests an empirical value of 1000 for 'a'. Considering the case of


[Fig. 7. Impact of different skipping parameters on Our_MEA-PGM-HO, for the cases (a, b) ∈ {(1000, 1000), (1000, 100), (1000, 10), (100, 0), (1000, 0), (10000, 0)} across the SPEC2000 (INT) benchmarks. (a) AVF comparison: the stricter the threshold, the higher the accuracy; the accuracy loss can be down to 17.01% over the FI baseline. (b) Runtime comparison: the looser the threshold, the smaller the runtime; the speedup can be up to 49.26× over FI.]

'a' = 1000, for 'b' equal to {1000, 100, 10, 0} we get {49.26×, 47.31×, 44.96×, 43.87×} speedup over FI, as seen in Fig. 7(b); the overestimation in each case is 123.52%, 66.05%, 32.38% and 27.93%, respectively. Smaller values of 'b' (around 0–10 cycles) give the better accuracy–speed tradeoff.

7. Conclusion

In this paper, we explore the masking effects arising from component dependency via a PGM methodology. By quantifying these masking effects, we propose MEA-PGM, which can efficiently provide a tighter upper bound on AVF estimation. Compared to the current state of the art used in industry and academia, the two representatives of the proposed MEA-PGM framework (first-order and high-order masking effects) achieve higher accuracy at faster speed, with flexibility: on average an 8.48% accuracy improvement with 52.01× speedup, and an 87.28% accuracy improvement with 43.87× speedup, respectively. The two instances of MEA-PGM offer different accuracy–efficiency results and adapt well to different application scenarios.

Appendix A

This section provides the supplementary materials for our work. First, the complete computing model of the instruction-level masking rate is listed in Section A.1; then, the complexity derivation for the first-order masking-effects truncated inference is given in Section A.2.

A.1. Complete instruction-level masking model in Table A1

Table A1 gives the complete instruction-level masking model for the register file of the Alpha processor. It lists all the cases of possible masking effects and the corresponding masking rate

in our work. Such an enumeration method is based on the processor's instruction set architecture and can be applied to any processor type.

A.2. Derivation of the complexity reduction for the first-order truncated policy

Given

$T_{ACE}(n) = \sum_{i=0}^{n} T_i \left(1 - \prod_{s=i}^{n} M_{inst\_s}\right)$    (10)

$T_{ACE}(n+1) = \sum_{i=0}^{n+1} T_i \left(1 - \prod_{s=i}^{n+1} M_{inst\_s}\right)$    (11)

first rewrite the difference as

$(11) - (10) = (11) - (10) \cdot M_{inst\_(n+1)} - (10) \cdot \left(1 - M_{inst\_(n+1)}\right)$.

The left side is

$(11) - (10) = T_{ACE}(n+1) - T_{ACE}(n)$,

and the right side expands as

$(11) - (10) \cdot M_{inst\_(n+1)} - (10) \cdot \left(1 - M_{inst\_(n+1)}\right)$
$= \sum_{i=0}^{n+1} T_i \left(1 - \prod_{s=i}^{n+1} M_{inst\_s}\right) - M_{inst\_(n+1)} \sum_{i=0}^{n} T_i \left(1 - \prod_{s=i}^{n} M_{inst\_s}\right) - \left(1 - M_{inst\_(n+1)}\right) T_{ACE}(n)$
$= \sum_{i=0}^{n+1} T_i - \sum_{i=0}^{n+1} T_i \prod_{s=i}^{n+1} M_{inst\_s} - M_{inst\_(n+1)} \sum_{i=0}^{n} T_i + \sum_{i=0}^{n} T_i \prod_{s=i}^{n+1} M_{inst\_s} - \left(1 - M_{inst\_(n+1)}\right) T_{ACE}(n)$
$= \sum_{i=0}^{n+1} T_i - T_{n+1} M_{inst\_(n+1)} - M_{inst\_(n+1)} \sum_{i=0}^{n} T_i - \left(1 - M_{inst\_(n+1)}\right) T_{ACE}(n)$
$= \left(1 - M_{inst\_(n+1)}\right) \sum_{i=0}^{n+1} T_i - \left(1 - M_{inst\_(n+1)}\right) T_{ACE}(n)$.

Finally, we get

$\Delta T_{ACE}(n+1) = T_{ACE}(n+1) - T_{ACE}(n) = \left(1 - M_{inst\_(n+1)}\right)\left(\sum_{i=0}^{n+1} T_i - T_{ACE}(n)\right)$.

Table A1
Complete computing model of instruction-level masking rate for the register file (typical example: Alpha processor). Notation: W = width of a register; Na, Nb, Ni = number of '1' bits in a, b, IMM; Nab = number of '1' bits in (a − b); W/2 is the valid width of longword operations and 1 accounts for the sign bit.

Opcode masking effects (computing policy: estimated by making good use of the number of '1' in the instruction operands):
Instruction | Function | Minst(a) | Minst(b)
AND (ANDI) | c = a & b (a & IMM) | 1 − Nb/W (1 − Ni/W) | 1 − Na/W
ORNOT (ORNOTI) | c = a OR NOT b | 1 − Nb/W | Na/W
BIC (BICI) | c = a & NOT b | Nb/W | 1 − Na/W
BIS (BISI) | c = a OR b | Nb/W | Na/W
BLBC, CMOVLBC | !(a & 0x1) | 1 − 1/W | —
BLBS, CMOVLBS | a & 0x1 | 1 − 1/W | —
BGT/BGE/BLT/BLE, CMOVGT/CMOVGE/CMOVLT/CMOVLE | a > 0, a ≥ 0, a < 0, a ≤ 0 | 1 − 1/W when Na != 0; 0 otherwise | —
BNE/BEQ, CMOVNE/CMOVEQ | a != 0, a == 0 | 1 when Na > 1; 1 − 1/W when Na == 1; 0 otherwise | —
CMPEQ | c = (a == b) | 1 when Nab > 1; 0 otherwise | 1 when Nab > 1; 0 otherwise

Bit-extension masking effects (computing policy: computed by the bit-extension percent of the total operand width; the extra bit extension for a is 2 at S4ADDL; for shifts, the bit extension for a is W − (b%64) and for b is W − 6):
Instruction | Function | Minst(a) | Minst(b)
ADDL/SUBL (ADDLI/SUBLI) | c = sext_{W/2}(a ± b) | 1/2 − 1/W | 1/2 − 1/W
S4ADDL/S4SUBL (S4ADDLI/S4SUBLI) | c = sext_{W/2}(a·4 ± b) | 1/2 + 1/W | 1/2 − 1/W
S4ADD/S4SUB (S4ADDI/S4SUBI) | c = a·4 ± b | 2/W | 0
S8ADDL/S8SUBL (S8ADDLI/S8SUBLI) | c = sext_{W/2}(a·8 ± b) | 1/2 + 2/W | 1/2 − 1/W
S8ADD/S8SUB (S8ADDI/S8SUBI) | c = a·8 ± b | 3/W | 0
SRA | c = a >> (b%64), arithmetic | 1 − (b%64 − 1)/W | 1 − 6/W
SRL/SLL | c = a >> (b%64), c = a << (b%64) | 1 − (b%64)/W | 1 − 6/W

Sanitized-instruction masking effects (computing policy: post-commit analysis identifies FDD/TDD, with immediate identification for NOP/Prefetch; determined by instruction dependency):
Instruction | Function | Minst(a) | Minst(b)
FDD/TDD instructions | — | 1 | 1
NOP and Prefetch instructions | — | 1 | 1
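The incremental identity derived in Appendix A.2 can also be checked exactly with rational arithmetic. This is a verification sketch; the sizes and values are random, and the function name is our own.

```python
from fractions import Fraction
from random import Random

def t_ace(T, M):
    """Eqs. (10)/(11): T_ACE(n) = sum_i T_i * (1 - prod_{s=i..n} M_inst_s)."""
    out = Fraction(0)
    for i in range(len(T)):
        p = Fraction(1)
        for s in range(i, len(M)):
            p *= M[s]
        out += T[i] * (1 - p)
    return out

rng = Random(1)
for _ in range(20):
    n = rng.randint(1, 8)
    T = [Fraction(rng.randint(1, 50)) for _ in range(n + 2)]      # T_0 .. T_{n+1}
    M = [Fraction(rng.randint(0, 10), 10) for _ in range(n + 2)]  # M_0 .. M_{n+1}
    lhs = t_ace(T, M) - t_ace(T[:-1], M[:-1])                     # (11) - (10)
    rhs = (1 - M[-1]) * (sum(T) - t_ace(T[:-1], M[:-1]))          # Eq. (8) RHS
    assert lhs == rhs                                             # exact equality
print("Delta T_ACE identity holds exactly")
```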

References [1] Baumann RC. Radiation-induced soft errors in advanced semiconductor technologies. Dev Mater Reliab IEEE Trans 2005:305–16. [2] Wang Nicholas J, Mahesri Aqeel, Patel Sanjay J. Examining ace analysis reliability estimates using fault-injection. SIGARCH Comput Architect News 2007. [3] Alexandrescu D. A comprehensive soft error analysis methodology for SoCs/ ASICs memory instances. In: On-line testing symposium (IOLTS), vol. 176; 13– 15 July 2011. p. 175. [4] George NJ, Elks CR, Johnson BW, Lach J. Transient fault models and AVF estimation revisited. Dependable Syst Networks 2010;4:477–86. [5] Mukherjee SS, Weaver C, Emer J, Reinhardt SK, Austin T. A systematic methodology to compute the architectural vulnerability factors for a highperformance microprocessor. MICRO-36 2003:29–40. [6] Biswas A, Racunas P, Cheveresan R, Emer J, Mukherjee SS, Rangan R. Computing architectural vulnerability factors for address based structures. In: Proceedings: international conference computer architecture (ISCA); 4–8 June 2005. p. 532, 543. [7] Biswas A, Racunas P, Emer J, Mukherjee SS. Computing accurate AVFs using ACE analysis on performance models: a rebuttal. Comput Architect Lett 2008;7(1):21–4. [8] Fu Xin, Tao Li, José Fortes. Sim-soda: a unified framework for architectural level software reliability analysis. Workshop Modeling Benchmarking and Simulation 2006. [9] Haghdoost A, Asadi H, Baniasadi A. System-level vulnerability estimation for data caches. Dependable Comput (PRDC) 2010;13–15:157–64. [10] Walcott KR, Humphreys G, Gurumurthi S. Dynamic prediction of architectural vulnerability from microarchitectural state. ACM SIGARCH Comput Architect News 2007;35(2):516–27. [11] Bin Li, Lide Duan, Lu Peng. Efficient microarchitectural vulnerabilities prediction using boosted regression trees and patient rule inductions. Comput IEEE Trans 2010;59(5):593–607. [12] Nair AA, Eyerman S, Eeckhout L, John LK. 
A first-order mechanistic model for architectural vulnerability factor. In: Proceedings: international conference computer architecture (ISCA); 2012. p. 273, 284. [13] Jianjun Xu, QingPing Tan, Wanwei Liu. Estimating the soft error vulnerability of register files via interprocedural data flow analysis. Theo Aspects Software Eng (TASE) 2010;25–27:201–8. [14] Praveen A, Sai Kumar N. To improve register file integrity against soft errors by using self-immunity technique. Int J Latest Trends Eng Technol 2013:207–12. [15] Blome JA, Gupta S, Feng S, Mahlke S. Cost-efficient soft error protection for embedded microprocessors. In: Conference on compilers, architecture and synthesis for embedded systems; October 2006. p. 421,431. [16] Jongeun Lee, Shrivastava A. Static analysis to mitigate soft errors in register files. DATE 2009:1367–72. [17] Sherwood T, Perelman E, Hamerly G, Calder B. Automatically characterizing large scale program behavior. SIGARCH Comput Architect News 2002:45–57. [18] Rajaramant R, Kim JS, Vijaykrishnan N, Xie Y, Irwin MJ. SEAT-LA: a soft error analysis tool for combinational logic. VLSI Des 2006:3–7.

[19] Ebrahimi M, Liang Chen, Asadi H, Tahoori MB. CEP: correlated error propagation for hierarchical soft error analysis. J Electron Testing 2013:143–58. [20] George N, Lach J. Characterization of logical masking and error propagation in combinational circuits and effects on system vulnerability. Dependable systems & networks, international conference on 27–30 June 2011. p. 323, 334. [21] Desikan R, Burger D, Keckler SW, Austin T. Sim-alpha: a validated, execution-driven alpha 21264 simulator. Tech Rep 2001:TR-01–23. [22] Constantinides K, Plaza S, Blome I, Bin Z, Bertacco V, Mahlke S, et al. BulletProof: a defect-tolerant CMP switch architecture. High-Performance Comput Arch 2006:3–14. [23] Rao R, Chopra K, Blaauw D, Sylvester D. An efficient static analysis algorithm for computing the soft error rates of combinational circuits. Proc DATE 2006:164–9. [24] Miskov-Zivanov N, Marculescu D. MARS-C: modeling and reduction of soft errors in combinational circuits. In: Proceedings of DAC; 2006. [25] Maniatakos, Michail, Chandra Tirumurti, Abhijit Jas, Yiorgos Makris. AVF analysis acceleration via hierarchical fault pruning. In: European test symposium (ETS), 2011 16th IEEE; 2011. p. 87–92. [26] Sridharan V, Kaeli DR. Quantifying software vulnerability. In: Proceedings of the 2008 workshop on Radiation effects and fault tolerance in nanometer technologies. ACM; 2008. p. 323–328.


[27] Azarpeyvand, Ali, Salehi Mostafa E, Fakhraie Sied Mehdi. An analytical method for reliability aware instruction set extension. J Supercomput 2014;67(1):104–30. [28] Borodin D, Juurlink BH. Protective redundancy overhead reduction using instruction vulnerability factor. In: Proceedings of the 7th ACM international conference on computing frontiers. ACM; 2010. p. 319–326. [29] Li J, Xue J, Xie X, Wan Q, Tan Q, Tan L. Epipe: a low-cost fault-tolerance technique considering WCET constraints. J Syst Architect 2013. [30] Rehman S, Shafique M, Kriebel F, Henkel J. Reliable software for unreliable hardware: embedded code generation aiming at reliability. In: Proceedings of the seventh IEEE/ACM/IFIP international conference on hardware/software codesign and system synthesis. ACM: 2011. p. 237–246. [31] Shafique M, Rehman S, Aceituno PV, Henkel J. Exploiting program-level masking and error propagation for constrained reliability optimization. In: Proceedings of the 50th annual design automation conference. ACM; 2013. p. 17. [32] Iyer RK, Nakka NM, Kalbarczyk ZT, Mitra S. Recent advances and new avenues in hardware-level reliability support. Micro IEEE 2005;25(6):18–29. [33] Li, Xiaodong, Adve Sarita V, Bose Pradip, Rivers Jude A. SoftArch: an architecture-level tool for modeling and analyzing soft errors. Dependable systems and networks. IEEE; 2005.