A distributed minority and majority voting based redundancy scheme


MR-11707; No of Pages 6 Microelectronics Reliability xxx (2015) xxx–xxx

Contents lists available at ScienceDirect

Microelectronics Reliability journal homepage: www.elsevier.com/locate/mr

P. Balasubramanian ⁎, D.L. Maskell

School of Computer Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore

Article info

Article history: Received 25 May 2015; received in revised form 3 July 2015; accepted 5 July 2015; available online xxxx

Keywords: N-modular redundancy (NMR); Reliability; Fault tolerance; Majority voters; CMOS; Standard cells

Abstract

This article presents a new distributed minority-cum-majority voting based redundancy (DMMR) scheme that is scalable and can tolerate multiple function module faults/failures. In comparison with the 5-tuple, 7-tuple, and 9-tuple versions of the N-modular redundancy scheme, the proposed DMMR scheme, whilst being equally fault-tolerant, reports respective improvements in design metrics (i.e., figure-of-merit) of 32.5%, 180.4% and 377.4% for example ASIC-based implementations that use a 4 × 4 array multiplier as the representative function module. The corresponding decreases in system reliability are very small, just 1.21%, 1.06% and 1.08%, for module reliabilities ranging from 0.9 to 0.99. The simulation results are obtained using a 32/28 nm CMOS process. © 2015 Elsevier Ltd. All rights reserved.

⁎ Corresponding author. E-mail address: [email protected] (P. Balasubramanian).

1. Introduction

Mission- and safety-critical applications, by default, employ redundancy in system design to overcome arbitrary function module fault(s)/failure(s) and to provide correct and uninterrupted operation [1]. The function module deployed could be any circuit/system, which is duplicated as needed to construct a redundant system. In a passive N-modular redundant (NMR) system, (N − 1) copies of a function module are used, giving N identical function modules in total, and at least a majority M of the N function modules is required to operate correctly to guarantee mission success [2,3]. In other words, the fault-tolerant NMR system can accommodate a maximum of (N − M) function module failures. All N function modules are joined using a voting element, and the voter produces a majority vote based on the correct operation of M out of N function modules. Triple modular redundancy (TMR), which is the 3-tuple version of the generic NMR, deploys 3 function modules, of which at least 2 are expected to operate correctly, i.e., TMR can cope with only 1 function module fault/failure. To cope with scenarios where multiple function module faults or failures are likely to occur, such as multiple fault occurrences in combinational and/or sequential systems due to radiation phenomena [4,5], higher versions of the NMR such as 5MR (5-tuple), 7MR (7-tuple), or even 9MR (9-tuple) may be used selectively [6]. However, the main drawback of higher order NMR systems such as 5MR, 7MR and 9MR is the steep increase in design metrics, viz. power, delay, and area.
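As a minimal illustration of the majority rule described above, the following Python sketch models a single-bit NMR voter and the (N − M) tolerance bound. The function names are hypothetical and introduced only for this example; M is taken as (N + 1)/2 for odd N, and a faulty module is assumed to output the complement of the correct bit.

```python
def majority_vote(bits):
    """Return the majority value of an odd number of 1-bit module outputs."""
    if len(bits) % 2 == 0:
        raise ValueError("NMR majority voting assumes an odd number of modules")
    return 1 if sum(bits) > len(bits) // 2 else 0


def max_tolerable_faults(n):
    """N - M, where M = (N + 1) / 2 is the smallest majority of N modules."""
    m = (n + 1) // 2
    return n - m


# TMR with the correct value 1: one faulty module is outvoted.
print("TMR, 1 fault :", majority_vote([1, 1, 0]))
# 5MR with the correct value 1: three faulty modules corrupt the vote.
print("5MR, 3 faults:", majority_vote([1, 1, 0, 0, 0]))

for n in (3, 5, 7, 9):
    print(f"{n}MR tolerates up to {max_tolerable_faults(n)} faulty module(s)")
```

Under these assumptions each additional tolerated fault costs two extra function modules, which is the overhead the remainder of the paper sets out to reduce.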

To mitigate the large design overheads of higher order redundant systems, this paper proposes a new distributed minority and majority voting based redundancy scheme, called the DMMR. The DMMR is scalable and is found to be efficient in terms of design parameters when sample ASIC-based implementations of 5MR, 7MR and 9MR designs are compared with their counterpart DMMR designs. In the rest of this paper, Section 2 describes the conventional NMR and proposed DMMR system architectures through block schematics. The system reliability equations of 5MR, 7MR, 9MR, and the proposed 3-of-5, 3-of-6 and 3-of-7 DMMR systems are given in Section 3, where the system reliability curves of the various systems are also plotted as a function of their module reliabilities. Section 4 presents the simulation results obtained for sample implementations of 5MR, 7MR, 9MR, 3-of-5 DMMR, 3-of-6 DMMR and 3-of-7 DMMR systems, considering a 4 × 4 array multiplier as a representative function module. Lastly, Section 5 provides the conclusions.

http://dx.doi.org/10.1016/j.microrel.2015.07.015
0026-2714/© 2015 Elsevier Ltd. All rights reserved.

Please cite this article as: P. Balasubramanian, D.L. Maskell, A distributed minority and majority voting based redundancy scheme, Microelectronics Reliability (2015), http://dx.doi.org/10.1016/j.microrel.2015.07.015

2. Illustration of NMR and DMMR schemes

The generalized block schematics of the NMR and DMMR schemes are portrayed in Fig. 1a and b respectively. In the NMR system, (N − 1) copies of a function module, i.e., N identical function modules in total, are used, and the majority voter combines the outputs of the N function modules, viz. M1 to MN, to produce a majority vote (NV) indicating correct NMR system operation. To satisfy the majority logic, at least (N + 1)/2 of the N function modules should maintain correct operation, where N is odd.

Fig. 1. Block diagram representation of (a) NMR, and (b) the proposed DMMR scheme. (For interpretation of the references to colour in this figure, the reader is referred to the web version of this article.)

In Fig. 1b, F1 up to FN represent the outputs of function modules 1 to N. According to the proposed DMMR scheme, the function modules are split into two groups, which are highlighted as the portions enclosed within the red and blue dotted lines in Fig. 1b, named the majority logic and the minority logic respectively. The outputs of function modules 1, 2 and 3, viz. F1, F2 and F3, are fed to a 3-input majority voter, which implements the sum-of-products F1F2 + F2F3 + F1F3 using a single complex gate, viz. the AO222 cell, shown using dotted lines in Fig. 1b. Function modules 4 to N (where N is a positive integer, and N > 3) are considered separately and their outputs are fed to an OR gate, which implements the logical disjunction F4 + F5 + … + FN. The majority logic requires any 2 of its 3 function modules to operate correctly, i.e., it tolerates the failure of only one arbitrary function module. The minority logic requires at least 1 arbitrary function module to maintain correct operation. A DMMR system uses a minimum of 4 function modules, with 3 function modules constituting the majority logic and the remaining function modules constituting the minority logic. When N function modules are employed in a generic 3-of-N DMMR system, 3 function modules comprise the majority logic and the remaining (N − 3) function modules comprise the minority logic. The respective outputs of the two logic groups, viz. MAJ and MIN, are combined using a 2-input AND gate that produces the system/voter output (DV), as indicated in Fig. 1b.

The DMMR voter, shown within the green dotted rectangle in Fig. 1b, is not only fault-tolerant but also incorporates inherent error correction capability. To explain this, consider two example scenarios with reference to Fig. 1b.

Let us presume that the expected (correct) values of F1 up to FN are 1 for a specific input pattern. Under this scenario, MIN and MAJ would both equate to 1, and the correct voter output is DV = MAJ · MIN = 1. Now suppose that multiple function modules become faulty or fail, for example function module 1 and function modules 5 up to N, so that their outputs are corrupted: F1 and F5 up to FN would become 0 instead of 1, whilst function modules 2, 3 and 4 alone would retain the correct value of 1. The internal voter outputs MAJ and MIN would still equate to 1, and the DMMR voter output is therefore retained as 1 despite multiple function module faults/failures. This example scenario can be generalized: if the majority of the function modules in the majority logic group operate correctly, then even if all but one of the function modules in the minority logic group become faulty/fail, the DMMR voter would still produce the correct output.

Let us now presume that the expected (correct) values of F1 up to FN are 0 for a certain input pattern. Given this, both MAJ and MIN would equate to 0 and the DMMR voter would output 0 on DV. Suppose that, due to simultaneous faults or failures, function modules 1 and 4 erroneously output 1 on F1 and F4. As a result, MIN would become 1 due to error propagation. However, with MAJ still being 0 due to the Boolean majority, the DMMR voter would produce the correct output of 0 on DV. Notice that even if all the function modules of the minority logic become faulty/fail, the DMMR voter would still produce the correct output of 0 on DV, since MAJ is 0 as F2 and F3 are 0s. Thus, when the expected system output is 0, faults/failures of some or all of the function modules in the minority logic group do not affect the DMMR voter, and an incorrect output is not propagated or produced by the DMMR voter/system, provided the majority logic group output is correct. This scenario can also be generalized: if the expected DMMR voter output is 0, the correct output is produced as long as the majority logic group operates correctly, even if the minority logic group fails.
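The voter logic and the two scenarios described above can be sketched in Python as follows. This is only an illustrative model under stated assumptions: each module output is treated as a single bit, a faulty module is assumed to output the complement of the expected value, and the names `dmmr_vote` and `correct_patterns` are hypothetical helpers introduced for this example.

```python
from itertools import product


def dmmr_vote(f):
    """Voter output DV for module outputs f = [F1, ..., FN], with N > 3."""
    if len(f) < 4:
        raise ValueError("a DMMR system needs at least 4 function modules")
    f1, f2, f3 = f[:3]
    maj = (f1 & f2) | (f2 & f3) | (f1 & f3)  # 3-input majority: F1F2 + F2F3 + F1F3
    mn = 1 if any(f[3:]) else 0              # minority logic: F4 + F5 + ... + FN
    return maj & mn                          # DV = MAJ . MIN


def correct_patterns(n):
    """Count fault patterns, keyed by the number of working modules, for which
    DV is correct for both expected output values (0 and 1).

    Assumption: a faulty module outputs the complement of the expected bit.
    """
    counts = {}
    for working in product((0, 1), repeat=n):
        ok = all(
            dmmr_vote([exp if w else 1 - exp for w in working]) == exp
            for exp in (0, 1)
        )
        if ok:
            k = sum(working)
            counts[k] = counts.get(k, 0) + 1
    return counts


# Scenario 1 (expected 1, N = 6): modules 1, 5 and 6 are faulty.
print(dmmr_vote([0, 1, 1, 1, 0, 0]))
# Scenario 2 (expected 0, N = 6): modules 1 and 4 erroneously output 1.
print(dmmr_vote([1, 0, 0, 1, 0, 0]))

for n in (5, 6, 7):
    print(n, correct_patterns(n))
```

Under this fault model, the smallest number of working modules that still yields a correct output is 3 for N = 5, 6 and 7, so the 3-of-N system withstands up to N − 3 faulty modules when they fall in the right places. The pattern counts per number of working modules coincide with the coefficients of the DMMR reliability expressions, Eqs. (6) to (8), in Section 3.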
3. System reliabilities of NMR and DMMR schemes

Let $R_M$ specify the reliability (probability of correct operation) of a function module M; hence $(1 - R_M)$ specifies the probability of incorrect operation/failure of the function module M. It is implicit that R = R(t) in the equations, i.e., reliability is expressed as a function of time (t). Assuming the reliabilities of the function modules used in a redundant system to be equal, since the function modules are identical, and further assuming that the voters are perfect, with $R_S$ denoting the system reliability, the reliability expressions of the simplex, 5MR, 7MR and 9MR systems are given by Eqs. (1) to (4). Eqs. (2), (3) and (4) explicitly refer to the majority clauses in the cases of 5MR, 7MR and 9MR.

$$R_S^{\mathrm{Simplex}} = R_M \tag{1}$$

$$R_S^{\mathrm{5MR}} = 10R_M^3(1-R_M)^2 + 5R_M^4(1-R_M) + R_M^5 \tag{2}$$

$$R_S^{\mathrm{7MR}} = 35R_M^4(1-R_M)^3 + 21R_M^5(1-R_M)^2 + 7R_M^6(1-R_M) + R_M^7 \tag{3}$$

$$R_S^{\mathrm{9MR}} = 126R_M^5(1-R_M)^4 + 84R_M^6(1-R_M)^3 + 36R_M^7(1-R_M)^2 + 9R_M^8(1-R_M) + R_M^9 \tag{4}$$

Eqs. (1) to (4) imply that: i) the reliability of the simplex system is wholly governed by its module reliability, ii) the 5MR system operates correctly when any 3, any 4, or all 5 of its function modules operate correctly, iii) the 7MR system operates correctly when any 4, 5 or 6, or all 7, of its function modules operate correctly, and iv) the 9MR system operates correctly when any 5, 6, 7 or 8, or all 9, of its function modules operate correctly. In contrast, the reliability of the DMMR system is governed by the correct working of a majority of the function modules constituting the majority logic group, and the correct working of at least one of the function modules in the minority logic group. Therefore, the system/voter output equation of the DMMR scheme is given as

$$DV = (F_1F_2 + F_2F_3 + F_1F_3)(F_4 + F_5 + \dots + F_N) \tag{5}$$

The system reliabilities of the proposed 3-of-5, 3-of-6 and 3-of-7 DMMR systems are expressed by Eqs. (6), (7) and (8) respectively. In all three systems, 3 function modules are configured for the majority logic. The minority logic comprises 2 function modules in the 3-of-5 DMMR, 3 function modules in the 3-of-6 DMMR, and 4 function modules in the 3-of-7 DMMR system.

$$R_S^{\text{3-of-5 DMMR}} = 6R_M^3(1-R_M)^2 + 5R_M^4(1-R_M) + R_M^5 \tag{6}$$

$$R_S^{\text{3-of-6 DMMR}} = 9R_M^3(1-R_M)^3 + 12R_M^4(1-R_M)^2 + 6R_M^5(1-R_M) + R_M^6 \tag{7}$$

$$R_S^{\text{3-of-7 DMMR}} = 12R_M^3(1-R_M)^4 + 22R_M^4(1-R_M)^3 + 18R_M^5(1-R_M)^2 + 7R_M^6(1-R_M) + R_M^7 \tag{8}$$

Fig. 2. System reliabilities of simplex and some NMR and DMMR systems versus their module reliabilities.

The reliability curves of the simplex, 5MR, 7MR, 9MR and the proposed 3-of-5 DMMR, 3-of-6 DMMR, and 3-of-7 DMMR systems are depicted in Fig. 2. Let us consider typical module reliabilities ranging from 0.9 to 0.99 to compare the average system reliabilities of the various redundant and non-redundant schemes. Given this, the 3-of-5 DMMR reports a 4.3% increase in system reliability compared to the simplex scheme whilst reporting only a 1.21% decrease in system reliability compared to the 5MR. The 3-of-6 DMMR, on the other hand, exhibits

a 4.6% improvement in system reliability compared to the simplex scheme whilst experiencing just a 1.06% decrease in system reliability compared to the 7MR. The 3-of-7 DMMR is more reliable than the simplex system by 4.7%, whilst being less reliable than the 9MR by only 1.08%. The simplex system might become a single point of failure during critical fault occurrences. The 3-of-5 DMMR can tolerate a maximum of 2 function module faults/failures, like the 5MR; the 3-of-6 DMMR can cope with up to 3 function module faults/failures, like the 7MR, whilst requiring 1 function module fewer; and the 3-of-7 DMMR can withstand up to 4 function module faults/failures, like the 9MR, whilst requiring 2 function modules fewer.

4. Example implementation of NMR and DMMR schemes

Example ASIC-based implementations of 5MR, 7MR, 9MR, 3-of-5 DMMR, 3-of-6 DMMR, and 3-of-7 DMMR systems have been considered according to the block schematics shown in Fig. 1. The 4 × 4 Braun array multiplier [7], portrayed in Fig. 3, has been used as the representative function module. The 4 × 4 array multiplier has 8 primary inputs and 8 primary outputs, and is implemented using 8 full adders, 4 half adders, and 16 2-input AND gates. The full adder and half adder elements of the cell library [8] have been used for the multiplier synthesis, and the array multiplier occupies 84.38 μm² of silicon area.

Fig. 3. Schematic of 4 × 4 Braun array multiplier.

The majority voters corresponding to the 5MR, 7MR and 9MR systems, shown in Fig. 4, occupy areas of 13.47 μm², 34.31 μm² and 63.79 μm² respectively. A multiplexer-based synthesis strategy was preferred for synthesizing the voting logic of the 5MR, 7MR and 9MR systems, since such a synthesis method is reported to be an optimum choice for NMR system voter synthesis [9]. In Fig. 4, A, B, C, D, E, F, G, H and I represent the function modules' outputs/majority voters' inputs, and X, Y, and Z denote the voters' outputs.

Fig. 4. Majority voters of: (a) 5MR, (b) 7MR, and (c) 9MR.

The simulation results viz.
power, delay, and area obtained for the sample realization of various redundant and non-redundant systems are shown in Table 1. For power estimation, the 4 × 4 array multiplier used for the function modules was supplied with all possible inputs, viz. 256 distinct input vectors, which represent the unique multiplication scenarios. The input vectors were supplied at time intervals of 2.5 ns (400 MHz) through test benches, which model the inputs coming in from the outside world. The .vcd files generated through the simulations were subsequently used for average power estimation using Synopsys PrimeTime. The area and critical path delay metrics were also estimated and are given in Table 1.

To comprehensively evaluate the design metrics of the different system implementations, a figure-of-merit (FOM) is defined as the inverse of the power-delay-area product (PDAP), i.e., FOM = PDAP⁻¹. Since minimization of power, delay and area is desirable, a lower PDAP value and thus a higher FOM value can be considered an indicator of an optimized design. The calculated FOM values are also given in Table 1.

Referring to Table 1, a comparison of 5MR with the proposed 3-of-5 DMMR reveals that the latter reports an increase in FOM of 32.5% whilst being able to tolerate up to 2 function module faults or failures, just like the former. A comparison between 7MR and the proposed 3-of-6 DMMR shows that the latter exhibits a significant improvement in FOM of 180.4% compared to the former. This is because the 3-of-6 DMMR requires 1 function module fewer than the 7MR. Note that, in addition to the significant FOM advantage gained by the 3-of-6 DMMR over the 7MR, both systems have the same capacity to withstand 3 function module faults or failures. A comparison between 9MR and the proposed 3-of-7 DMMR reveals that the latter leads to a substantial improvement in FOM of 377.4%, whilst retaining the ability to cope with 4 function module faults/failures, like the former.

The split-up of power dissipated by the function modules and voters in the NMR and the proposed DMMR systems is shown in Fig. 5. It can be seen that the increase in the voters' power dissipation is significant in the case of NMR systems compared to the proposed DMMR systems. On average, the majority voters of the different NMR systems (i.e., 5MR, 7MR and 9MR) account for about 20% of the system power dissipation, whilst the voters belonging to the proposed DMMR systems, in contrast, contribute only 6.7% of the total system power dissipation.

Fig. 5. Split-up of power dissipation of function modules and voters in different NMR and DMMR systems.

This reduced power dissipation of the DMMR voters is directly attributable to their smaller area occupancy, as highlighted by Fig. 6, which exclusively shows the silicon area occupied by the voting logic of the different NMR and DMMR systems. It is evident from Fig. 6 that the voters' area nearly doubles with every higher order NMR system realization, whilst the increase in voter area for successive DMMR systems is relatively meagre. This is in fact a unique advantage of the proposed DMMR system architecture.

5. Conclusions

This article has presented a new, generic DMMR system design scheme that helps to pave the way for optimization of design metrics in comparison with the conventional design of majority-based NMR systems.
Specifically, a 3-of-5 DMMR, a 3-of-6 DMMR, and a 3-of-7 DMMR system were realized and compared vis-à-vis a 5MR, a 7MR, and a 9MR system implementation. The results obtained show that the proposed DMMR systems enable significant reductions in design metrics compared to their counterpart NMR systems, at the expense of very small decreases in reliability but with no trade-off in terms of fault/failure tolerance. The proposed DMMR system architecture achieved a mean improvement in FOM of 196.8% compared to the NMR system architecture, for the sample implementations considered, at the expense of a very small average reduction in system reliability of only 1.12%. It is also worth noting that with every additional function module introduced into the DMMR system, its fault/failure tolerance increases by one. This is in contrast to the NMR system, where 2 function modules have to be added in order to increase its fault/failure tolerance by one. Thus the proposed DMMR system architecture provides an efficient and viable alternative to conventional NMR-based fault-tolerant design of safety-critical circuits and systems. Implementation of the proposed DMMR system architecture on a real-time mission-critical system is left for future work.

Fig. 6. Area occupancies of voters corresponding to different NMR and DMMR systems.

Table 1. Power, delay and area estimates of 5MR, 3-of-5 DMMR, 7MR, 3-of-6 DMMR, 9MR, and 3-of-7 DMMR system implementations, considering the 4 × 4 array multiplier as a representative function module.

| System configuration | Power (μW) | Delay (ns) | Area (μm²) | FOM (×10⁶) |
|---|---|---|---|---|
| 5MR | 120.7 | 0.98 | 529.64 | 15.96 |
| 3-of-5 DMMR | 109.3 | 0.90 | 480.84 | 21.14 |
| 7MR | 191.2 | 1.12 | 865.11 | 5.40 |
| 3-of-6 DMMR | 129.4 | 0.90 | 567.25 | 15.14 |
| 9MR | 278.5 | 1.23 | 1269.7 | 2.30 |
| 3-of-7 DMMR | 151.2 | 0.91 | 661.79 | 10.98 |
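The FOM figures of Table 1 and the FOM improvements quoted in the text can be cross-checked with a short Python sketch. The numbers are copied from Table 1; the helper names are hypothetical, and the scaling (power in μW, delay in ns, area in μm², FOM multiplied by 10⁶) is an assumption inferred from the table header. The improvements are computed from the rounded FOM column, which is also an assumption about how the quoted percentages were obtained.

```python
# (power in uW, delay in ns, area in um^2, reported FOM x 10^6), from Table 1
TABLE_1 = {
    "5MR":         (120.7, 0.98, 529.64, 15.96),
    "3-of-5 DMMR": (109.3, 0.90, 480.84, 21.14),
    "7MR":         (191.2, 1.12, 865.11, 5.40),
    "3-of-6 DMMR": (129.4, 0.90, 567.25, 15.14),
    "9MR":         (278.5, 1.23, 1269.7, 2.30),
    "3-of-7 DMMR": (151.2, 0.91, 661.79, 10.98),
}

# Each NMR system paired with its equally fault-tolerant DMMR counterpart.
PAIRS = [("5MR", "3-of-5 DMMR"), ("7MR", "3-of-6 DMMR"), ("9MR", "3-of-7 DMMR")]


def fom_x1e6(power, delay, area):
    """Inverse of the power-delay-area product, scaled by 10^6."""
    return 1e6 / (power * delay * area)


def improvement_pct(new, old):
    """Relative increase of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


for name, (p, d, a, reported) in TABLE_1.items():
    print(f"{name:12s} computed {fom_x1e6(p, d, a):6.2f}  reported {reported:6.2f}")

# Improvements based on the rounded FOM column, rounded to one decimal place.
gains = [round(improvement_pct(TABLE_1[dm][3], TABLE_1[nm][3]), 1) for nm, dm in PAIRS]
for (nm, dm), g in zip(PAIRS, gains):
    print(f"{dm} vs {nm}: {g:.1f}% FOM improvement")
print(f"mean improvement: {sum(gains) / len(gains):.1f}%")
```

With these inputs the recomputed FOM values agree with the table to two decimal places, and the per-pair improvements and their mean reproduce the 32.5%, 180.4%, 377.4% and 196.8% figures quoted in the text.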

References

[1] B.W. Johnson, Design and Analysis of Fault-tolerant Digital Systems, Addison-Wesley Publishing Company, USA, 1989.
[2] I. Koren, C. Mani Krishna, Fault-tolerant Systems, Morgan Kaufmann Publishers, 2007.
[3] E. Dubrova, Fault-tolerant Design, Springer, New York, USA, 2013.
[4] N. Miskov-Zivanov, D. Marculescu, Multiple transient faults in combinational and sequential circuits: a systematic approach, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 29 (2010) 1614–1627.
[5] H. Quinn, P. Graham, J. Krone, M. Caffrey, S. Rezgui, Radiation-induced multi-bit upsets in SRAM-based FPGAs, IEEE Trans. Nucl. Sci. 52 (2005) 2455–2461.
[6] T. Ban, L. Naviner, Progressive modular redundancy for fault-tolerant designs in nanoelectronics, Microelectron. Reliab. 51 (2011) 1489–1492.
[7] M.M. Vai, VLSI Design, CRC Press, USA, 2000.
[8] Synopsys SAED_EDK32/28_CORE Databook, 2012.
[9] B. Parhami, Voting networks, IEEE Trans. Reliab. 40 (1991) 380–394.
