Choice of granularity for reliable circuit design using dynamic reconfiguration


MR-11981; No of Pages 13 Microelectronics Reliability xxx (2016) xxx–xxx

Contents lists available at ScienceDirect

Microelectronics Reliability journal homepage: www.elsevier.com/locate/mr

Choice of granularity for reliable circuit design using dynamic reconfiguration Atin Mukherjee ⁎, Anindya Sundar Dhar Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology Kharagpur, India

Article info

Article history: Received 16 September 2015; Received in revised form 11 April 2016; Accepted 11 April 2016; Available online xxxx

Keywords: Fault tolerance; Granularity; Area optimization; Modular redundancy; Dynamic reconfiguration

Abstract

When designing fault tolerant systems using dynamic reconfiguration, the choice of granule size influences the area, power and delay overheads. In this paper, an attempt has been made to determine the optimum granule size that incurs minimum overhead vis-à-vis other design parameters such as the number of faults to be tolerated. To facilitate the design process, mathematical expressions are provided that relate the area of a single granule, the number of external connections, the area of the reconfiguration multiplexers and the probability of failure of the system. Optimum granule sizes are derived for various fault tolerant circuits, from a ripple carry adder to CORDIC and a Viterbi decoder.

© 2016 Published by Elsevier Ltd.

⁎ Corresponding author: A. Mukherjee.

1. Introduction

The ongoing miniaturization through shrinking device dimensions not only increases the packing density, but also raises the probability of system failure due to internal faults such as gate dielectric breakdown and electromigration, or due to external influences such as radiation-induced defects in space applications. Because of the high transistor density and complex topology used in modern technologies, failure rates have increased, and hence incorporating fault tolerance has become essential in the design of systems catering to different applications to ensure reliable and fault-free operation. For resource-constrained systems where the amount of hardware devoted to active computing must be maximized, dynamic reconfiguration is the preferred fault tolerant technique [1–3]; it uses a fault detection and reconfiguration unit to identify the faulty modules and replace them with fault-free spares. In static redundancy methods, all the spares are active along with the normally working modules, and hence more power is consumed than in the dynamic reconfiguration method, where the backup units become operational only upon detection of faults in some active modules. Popular static methods such as triple modular redundancy (TMR) [1], the multiplexing technique [4], quadded logic (QL) [5] and quadded transistor (QT) [6] require at least three times the hardware of the non-redundant design. Hence for critical applications such as satellites and avionics, where an increase in payload is a major concern, the dynamic reconfiguration technique gets priority over the static methods. The generalized idea for

self-repair including testing and reconfiguration in case of dynamic reconfiguration has been briefed in [7]. The major limitation of the dynamic recovery method is the associated delay due to the time required for testing and reconfiguration. But this impact can be reduced easily if the reconfiguration is performed during idle time or on an idle hardware portion of the system, if any [8]. But such arrangement cannot provide a real-time protection of the system as testing and reconfiguration are performed only when the resources become idle. For real time operation, we consider a topology that incorporates the hot-standby feature in dynamic recovery, where testing and reconfiguration are carried out simultaneously without stopping the normal operation of the system [9]. In this topology, spare modules are tested for faults and if found to be non-faulty, operation of some normal active modules is transferred to them making the operative modules act as spares and tested for errors. If any module is identified as faulty, further activation of that module in the circuit is prohibited and hence no extra time is required for reconfiguration. A detailed study on generalized modular redundancy scheme enhancing the fault tolerance for combinational circuits has been carried out in [10]. Most of the sequential circuits can be thought of as an integration of some combinational logic and registers (e.g. up-counter = incrementer + registers). In general scan chain technique [11] is used to locate faulty registers/flip-flops and dismantle them from the system. Registers can also be designed fault tolerant using triple redundant storage [12]. Hence a sequential circuit can be made capable of tolerating faults by combining fault tolerant registers with fault tolerant combinational blocks of its parts. Proper choice of the spare module-size in case of dynamic reconfiguration plays an important role in minimizing the area and delay overheads of the system. 
Granularity is the minimum module size into which a system is broken down, and the minimum-sized replaceable module of the system is called a granule. Proper

http://dx.doi.org/10.1016/j.microrel.2016.04.001 0026-2714/© 2016 Published by Elsevier Ltd.

Please cite this article as: A. Mukherjee, A.S. Dhar, Choice of granularity for reliable circuit design using dynamic reconfiguration, Microelectronics Reliability (2016), http://dx.doi.org/10.1016/j.microrel.2016.04.001


choice of granularity helps in minimizing the overall cost of designing the fault tolerant circuit. However, most of the recent literature on the design of reliable architectures using dynamic reconfiguration considers some specific granularity [9–10,13–14], fine or coarse [15], and to the best of our knowledge, an analysis of the choice of the size of the spare modules has not been presented previously. In this paper, we attempt a proper selection of the granule-size that would help designers minimize the area and delay overheads of a system for a given reliability, depending on the particular requirements. To implement fault tolerance using dynamic reconfiguration, our first objective is to identify the structural regularity within a circuit and then divide the circuit into symmetrical modules, making the approach systematic. An array of such symmetrical modules is also known as an iterative logic array (ILA); identifying the smaller blocks and defining the proper granule-size to make the circuit module-wise fault tolerant not only makes the approach easier to understand, but also makes the system easier to debug in the future. We compute the specific value of the granule-size for which the total area overhead of the fault tolerant design of the given system is minimum, which optimizes the reliability as well as the delay overhead. We also formulate an analytical expression for the granularity that optimizes the trade-off between the area overhead of a system and the number of faults tolerated, maximizing the overall fault coverage. The major contributions of this paper are as follows:

• We analyze how the area and delay overheads change with factors such as the chosen granule-size, the number of inputs, the number of outputs and the interconnections among the intermediate modules, and find the optimal value of the granule-size. We show that if a single-bit granule is chosen instead of the optimal 4-bit granule, the design of a fault tolerant 64-bit ripple carry adder (RCA) requires 20% higher area overhead, and that this becomes >50% when the chosen granule-size is 64.
• We extend the design approach to make it capable of tolerating multiple faults. Keeping in mind the trade-offs among the chosen granule-size, the total area overhead and the number of faults tolerated, we also provide an analytical discussion that helps to choose the granule-size precisely, such that the area overhead is minimized for maximum fault coverage. We also prove that, for a particular selection of granularity, a circuit can tolerate multiple faults instead of just a single fault at the same hardware cost.
• We incorporate the hot-standby topology, which makes the fault tolerant mechanism online, i.e., no extra time is needed for testing and reconfiguration, and any module identified as faulty is immediately disconnected from the system, prohibiting it from further participation in the normal operation.

The rest of the paper is organized as follows. Section 2 describes the theoretical background required for the present work. Considering single fault cases, how the area and delay overheads vary with the granule-size is discussed in Section 3. Section 4 describes the methodology of testing and reconfiguration proposed in the present work. In Section 5, the optimization analysis is done, showing how proper selection of the module size plays a significant role in any fault tolerant ILA design. Section 6 considers multiple fault tolerance. In Section 7, the concept of choice of granularity is applied to some real-life digital functional units such as the RCA, conditional sum adder (CSA), comparator, incrementer, multiplier, Viterbi decoder and COordinate Rotation DIgital Computer (CORDIC) for their optimal fault tolerant designs. The paper is concluded in Section 8.

2. Theoretical background

A fault tolerant approach enabling the autonomous restoration of the defective module in a system, avoiding fault accumulation and re-establishing the correct circuit state in real time, has been presented in

[16]. Self-repairing procedure for permanent faults and in-the-field self-testing for specific applications has been presented in [17]. But these methods believe on offline testing only limiting their usages for real time applications. A detailed study on self-healing approach and its optimization for asynchronous circuits have been developed in [18]. The authors have also highlighted the efficiencies of their method in terms of resource occupation, fault tolerance, reconfiguration speed and capability to tolerate permanent as well as transient faults. But these works have not discussed anything on the granularity, i.e. at which level the extra circuitry for testing and reconfiguration should be added. Selection of granularity for achieving different trade-offs among cost, performance and recovery time for fault tolerant designs using TMR has been discussed in [19]. In case of dynamic reconfiguration, there are two levels of adding redundancy: One is coarse-grain redundancy (CGR) approach [20] that uses spare rows and columns to an array tolerating clustered defects. But it has limitations in tolerating multiple, distributed random defects. The other one is fine-grain redundancy (FGR) approach [21] that uses spare wires eliminating the need for rerouting and minimizing timing variance due to correction. At high defect levels, it requires lower area overhead than CGR, but at lower defect rate, CGR requires less area overhead than FGR [15]. Combination of CGR and FGR has been efficiently used in many systems to tolerate random distributed defects as well as clustered and bridging faults. In case of CGR, proper choice of granularity plays an important role in achieving higher reliability at lower area and delay overheads [22]. Some works are available in literature that shows the trade-off among reliability, redundancy and performance of the system [15, 23–25] for changing the granularity. But their discussions are completely system specific. 
Here we have derived a generalized formula to calculate the optimum granularity for any system having structural regularity. The outputs of the currently operative modules are monitored by a fault detection and reconfiguration unit that activates the spare module in place of a working module upon identifying faults in the later. Area overhead increases for the spare module as well as for the two levels of multiplexers (MUXes) those are needed for proper routing of the inputs to and the outputs from the non-faulty active modules bypassing the faulty one. If we vary the size of the granule, the area and delay overheads differ due to the change in the number of extra MUXes required for input and output signal selection depending on the granularity chosen. Hence we need to optimize these overheads with proper choice of the granularity. In most of the practical cases, the modules of a circuit are interconnected and extra MUXes are required at the interconnections among the modules for proper routing of the signals through the non-faulty ones. Here, the number of MUXes used for selection of signals at intermediate connections among the modules decreases with increase in the size of the granule and hence proper selection of granularity is very important to minimize the overall cost in designing a fault tolerant circuit. In this paper, we analyze how the choice of the granule-size influences the area and delay overheads of the fault tolerant circuit and also determine the optimal size of the granule for a given design for which minimum hardware cost is achieved. For ease of understanding, we call the minimum possible sized module of a circuit as a 1-bit granule and hence the club of k minimum-sized module as k-bit granule. 
A representative example of a 1-bit granule, with a primary inputs, b primary outputs and c interconnected inputs and outputs fed from and to the neighboring granules, for a digital module of an arbitrary system suitable for being made fault tolerant using dynamic reconfiguration is shown in Fig. 1. The fault tolerant design is also cascadable, so the number of bits it handles can be increased as required by connecting similar circuit blocks. Our fault tolerant structure can tolerate almost all types of faults, such as transistor stuck-open and stuck-close faults, input–output stuck-at faults and bridging faults occurring within a single granule. To incorporate


multiple fault tolerance, the number of spares in a circuit should be increased [26–27], but that makes the reconfiguration circuitry too complex to implement because of the complicated routing of inputs and outputs to the non-faulty modules. Alternatively, for an n-bit wide circuit, we cascade n/k fault tolerant modules, each of size k and each capable of handling a single fault within the module itself, leading to a tolerance of a maximum of n/k faults; a detailed description is provided in the following sections.

Fig. 1. Lowest level sub-block with all input and output signals.

3. Dependence of area and delay overheads on granularity

3.1. Case I: single-bit granule

For circuits having structural regularity, we identify the sub-modules in the circuit and use a same-sized spare to apply the dynamic reconfiguration technique to achieve fault tolerance [10]. Consider a digital circuit with structural regularity that can be divided into a maximum of n modules. Here we take the single-bit module as the granule and use one same-sized single-bit module as a spare along with the n active granules. Each granule is tested individually for errors without stopping the normal operation of the system, using the hot-standby topology. For smooth running of the normal operation and proper routing of input and output signals to and from the active granules, we need a set of MUXes at the input, at the output and at the intermediate stages. A generalized version of such a design is shown in Fig. 2: an n-bit fault tolerant structure needs a(n−1) input selection, bn output selection and cn intermediate signal selection MUXes. The total area overhead (i.e. the increase in area for incorporating fault tolerance in comparison to the non-redundant design) for making the whole functional unit fault tolerant using one spare granule and the signal selection MUXes is

$$A_1 = A_G + \{a(n-1) + bn + cn\}A_M \tag{1}$$

where $A_G$ represents the area of one single-bit granule, i.e. the area of the smallest sub-block in the system, and $A_M$ represents the area of a 2:1 MUX. For a given technology, $A_M$ is fixed and $A_G$ varies for different circuits. Similarly, the delay overhead due to the inclusion of the signal selection MUXes in the fault tolerant design is given by

$$D_1 = n\,t_{MUX} \tag{2}$$

where $t_{MUX}$ is the propagation delay of a 2:1 MUX.

3.2. Case II: k-bit granule

Instead of choosing a single-bit granule, we consider a granule of size k, and a spare of the same size is used in addition to the active ones. The complete fault tolerant configuration is shown in Fig. 3. As the value of k is chosen by the designer, we may consider it to be a factor of n; otherwise, throughout the paper, n/k should be replaced by ⌈n/k⌉. The total area overhead with one k-bit spare and the signal selection MUXes becomes

$$A_2 = kA_G + \{ak(n/k - 1) + bn + cn/k\}A_M. \tag{3}$$

Similarly, the increase in critical path, i.e. the delay overhead, is given by

$$D_2 = \frac{n}{k}\,t_{MUX}. \tag{4}$$

It may be noted that, for k = 1, Eq. (3) reduces to Eq. (1) and Eq. (4) reduces to Eq. (2). For circuits with no interconnection among the intermediate granules (i.e. for c = 0), the increase in critical path in both cases is equal to two MUX propagation delays, one for the input signal selection and the other for the output signal selection. The total area overhead for the fault tolerant structure in Case I with c = 0 and the other parameters remaining the same is

$$A_3 = A_G + \{a(n-1) + bn\}A_M \tag{5}$$

and that for Case II is

$$A_4 = kA_G + \{ak(n/k - 1) + bn\}A_M. \tag{6}$$
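As a rough numerical companion to Eqs. (1)–(6), the overhead expressions can be written as two small functions. This is only a sketch: the function names and the example parameter values are illustrative assumptions rather than values from the paper, and areas and delays are in arbitrary units.

```python
import math


def area_overhead(k, n, a, b, c, area_granule, area_mux):
    """Area overhead of Eq. (3): one k-bit spare plus the selection MUXes.

    With k = 1 this reduces to Eq. (1); with c = 0 it gives Eqs. (5) and (6).
    """
    groups = math.ceil(n / k)  # the text replaces n/k by its ceiling when k does not divide n
    mux_count = a * k * (groups - 1) + b * n + c * groups
    return k * area_granule + mux_count * area_mux


def delay_overhead(k, n, t_mux):
    """Critical-path increase of Eq. (4); k = 1 gives Eq. (2)."""
    return math.ceil(n / k) * t_mux


# Hypothetical example (not from the paper): n = 8, a = 2, b = 1, c = 1, A_G = 10, A_M = 2.
print(area_overhead(1, 8, 2, 1, 1, 10.0, 2.0))  # Eq. (1)
print(area_overhead(4, 8, 2, 1, 1, 10.0, 2.0))  # Eq. (3) with k = 4
print(delay_overhead(4, 8, 0.5))                # Eq. (4) with k = 4
```

For these assumed numbers the single-bit granule gives the smaller area overhead while the 4-bit granule gives the smaller delay overhead, which is the kind of trade-off the following sections quantify.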

3.3. Testing and reconfiguration

ILAs have systematic uniformity within their structures and hence are well suited for dynamic reconfiguration [28]. To minimize the overall test pattern generation cost for an ILA, the circuit is designed such that the size of the test vector becomes independent of the circuit size; such a design is called C-testable [29]. But not all digital circuits have

Fig. 2. Generalized schematic diagram of a system with granule-size 1.


Fig. 3. Generalized schematic diagram of a system with granule-size k.

an ILA-like structure and for them we must have to find out subblocks with similarities between them before applying dynamic reconfiguration. After defining proper granule-size (discussed in detail in Section 5), certain number of spares are appended at the most significant bit side depending on the number of faults to be tolerated of the fault tolerant system under consideration. To make the process online, we test the spare granules at first, whilst the normal operations are executed by the active ones. If any granule under test is identified as faulty, its functionality is suspended from further involvement in system operation and the currently active granules continue to perform the normal operation. Otherwise, testing operation passes to the next granules making the already tested but non-faulty modules active and the process continues. Hence in this hot-standby process, the spares are as if rotated among the active granules and are tested for errors, if any, while the remaining modules continue working without stopping the normal functionality of the circuit. Here the granules which are active initially are assumed to be non-faulty and this method can take care of permanent defects only. For proper routing of the signals through the active units and the test patterns to the spare modules under test mode, corresponding MUX select signals along with the test selection lines are generated using a dedicated control circuitry. Several automated test pattern generation (ATPG) and built-in-selftest (BIST) [16–18,30–31] approaches are available in literature those are applied to the circuits under test (CUT) to reduce the overall testing cost. For C-testable designs, testing cost is independent to the size of the granule chosen as well as size of the circuit. 
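The hot-standby rotation described above can be sketched as a small simulation. This is a simplified model under stated assumptions, not the authors' control circuitry: `rotate_and_test` and the `is_faulty` callback (a stand-in for applying test patterns to a granule) are hypothetical names, and timing, MUX select generation and test pattern details are abstracted away.

```python
def rotate_and_test(num_granules, is_faulty, max_steps=None):
    """Rotate the spare role through the granules and stop at the first fault.

    num_granules counts the active granules plus one spare (the last index).
    is_faulty is a hypothetical stand-in for applying test patterns to a granule.
    Returns (active_granules, isolated_granule_or_None).
    """
    under_test = num_granules - 1            # the spare is tested first
    active = list(range(num_granules - 1))   # the others carry the normal operation
    for _ in range(max_steps or num_granules):
        if is_faulty(under_test):
            # The faulty granule is kept out of operation; the current active
            # set simply continues, so no reconfiguration time is spent.
            return active, under_test
        # The tested, fault-free granule takes over from the next active
        # granule, which now becomes the one under test.
        next_under_test = active.pop(0)
        active.append(under_test)
        under_test = next_under_test
    return active, None


active, isolated = rotate_and_test(5, lambda g: g == 2)
print(sorted(active), isolated)
```

In this toy run with four active granules and one spare, the granule with index 2 is isolated when its turn to be tested arrives, and the remaining four granules stay in operation.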
The outputs from the CUT are compared with the desired ones; if any mismatch is found, a fault is detected and the circuit is reconfigured, prohibiting the faulty granule from taking part in further activity of the system. No extra time is required for reconfiguration because, on identification of any faulty granule, the currently non-faulty active modules continue to perform the job. For C-testable designs, a single test pattern generation unit suffices to test all the granules exhaustively, and a single MUX select signal generation unit is time-shared among the different sub-modules to minimize the design cost. Hence the area overhead due to the control circuitry, including testing and reconfiguration, does not increase with the bit-size of the circuit and is within acceptable limits for C-testable designs. Here we assume that the control circuitry, i.e. the test pattern and MUX select signal generation units along with the signal selection MUXes, is fault-free. Individual fault tolerant MUXes, registers and logic gates [32] can be used to increase the system reliability, but they are outside the scope of this paper.

4. Choice of proper granularity

In this section, we derive the optimal value of the granule-size such that the area overhead is minimum. The extra area overhead in Case II compared to Case I, when there is no interconnection among the granules within the same circuit (i.e. c = 0), is obtained by subtracting Eq. (5) from Eq. (6):

$$E_1 = A_4 - A_3 = (k-1)(A_G - aA_M). \tag{7}$$

Hence the area overhead is independent of the granule size (k) for $A_G = aA_M$, i.e., when the area of the single-bit granule ($A_G$) is equal to the total area of the MUXes at the primary inputs ($aA_M$). Fig. 4 is the graphical representation of Eq. (6); it shows how the area overhead of a fault tolerant circuit changes with the size of the single-bit granule for different values of k when there is no internal connection among the intermediate granules. The figure illustrates that lower values of k are preferred for $A_G > aA_M$, but for $A_G < aA_M$ the area overhead decreases for higher values of k. All the graphs with varying k meet at the point $A_G = aA_M$, where the area overhead becomes independent of k.

Fig. 4. Area overhead vs. minimum block size for c = 0.

Now, for c ≠ 0, the graphs of area overhead vs. block-size obtained from Eq. (3) are plotted in Fig. 5 for different values of k. From the figure, we observe that as the value of k increases, the plots become steeper and hence the area overhead increases rapidly with $A_G$. For larger k, the number of MUXes for the c interconnections decreases and hence the area overhead also decreases for lower values of $A_G$. Hence, as $A_G$ increases, a lower value of k is preferred.

Fig. 5. Graphs showing area overhead vs. minimum granule area for c ≠ 0.

To find the point where the two graphs with $k = k_1$ and $k = k_2$ intersect, we equate the two overhead values given by Eq. (3) and solve for $A_G$:

$$A_G = \left(a + \frac{cn}{k_1 k_2}\right)A_M. \tag{8}$$

Eq. (8) also shows how the meeting point of two graphs depends on other parameters such as the number of inputs (a), the circuit length (n) and the number of interconnecting signals (c). The graphs in Fig. 6 show how the area overhead changes with the granule-size (k) for a fixed value of $A_G$, plotted for different values of $A_G$ (assuming $A_G > aA_M$). For a given circuit and a given $A_G$, the area overhead of the fault tolerant circuit first decreases with k and then increases, so there is an optimal value of k for which the area overhead is minimum. As k increases, the number of MUXes required for the interconnected signals decreases, and hence the overall area overhead decreases up to a certain value of k. Beyond that optimal value, the area of the spare module becomes so large that it outweighs the reduction in the MUX area, and the area overhead increases again. To obtain the optimal value of k that gives the minimum area overhead for the fault tolerant structure of a circuit, we differentiate Eq. (3) with respect to k and equate the derivative to zero, i.e. $A_G - aA_M - cnA_M/k^2 = 0$, which gives

$$k = \sqrt{\frac{cnA_M}{A_G - aA_M}}. \tag{9}$$
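Eq. (9) can be checked numerically against a brute-force search over Eq. (3). The sketch below is illustrative only: the function names and the parameter values are assumptions chosen for the example (not taken from the paper), and n/k is treated as a real number so that every integer k can be compared.

```python
import math


def area_overhead(k, n, a, b, c, area_granule, area_mux):
    """Eq. (3), treating n/k as a real number so that every integer k can be compared."""
    mux_count = a * (n - k) + b * n + c * n / k
    return k * area_granule + mux_count * area_mux


def optimal_granule_size(n, a, c, area_granule, area_mux):
    """Closed-form minimiser of Eq. (3), i.e. Eq. (9); requires A_G > a * A_M."""
    if area_granule <= a * area_mux:
        raise ValueError("Eq. (9) needs A_G > a * A_M")
    return math.sqrt(c * n * area_mux / (area_granule - a * area_mux))


# Hypothetical parameters: n = 64, a = 2, b = 1, c = 1, A_G = 80, A_M = 20.
params = dict(n=64, a=2, b=1, c=1, area_granule=80.0, area_mux=20.0)
k_star = optimal_granule_size(64, 2, 1, 80.0, 20.0)
k_brute = min(range(1, 65), key=lambda k: area_overhead(k, **params))
print(round(k_star, 3), k_brute)
```

For these assumed values the closed form gives a non-integer k, and the integer found by exhaustive search coincides with its nearest integer.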

If the calculated value of k is not an integer, the nearest integer is taken as the optimum granularity. Regarding the reliability of the fault tolerant design: in both cases discussed above the system can tolerate only a single defect, so the reliability might be expected to remain the same. However, as k varies, the area of the fault tolerant design changes, and hence the reliability also changes because of the chance of failure of the redundant parts. A minimum redundant area is therefore preferred for the highest reliability, and the optimal value of k for minimum area given in Eq. (9) should be chosen. In the discussion so far, we have assumed that all the MUXes used for reconfiguration are non-faulty. In practice, we can use fault tolerant MUXes designed using quad-transistor logic as described in [6,32], with area equal to 3.5 times that of the normal non-fault tolerant MUXes. All the above equations remain valid in that case with the corresponding change in the value of $A_M$.

5. Multiple fault tolerance

An error in some part of a circuit generally propagates to its neighbors and affects more parts of the circuit, causing them to fail [33]. Multiple failures can also occur, which increases the need for multiple fault tolerance to meet higher reliability requirements. In this section, we discuss how the area and delay overheads of the fault tolerant circuit change with the chosen granularity for different values of Nf (the number of faults tolerated). To tolerate multiple faults, one option is to use j spare granules (j > 1), which can handle up to j faults. But the interconnection for selecting non-faulty blocks through the signal selection MUXes, and hence the design of the control circuitry, becomes too complicated. Path finding algorithms can be used for proper routing from input to output through the non-faulty modules. But for some complex designs, simple path finding algorithms fail and algorithms of higher complexity must be used, whose implementation increases the overall cost of the control circuitry beyond the acceptable limit. Hence, instead of using simple k-bit modules, we replace each of them with a k-bit fault tolerant module that contains one single-bit sub-block as a spare along with the operational ones. Here the granule-size is equal to (k + 1) bits and each granule can handle a single fault on its own. For each such k-bit fault tolerant granule, the area overhead is

$$A_5 = A_G + \{a(k-1) + (b+c)k\}A_M. \tag{10}$$

Fig. 7. Area overhead vs. minimum block size for different values of k in case of multiple faults.

Considering an n-bit circuit, we need n/k such fault tolerant granules. Hence the total area overhead for the given circuit is

$$A_6 = \frac{n}{k}\left[A_G + \{a(k-1) + (b+c)k\}A_M\right]. \tag{11}$$

Fig. 6. Area overhead vs. granule-size k for c ≠ 0.

Fig. 7 depicts the variation in area overhead with the single-bit granule-size ($A_G$) for different values of k in the case of multiple faults, where for a given k the maximum number of faults tolerated is n/k. The increase in critical path for the above n-bit fault tolerant circuit is similarly calculated as

$$D_3 = (n/k)(k\,t_{MUX}) = n\,t_{MUX} \tag{12}$$

which is independent of k.
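Eqs. (10)–(12) can be evaluated with a few lines of code to see how the area overhead and the number of tolerated faults move together as k changes. The sketch below uses hypothetical function names and example values that are assumptions, not figures from the paper.

```python
def multi_fault_area_overhead(k, n, a, b, c, area_granule, area_mux):
    """Eq. (11): n/k fault tolerant granules, each with one single-bit spare."""
    per_granule = area_granule + (a * (k - 1) + (b + c) * k) * area_mux  # Eq. (10)
    return (n / k) * per_granule


def multi_fault_delay_overhead(k, n, t_mux):
    """Eq. (12): (n/k) * (k * t_MUX), which simplifies to n * t_MUX."""
    return (n / k) * (k * t_mux)


def faults_tolerated(k, n):
    """At most one fault per fault tolerant granule, so n/k in total."""
    return n // k


# Hypothetical example: n = 64, a = 2, b = 1, c = 1, A_G = 80, A_M = 20, t_MUX = 0.5.
for k in (2, 4, 8, 16):
    print(k,
          faults_tolerated(k, 64),
          multi_fault_area_overhead(k, 64, 2, 1, 1, 80.0, 20.0),
          multi_fault_delay_overhead(k, 64, 0.5))
```

The delay column is the same for every k, as Eq. (12) predicts, while the area overhead and the fault count both fall as k grows for these assumed values.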


Fig. 8. Change in area overhead with number of faults tolerated in case of multiple fault tolerance.

The extra area overhead in the case of multiple fault tolerance compared to single fault tolerance is

$$E_2 = A_6 - A_2 = \left(\frac{n}{k} - k\right)(A_G - aA_M) + \frac{k-1}{k}\,cnA_M. \tag{13}$$
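The algebra behind Eq. (13) can be confirmed numerically by comparing it with the difference of Eq. (11) and Eq. (3). In the sketch below, n = 512, a = 4 and c = 2 follow the setting quoted for Fig. 8, whereas b, A_G and A_M are hypothetical values chosen only for illustration; the function names are likewise assumptions.

```python
import math


def single_fault_area(k, n, a, b, c, ag, am):
    """Eq. (3): single fault tolerance with one k-bit spare."""
    return k * ag + (a * k * (n / k - 1) + b * n + c * n / k) * am


def multi_fault_area(k, n, a, b, c, ag, am):
    """Eq. (11): n/k fault tolerant granules with one single-bit spare each."""
    return (n / k) * (ag + (a * (k - 1) + (b + c) * k) * am)


def extra_area(k, n, a, c, ag, am):
    """Eq. (13): extra area of multiple over single fault tolerance."""
    return (n / k - k) * (ag - a * am) + (k - 1) / k * c * n * am


n, a, b, c, ag, am = 512, 4, 1, 2, 100.0, 20.0  # b, ag, am are assumed values
for k in (1, 2, 4, 8, 16, 32):
    direct = multi_fault_area(k, n, a, b, c, ag, am) - single_fault_area(k, n, a, b, c, ag, am)
    assert math.isclose(direct, extra_area(k, n, a, c, ag, am))

# With c = 0 and k = sqrt(n) the extra area vanishes (here n = 64, k = 8).
print(extra_area(8, 64, a, 0, ag, am))
```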

that for every design, there is an optimal value of granularity for which the area overhead is minimum. The simulation results support the theoretically derived values for attaining the optimum values of the granule-sizes. 6.1. Fault tolerant RCA design

Hence for modules having no interconnections among themselves (i.e., c = 0), area overhead for multiple fault tolerance becomes identical pffiffiffi to that for the single fault case while k ¼ n; i.e., the circuit can tolerate pffiffiffi at maximum b nc number of faults instead of single fault at same hardware cost. Fig. 8 shows extra area overhead in case of multiple fault tolerance with respect to single fault case for n = 512 with AG N aAM (●), AG = aAM (▲) and AG b aAM (×) for different values of k with a = 4 & c = 2. The graph signifies that for AG N aAM, as value of k (size of the granule) increases, number of faults tolerated (Nf) as well as area overhead decreases rapidly. This is because, larger the granule-size for multiple fault tolerance, lesser number of modules are used and hence number of faults tolerated also decreases. Similarly for AG b aAM, Nf decreases as k increases, but area overhead increases. Thus optimal value for k should be chosen depending on the requirements of the specific applications in mind. 6. Case studies In this section, we consider some illustrations of fault tolerant circuit design to exemplify our claim of having an optimal granularity to minimize the area overhead. Firstly we design a fault tolerant RCA, which has an inbuilt ILA structure. Though RCA is very simple in design and many faster adders are available in literature to overcome the problem with carry propagation delay with it, we take RCA as our first representative example of fault tolerant circuit design for its popularity and easy of understandability. Next we study fault tolerant designs of circuits with different complexities including CSA, comparator, incrementer/decrementer, multiplier, Viterbi decoder and CORDIC. We have simulated the fault tolerant circuits in Synopsys Design Vision using UMC 180 nm standard cell library and noted

6.1. Fault tolerant RCA design

An n-bit RCA is made of n single-bit full adders (FA), each having two primary inputs (a = 2), one cascading input (c = 1) and one primary output (b = 1). Here we construct a 64-bit (n = 64) fault tolerant RCA and find the optimal size of the granule (k) for which the area overhead is minimum. Considering the single fault tolerant case, the area overhead for the fault tolerant RCA is

$$A_{2,\mathrm{RCA}}^{64} = kA_G + \{2k(64/k - 1) + 64 + 64/k\}A_M = kA_G + (192 - 2k + 64/k)A_M. \tag{14}$$

The increase in critical path is calculated using Eq. (4) as

$$D_{2,\mathrm{RCA}}^{64} = \frac{64}{k}\, t_{MUX}. \tag{15}$$
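As a quick numerical check of Eqs. (14) and (15), the following minimal sketch evaluates both expressions for power-of-two granule sizes. The function names are illustrative only. The cell areas A_G = 81.2525 μm² and A_M = 18.87 μm² are the 180 nm values quoted in this section, while t_MUX = 0.13 ns is an assumption inferred from the k = 64 delay entry of Table 1, not a value stated in the text.

```python
# Analytical sweep of Eqs. (14) and (15) for the 64-bit fault tolerant RCA.
A_G = 81.2525   # full-adder area in um^2 (quoted in the text)
A_M = 18.87     # multiplexer area in um^2 (quoted in the text)
T_MUX = 0.13    # MUX delay in ns: an assumption inferred from Table 1 (k = 64)


def rca_area_overhead(k, n=64, a_g=A_G, a_m=A_M):
    """Eq. (14): k*A_G + {2k(n/k - 1) + n + n/k}*A_M."""
    return k * a_g + (2 * k * (n / k - 1) + n + n / k) * a_m


def rca_delay_overhead(k, n=64, t_mux=T_MUX):
    """Eq. (15): (n/k) * t_MUX."""
    return (n / k) * t_mux


granules = [1, 2, 4, 8, 16, 32, 64]
areas = {k: rca_area_overhead(k) for k in granules}
best_k = min(areas, key=areas.get)  # granule size with the smallest overhead

if __name__ == "__main__":
    for k in granules:
        print(f"k={k:2d}  area={areas[k]:8.2f} um^2  "
              f"delay={rca_delay_overhead(k):5.2f} ns")
    print("minimum analytical area overhead at k =", best_k)
```

The analytical values differ slightly from the simulated entries of Table 1, but the minimum falls at the same granule size, k = 4.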

Fault tolerant RCAs with different granule-sizes are implemented using the UMC 180 nm standard cell library to observe how the area overhead changes with varying k; the corresponding values are tabulated in Table 1 and plotted in Fig. 9.

Table 1
Area and delay overheads for 64-bit fault tolerant RCA for different values of k.

| Value of k | Area overhead (μm²) | Area overhead (%) | Delay overhead (ns) | Delay overhead (%) | Reliability (p_FA = 0.001) |
|---|---|---|---|---|---|
| 1 | 4946.72 | 95.37 | 8.32 | 46.22 | 0.9869 |
| 2 | 4353.63 | 83.94 | 4.18 | 23.22 | 0.9875 |
| 4 | 4118.14 | 79.4 | 2.10 | 11.67 | 0.9879 |
| 8 | 4129.98 | 79.63 | 1.04 | 5.78 | 0.9875 |
| 16 | 4371.76 | 84.29 | 0.51 | 2.83 | 0.987 |
| 32 | 5002.45 | 96.45 | 0.25 | 1.39 | 0.9862 |
| 64 | 6291.61 | 121.3 | 0.13 | 0.72 | 0.9859 |

Fig. 9. Area and delay overheads vs. size of the granule (k) for 64-bit Fault Tolerant RCA design.

From the figure, it is clear that the delay overhead decreases with k, and k = 64 is the best case for achieving the maximum speed of the fault tolerant circuit. The area overhead, however, first decreases with an increase in the granule-size, reaches its minimum at k = 4 and increases again beyond that. Reliability values for the 64-bit RCA, with a probability of failure of 0.001 for a single full adder, are also tabulated for varying k, and show that reliability is also maximum for k = 4. From Eq. (9), with FA size A_G = 81.2525 μm² and MUX size A_M = 18.87 μm², it is derived that k = 5.268 is the optimal value for the 64-bit fault tolerant RCA, and hence 5-bit blocks would be the best option. But 64 is not divisible by 5, and as we opt for granule-sizes of the form 2^l, we get k = 4 = 2² as the best option. Hence theoretical and simulation results support each other.

Next we compare the area overhead of the fault tolerant RCA using the dynamic reconfiguration method with the popular and well accepted triple modular redundancy (TMR) technique [34], which is essentially a static redundancy method. Table 2 shows the comparison between the dynamic reconfiguration method with granule-size equal to four and TMR in terms of area and delay overheads as well as reliability for the 64-bit fault tolerant RCA. Here we have included the area and critical path of the control circuitry for the design of the fault tolerant RCA using dynamic reconfiguration. From Table 2, we notice that TMR requires an area overhead of 231.6%, whereas that for the dynamic reconfiguration method is only 113.8% even after considering the area of the control circuitry, making it the more acceptable fault tolerant approach where hardware is a major constraint.

Please cite this article as: A. Mukherjee, A.S. Dhar, Choice of granularity for reliable circuit design using dynamic reconfiguration, Microelectronics Reliability (2016), http://dx.doi.org/10.1016/j.microrel.2016.04.001
TMR also has its own advantages, such as a lower delay overhead that suits high-speed applications and the capability of tolerating permanent as well as transient defects, whereas the dynamic reconfiguration method can tolerate permanent faults only. But our dynamic reconfiguration provides higher reliability than TMR, mainly because of the higher probability of failure of the larger redundant part in the case of TMR, as shown in Table 2.

Table 2
Comparison of area between TMR and dynamic recovery for 64-bit RCA⁎⁎.

| Fault tolerant approach | Actual area (μm²) | Area overhead (in %) | Critical path (ns) | Delay overhead (in %) | Reliability (p_FA = 0.001) |
|---|---|---|---|---|---|
| Non-redundant | 5186.45 | – | 18.04 | – | 0.758 |
| TMR | 17199.28 | 231.6 | 18.99 | 5.3 | 0.915 |
| Dynamic reconfiguration (k = 4) | 11089.31 | 113.8 | 20.2 | 11.9 | 0.987 |

⁎⁎ All simulations done in 180 nm technology.

All the above designs for the RCA can tolerate a single fault only. To handle multiple faults, each simple k-bit sub-block is replaced with a k-bit fault tolerant block. The area overhead in such a case for a 64-bit RCA is

$$A_{6,\mathrm{RCA}}^{64} = (64/k)\left[A_G + \{2(k-1) + k + (k+1)\}A_M\right] = (64/k)\left[A_G + (4k-1)A_M\right] \tag{16}$$

and the number of faults tolerated is N_f = ⌊n/k⌋. Values of the area overhead and the number of faults tolerated (N_f) for different k for the 64-bit RCA are tabulated in Table 3. Fig. 10 shows how the area overhead and N_f vary with the value of k. The results in the table and the figure show that as the value of k increases, the area overhead decreases and the maximum number of faults that can be tolerated by the system also decreases. Hence, to obtain higher reliability we must select lower values of k, but that results in a large increase in area overhead. For a given design specification, the value of k must be chosen so that the area overhead is minimum while the reliability requirement is still met.

6.2. Fault tolerant CSA design

An n-bit CSA is made of n/2 CSCs (conditional selection cells) [9], each having four primary inputs (a = 4) and six primary outputs (b = 6). Hence the area overhead for the 64-bit fault tolerant CSC array is given by (from Eq. (6))

$$\Delta A_{4,\mathrm{CSA}}^{64} = kA_G + \{4k(32/k - 1) + 6 \times 32\}A_M = kA_G + (320 - 4k)A_M. \tag{17}$$

Table 3
Variation in area overhead and number of faults (N_f) tolerated with k for 64-bit RCA.

| Value of k | Area overhead (μm²) | N_f |
|---|---|---|
| 1 | 8832.12 | 64 |
| 2 | 6847.98 | 32 |
| 4 | 5856.43 | 16 |
| 8 | 5360.32 | 8 |
| 16 | 5112.76 | 4 |
| 32 | 4988.16 | 2 |
| 64 | 4925.88 | 1 |
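The trade-off between area overhead and the number of tolerated faults can be reproduced analytically from Eq. (16) and N_f = ⌊n/k⌋. The sketch below is only an illustration with hypothetical function names; it uses the FA and MUX areas quoted in this section and makes no attempt to model the control circuitry.

```python
import math

A_G = 81.2525  # full-adder area in um^2 (quoted in the text)
A_M = 18.87    # multiplexer area in um^2 (quoted in the text)


def multi_fault_area_overhead(k, n=64, a_g=A_G, a_m=A_M):
    """Eq. (16): (n/k) * [A_G + {2(k-1) + k + (k+1)} * A_M]."""
    return (n / k) * (a_g + (2 * (k - 1) + k + (k + 1)) * a_m)


def faults_tolerated(k, n=64):
    """N_f = floor(n / k), as given after Eq. (16)."""
    return math.floor(n / k)


granules = [1, 2, 4, 8, 16, 32, 64]
sweep = [(k, multi_fault_area_overhead(k), faults_tolerated(k))
         for k in granules]

if __name__ == "__main__":
    for k, area, n_f in sweep:
        print(f"k={k:2d}  area={area:8.2f} um^2  N_f={n_f}")
```

Both quantities fall monotonically as k grows, which is the same trend as in the simulated values of Table 3, although the analytical areas do not match the simulated ones exactly.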


Fig. 10. Area overhead and number of faults tolerated vs. value of k for 64-bit FT RCA design.

Now the area of one CSC block in UMC 180 nm technology is A_G = 174.99 μm² and the MUX size is A_M = 18.87 μm². Putting these values in Eq. (17), we get

$$\Delta A_{4,\mathrm{CSA}}^{64} = 174.99k + (6038.4 - 75.48k) = 6038.4 + 99.51k. \tag{18}$$
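The reduction from Eq. (17) to the linear form of Eq. (18) can be checked numerically. The short sketch below uses the CSC and MUX areas quoted above; the function and variable names are illustrative only.

```python
A_G_CSC = 174.99  # CSC block area in um^2 (quoted in the text)
A_M = 18.87       # multiplexer area in um^2 (quoted in the text)


def csa_area_overhead(k, a_g=A_G_CSC, a_m=A_M):
    """Eq. (17): k*A_G + {4k(32/k - 1) + 6*32} * A_M."""
    return k * a_g + (4 * k * (32 / k - 1) + 6 * 32) * a_m


# Intercept and slope of the linear form in Eq. (18).
intercept = 320 * A_M       # constant term, 6038.4 um^2
slope = A_G_CSC - 4 * A_M   # growth per unit of k, 99.51 um^2

if __name__ == "__main__":
    for k in (1, 2, 4, 8, 16, 32):
        print(f"k={k:2d}  Eq.(17)={csa_area_overhead(k):9.2f}  "
              f"Eq.(18)={intercept + slope * k:9.2f}")
```

Because the slope A_G − 4A_M is positive for these cell sizes, the overhead grows with k and the smallest granule is the cheapest.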

Hence the area overhead increases linearly with k, and k = 1 is the best case for the fault tolerant CSA design. Similarly, for other circuits with c = 0 and A_G > aA_M, a granule-size of one provides the optimal solution (refer to Fig. 4). The critical path delay does not change with k for the CSC array and other similar circuits with c = 0; the change in delay for these cases is equal to 2t_MUX.

6.3. Fault tolerant comparator design

Digital comparators compare two binary numbers to determine whether one number is greater than, less than or equal to the other, and are widely used in microcontrollers, central processing units, testing circuitry and many other digital architectures. Hence it is very important to design fault tolerant comparators to increase their reliability. The basic block diagram of the 4-bit comparator logic, with primary inputs x_i and y_i and interconnection inputs and outputs x = y, x > y and x < y, is shown in Fig. 11. Fig. 12 shows the internal architecture of a 4-bit comparator with all its inputs and outputs. The comparator structure is itself cascadable, and suitable patterns should be fed to the cascadable inputs where no previous stage is present. We call each identical block in the figure ‘comp’ (marked by the dotted line in Fig. 12); each such block gives a 1-bit comparison output. Hence, comp is the minimum possible granule-size for a comparator design. From the figure, we infer that an n-bit magnitude comparator uses n such comp blocks, properly cascaded. Our aim is to find the optimum granule-size so that the total area overhead is minimized for the fault tolerant comparator design using the dynamic reconfiguration method. For each comp block, the number of primary inputs is a = 2, the number of feed forward signals is c = 3, the number of primary outputs is b = 0, the comp (single-bit granule) size is A_G = 101.16 μm² and the MUX size is A_M = 18.87 μm² (in 180 nm technology). As there are no primary outputs, no output selection MUX is required and hence the total area overhead is comparatively lower for the fault tolerant comparator. From Eq. (9), the calculated value of k for minimum area overhead is 7.6 for a 64-bit comparator (n = 64). We then implement a 64-bit fault tolerant comparator using the UMC 180 nm standard cell library to observe how the area overhead changes with different values of k (= 2^l) and plot the results in Fig. 13. The graph shows that the area overhead is minimum for k = 8, which supports the theoretical result. The graph also shows that the delay overhead decreases as k increases.
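Eq. (9) is not reproduced in this part of the paper, so the sketch below rests on an assumption: that the single fault area overhead has the general form kA_G + {a(n − k) + bn + cn/k}A_M, which reduces to Eqs. (14) and (17) for the RCA and CSA parameters, and that the optimum is the stationary point of this expression in k. The function names are hypothetical. Under this assumption the sketch reproduces the optima quoted in the text for the RCA and the comparator.

```python
import math

A_M = 18.87  # multiplexer area in um^2 (quoted in the text)


def area_overhead(k, n, a, b, c, a_g, a_m=A_M):
    """Assumed general form behind Eqs. (14) and (17)."""
    return k * a_g + (a * (n - k) + b * n + c * n / k) * a_m


def continuous_optimum(n, a, c, a_g, a_m=A_M):
    """Stationary point of area_overhead in k (assumed form of Eq. (9))."""
    if a_g <= a * a_m:
        raise ValueError("no interior minimum unless A_G > a*A_M")
    return math.sqrt(c * n * a_m / (a_g - a * a_m))


def best_power_of_two(n, a, b, c, a_g, a_m=A_M):
    """Power-of-two granule size dividing n with the smallest overhead."""
    candidates = [2 ** l for l in range(n.bit_length()) if n % (2 ** l) == 0]
    return min(candidates,
               key=lambda k: area_overhead(k, n, a, b, c, a_g, a_m))


rca_k = continuous_optimum(n=64, a=2, c=1, a_g=81.2525)
comp_k = continuous_optimum(n=64, a=2, c=3, a_g=101.16)

if __name__ == "__main__":
    print(f"RCA: continuous optimum {rca_k:.3f}, power of two "
          f"{best_power_of_two(64, 2, 1, 1, 81.2525)}")
    print(f"comparator: continuous optimum {comp_k:.3f}, power of two "
          f"{best_power_of_two(64, 2, 0, 3, 101.16)}")
```

Choosing among power-of-two candidates by direct evaluation, rather than by rounding the continuous optimum, mirrors the way the text settles on k = 4 for the RCA and k = 8 for the comparator.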

Fig. 11. Block diagram of a 4-bit magnitude comparator.

Fig. 12. Internal architecture of a 4-bit magnitude comparator.

Fig. 13. Area and delay overheads vs. granule-size k for 64-bit fault tolerant comparator.

6.4. Fault tolerant incrementer (decrementer) design

An incrementer (decrementer) is a digital circuit that increments (decrements) the input value by one; used in conjunction with registers, it forms an up-counter (down-counter). To make the incrementer (decrementer) block fault tolerant using dynamic reconfiguration, we need to identify the symmetrical sub-blocks within its structure. Fig. 14 shows the logic diagram of a 4-bit incrementer with modified msb and lsb that make it block-wise cascadable and modular. Next we add one spare sub-block and apply the design approach discussed so far to design a reliable self-reconfigurable n-bit incrementer. We then have to find the optimum granule-size so that the total area overhead is minimized for the fault tolerant incrementer design using the dynamic reconfiguration method. Each identical sub-block in the design consists of one AND and one Ex-OR gate, with number of primary inputs a = 1, number of feed forward signals c = 1 and number of primary outputs b = 1. The single-bit granule-size A_G is 40.5 μm² and the MUX size A_M is 18.87 μm² (in 180 nm technology). From Eq. (9), the calculated value of k for minimum area overhead is 7.4 for a 64-bit incrementer (n = 64). The simulated best case for the 64-bit incrementer is k = 8, which agrees with the theoretical derivation.

Fig. 14. Logic diagram of 4-bit incrementer with one spare sub-block.


Fig. 15. Unsigned array multiplier.

We can also design a fault tolerant decrementer circuit with the optimum granule-size in a similar fashion.

6.5. Fault tolerant multiplier design

The simplest parallel unsigned multiplier is Braun's multiplier [35], whose structure is shown in Fig. 15. In this type of multiplier, all the partial products are computed in parallel and fed to the next FA to calculate the product. To make the design regular, we have replaced all half adders (HA) by FAs with ‘0’ as the extra input. From the figure, we observe that each diagonal FA column (shown by the dotted line) can be thought of as an RCA structure. To make the multiplier fault tolerant, we replace all the RCA structures with their fault tolerant counterparts. Here, to multiply two 4-bit numbers, we need three such 4-bit RCA arrays, each consisting of 4 FAs. To make each RCA array fault tolerant, with FA size A_G = 81.2525 μm², MUX size A_M = 18.87 μm², number of primary inputs a = 2, number of primary outputs b = 1, number of carry signals c = 1 and size of the RCA n = 4, we append k spare FAs to the four active FAs. From Eq. (9), the derived value for optimum granularity to achieve minimum area overhead is k = 1, which is also supported by the simulation results.

6.6. Fault tolerant Viterbi decoder design

A Viterbi decoder uses the Viterbi algorithm to decode a bit stream encoded with a convolutional encoder. A hard decision Viterbi decoder [36] receives a simple bit stream on its input and uses the Hamming distance as a metric. Convolutional encoding and Viterbi decoding are widely used in satellite communication and other noisy communication channels. The Viterbi decoder block consists of two main modules: the add compare select (ACS) block and the permutation network path history (PNPH) unit. Fig. 16 shows one ACS unit for the i-th state register with 8-bit data processing. For n states, we have n such ACS units in total. To make the complete ACS structure fault tolerant, we club k such ACS units together and use one spare of size k. Here the block-size is the area of one ACS unit and is equal to 1639.56 μm² (in 180 nm technology), with number of primary inputs a = 4, number of primary outputs b = 1 and number of interconnecting inputs c = 16. The calculated value (using Eq. (9)) of the optimum k for minimum area overhead for the ACS array with 6 memory elements (z) and hence 64 states (n = 2^6 = 64) is equal to 4.

Fig. 16. ACS block for state i.

The PNPH unit [37] for an (x, y, z) convolutional code is a 5L-stage permutation network with each stage containing 2^z 1-to-2^y demultiplexers (DEMUX), where L is the constraint length and is equal to seven (L = z + 1) in our case. The decoding block for a (1, 2, 6) convolutional code is shown in Fig. 17; it has 64 rows of DEMUXes, each row forming a 35-stage permutation network.

Fig. 17. PNPH unit of the Viterbi decoder.

To make the PNPH unit fault tolerant, we add spare state nodes in each row as well as some fault tolerant spare rows to tolerate multiple errors in the unit. One row of state nodes is shown by the dotted line in the figure, and we consider a granule-size of k1 to achieve minimum area overhead in making each of the rows fault tolerant. In this case, the area of one state node is A_G = 51.84 μm² with a = 0, b = 0, c = 3 and number of state nodes n1 = 35. Hence, from Eq. (9), the optimum value for k1 is 6, i.e., for each row we consider sets of 6 consecutive state nodes (adding one extra state node to make n1 = 36) and use one spare consisting of 6 state nodes (since k1 = 6). In this way, each row is capable of tolerating a single fault with minimum increase in area overhead. Next, we consider the fault tolerant row as a single block and add a spare of k2 rows. In this case, a = 1, b = 2, c = 80, n2 = 64 and A_G = 3919.1 μm² (in UMC 180 nm technology). The derived value for optimum granularity is then k2 = 5, i.e., we should use a set of five fault tolerant rows of state nodes as the spare (adding one extra active row to make n2 = 65). In this way, we make the PNPH block tolerant to multiple failures. Next we combine the defect tolerant ACS block with the defect tolerant PNPH unit to make the complete Viterbi decoder failure-safe. We have chosen the optimal granularities of the ACS and PNPH units separately so that we can maximize the reliability of the whole Viterbi decoder at minimum increase in area.
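The granule choices for the three parts of the decoder, together with the padding of n to a multiple of k, can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: Eq. (9) is assumed to have the closed form k = √(c·n·A_M / (A_G − a·A_M)), rounding to the nearest integer is also an assumption, and all function names are hypothetical. The (n, a, c, A_G) values are those quoted above.

```python
import math

A_M = 18.87  # multiplexer area in um^2 (quoted in the text)


def continuous_optimum(n, a, c, a_g, a_m=A_M):
    """Assumed closed form for Eq. (9): sqrt(c*n*A_M / (A_G - a*A_M))."""
    return math.sqrt(c * n * a_m / (a_g - a * a_m))


def padded_size(n, k):
    """Smallest multiple of k that is >= n (extra units fill the last granule)."""
    return math.ceil(n / k) * k


# (n, a, c, A_G in um^2) as quoted in the text for each part of the decoder.
units = {
    "ACS array": (64, 4, 16, 1639.56),
    "PNPH row": (35, 0, 3, 51.84),
    "PNPH rows": (64, 1, 80, 3919.1),
}

choices = {}
for name, (n, a, c, a_g) in units.items():
    k = round(continuous_optimum(n, a, c, a_g))  # nearest integer granule
    choices[name] = (k, padded_size(n, k))

if __name__ == "__main__":
    for name, (k, n_padded) in choices.items():
        print(f"{name}: k = {k}, padded size = {n_padded}")
```

Under these assumptions the sketch yields k = 4 for the ACS array, k1 = 6 with n1 padded to 36, and k2 = 5 with n2 padded to 65, which agrees with the values stated in the text.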

6.7. Fault tolerant CORDIC design

Fig. 18 shows the basic pipelined 4-bit CORDIC [38] structure. We have identified the identical sub-blocks in the structure; one of them is shown by the dotted line. If n = 64, we have to use 64 such sub-blocks, each with a = 1, b = 0 and c = 3. The size of a single sub-block is A_G = 19933.25 μm² (in UMC 180 nm technology). From Eq. (9), the optimum value of granularity is k = 1. Here, as A_G ≫ A_M, the minimum possible value of k is preferred for the least area overhead (refer to Fig. 5), and hence in designing the fault tolerant CORDIC only one spare sub-block is used along with the 64 active sub-blocks. The fault tolerant CORDIC can be used in designing various defect tolerant digital architectures, including architectures for the fast Fourier transform (FFT) and for multiplication using simple shift and add operations, as well as for realizing different error-free trigonometric functions.

Fig. 18. Pipelined CORDIC structure.

7. Conclusions

In this paper, we have shown how the area and delay overheads of a fault tolerant circuit with dynamic reconfiguration change with granularity. To optimize the hardware cost, one has to select the appropriate fault tolerant approach. The number of inputs, the number of outputs, the sub-block size and the proper selection of granularity play important roles in deciding the trade-off between the area overhead and the maximum number of faults to be tolerated while designing fault tolerant architectures using dynamic reconfiguration. Depending on the topology used, the area overhead also changes with the number of faults to be tolerated. To make the concept more comprehensive, we have also applied it to some specific examples to find the best granularity options for their fault tolerant designs.

References

[1] J. Han, J. Gao, P. Jonker, Y. Qi, J.A.B. Fortes, Toward hardware-redundant, fault-tolerant logic for nanoelectronics, IEEE Des. Test Comput. 22 (4) (Apr 2005) 328–339.
[2] J. Chen, G. Venkataramani, H.H. Huang, Exploring dynamic redundancy to resuscitate faulty PCM blocks, ACM J. Emerg. Technol. Comput. Syst. 10 (4) (May 2014) 1–23.
[3] Q.Z. Zhou, X. Xie, J.C. Nan, Y.L. Xie, S.Y. Jiang, Fault tolerant reconfigurable system with dual-module redundancy and dynamic reconfiguration, J. Electrochem. Sci. Technol. 9 (2) (Jun 2011) 167–173.
[4] J. von Neumann, Probabilistic logics and the synthesis of reliable organisms from unreliable components, in: C.E. Shannon, J. McCarthy (Eds.), Automata Studies: Annals of Mathematics Studies, 34, Princeton University Press 1956, pp. 43–98.
[5] J.G. Tryon, Quadded Logic, in: R.H. Wilcox, W.C. Mann (Eds.), Redundancy Techniques for Computing Systems, Spartan Books, Wash. D.C. 1962, pp. 205–208.
[6] A.H. El-Maleh, B.M. Al-Hashimi, A. Melouki, F. Khan, Defect-tolerant N2-transistor structure for reliable nanoelectronic designs, IET Comput. Digit. Tech. 3 (6) (Nov. 2009) 570–580.

[7] T. Koal, H.T. Vierhaus, Basic Architecture for Logic Self Repair, IEEE International Online Testing Symposium, (IOLTS) Rhodes 2008, pp. 177–178.
[8] N.M. Huu, B. Robisson, M. Agoyan, N. Drach, Low-cost fault tolerance on the ALU in simple pipelined processors, Proc. 13th IEEE Symposium on Design and Diagnostics of Electronic Circuits and Systems 2010, pp. 28–31.
[9] A. Mukherjee, A.S. Dhar, Real-time fault-tolerance with hot-standby topology for conditional sum adder, Microelectron. Reliab. 55 (3–4) (Feb.-Mar. 2015) 704–712.
[10] A.H. El-Maleh, F.C. Oughali, A generalized modular redundancy scheme for enhancing fault tolerance of combinational circuits, Microelectron. Reliab. 54 (1) (Jan. 2014) 316–326.
[11] N.G. Jacobson and D.R. Curd, “Boundary-scan register cell with bypass circuit,” U. S. Patent 6,314,539, issued November 6, 2001.
[12] G.W. McIver, J.R. Marum, and J.B. Cho, “Triple redundant fault-tolerant register,” U. S. Patent 5,031,180, issued July 9, 1991.
[13] J.J. Davis, P.Y.K. Cheung, Reducing overheads for fault-tolerant datapaths with dynamic partial reconfiguration, Proc. 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) 2014, p. 103.
[14] S. Ghosh, K. Roy, Novel low overhead post-silicon self-correction technique for parallel prefix adders using selective redundancy and adaptive clocking, IEEE Trans. Very Large Scale Integr. VLSI Syst. 19 (8) (Aug 2011) 1504–1507.
[15] A.J. Yu, G.G. Lemieux, FPGA defect tolerance: impact of granularity, IEEE International Conference on Field-Programmable Technology, NUS, Singapore Dec. 2005, pp. 189–196.


[16] M.G. Gericota, L.F. Lemos, G.R. Alves, J.M. Ferreira, On-line self-healing of circuits implemented on reconfigurable FPGAs, IEEE International On-line Testing Symposium (IOLTS), Crete 2007, pp. 217–222.
[17] M. Ulbricht, M. Schölzel, T. Koal, H.T. Vierhaus, A new hierarchical built-in self-test with on-chip diagnosis for VLIW processors, IEEE International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS), Cottbus 2011, pp. 143–146.
[18] T. Panhofer, W. Friesenbichler, M. Delvai, Optimization concepts for self-healing asynchronous circuits, IEEE International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS), Liberec 2009, pp. 62–67.
[19] D. Bhaduri, S.K. Shukla, Nanoprism: a tool for evaluating granularity vs. reliability trade-offs in nano architectures, Proc. 14th ACM Great Lakes Symposium on VLSI 2004, pp. 109–112.
[20] F. Hatori, T. Sakurai, K. Nogami, K. Sawada, M. Takahashi, M. Ichida, M. Uchida, I. Yoshii, Y. Kawahara, T. Hibi, Y. Saeki, H. Muroga, A. Tanaka, K. Kanzaki, Introducing redundancy in field programmable gate arrays, IEEE Custom Integrated Circuits Conference May 1993, pp. 7.1.1–7.1.4.
[21] A.J. Yu, G.G.F. Lemieux, Defect-tolerant FPGA switch block and connection block with fine-grain redundancy for yield enhancement, IEEE International Conference on Field Programmable Logic and Applications, Tampere, Finland Aug. 2005, pp. 255–262.
[22] A. Malek, S. Tzilis, D.A. Khan, I. Sourdis, G. Smaragdos, C. Strydis, A probabilistic analysis of resilient reconfigurable designs, Proc. IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT) Oct. 2014, pp. 141–146.
[23] A.P. Shanthi, R. Parthasarathi, Exploring FPGA structures for evolving fault tolerant hardware, Proc. IEEE. NASA/DoD Conference on Evolvable Hardware Jul. 2003, pp. 174–181.
[24] M. Lin, S. Ferguson, Y. Ma, T. Greene, HAFT: a hybrid FPGA with amorphous and fault-tolerant architecture, Proc. IEEE Int. Symp. Circuits Syst. (May 2008) 1348–1351.
[25] S. Di Carlo, M. Indaco, P. Prinetto, E.I. Vatajelu, R. Rodriguez-Montanes, J. Figueras, Reliability estimation at block-level granularity of spin-transfer-torque MRAMs, Proc. IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT) Oct. 2014, pp. 75–80.


[26] S.Y. Huang, On improving the accuracy of multiple defect diagnosis, Proc. 19th IEEE VLSI Test Symposium (VTS) 2001, pp. 34–39.
[27] W. Cong, Y.L. Liu, P.L. Jiang, Q.Z. Zhang, F. Tao, L. Zhang, Multiple faults detection with SoC dynamic reconfiguration system based on FPGA, Adv. Mater. Res. 694 (May 2013) 2642–2645.
[28] G. Jiang, J. Wu, J. Sun, Y. Gao, Flexible rerouting schemes for reconfiguration of multiprocessor arrays, J. Parallel Distrib. Comput. 74 (10) (Oct 2014) 3026–3036.
[29] S.K. Lu, J.C. Wang, C.W. Wu, C-testable design techniques for iterative logic arrays, IEEE Trans. VLSI Syst. 3 (1) (1995) 146–152.
[30] A. Kumar, J. Rajski, S.M. Reddy, T. Rinderknecht, On the generation of compact deterministic test sets for BIST ready designs, Proc. 22nd Asian Test Symposium (ATS) 2013, pp. 201–206.
[31] R.T. John, K.D. Sreekanth, S. Sivanantham, Adaptive low power RTPG for BIST based test applications, Proc. IEEE International Conference on Information Communication and Embedded Systems (ICICES) Feb. 2013, pp. 933–936.
[32] L. Anghel, M. Nicolaidis, Defects tolerant logic gates for unreliable future nanotechnologies, Proc. Int. work-Conf. Artif. Neural Networks 2007, pp. 422–429.
[33] A. Cook, H.J. Wunderlich, Diagnosis of multiple faults with highly compacted test responses, Proc. 19th IEEE European Test Symposium (ETS) 2014, pp. 1–6.
[34] M. Caffrey, P. Graham, J. Krone, K. Lundgreen, K.S. Morgan, B. Pratt, H. Quinn, 11 assuring robust triple modular redundancy protected circuits in SRAM-based FPGAs, in: K. Iniewski (Ed.), Radiation Effects in Semiconductors, CRC Press 2010, pp. 273–302.
[35] K. Hwang, Computer Arithmetic: Principles, Architecture and Design, J. Wiley & Sons Inc., 1979.
[36] M. Boo, F. Arguello, J.D. Bruguera, R. Doallo, E.L. Zapata, High-performance VLSI architecture for the Viterbi algorithm, IEEE Trans. Commun. 45 (2) (Feb 1997) 168–176.
[37] M.-B. Lin, New path history management circuits for Viterbi decoders, IEEE Trans. Commun. 48 (10) (Oct 2000) 1605–1608.
[38] A. Mandal, K.C. Tyagi, B.K. Kaushik, VLSI architecture design and implementation for application specific CORDIC processor, International Conference on Advances in Recent Technologies in Communication and Computing (ARTCom), Kottayam 2010, pp. 191–193.
