Microelectronics Journal 43 (2012) 160–163
Truncation error analysis of MTBF computation for multi-latch synchronizers
Terrence Mak
School of Electrical, Electronic and Computer Engineering, Newcastle University, Newcastle upon Tyne, UK
Article info
Article history: Received 15 May 2011; received in revised form 20 September 2011; accepted 22 September 2011; available online 16 December 2011.
Abstract
Chip designs have an increasing number of independent clock domains. Synchronizer circuits are used to facilitate reliable data transfers between these clock domains. The task of these synchronizers is inherently prone to the occasional, statistically random, failure. These failures are frequently quantified by the synchronizers’ mean time between failures, MTBF. The MTBF becomes worse at an exponential rate with increasing frequency. In contrast, the MTBF improves exponentially as more latches are cascaded to form the synchronizer, but at the cost of increasing the data transfer latency. Thus, selecting the number of latch stages to employ in the synchronizer is a trade-off between reliability and latency. We present equations for accurate estimation of the MTBF of multi-latch synchronizers, combined with an error analysis of these equations. We compare MTBF estimates obtained by using these equations to estimates gathered from comprehensive simulation analysis, and show that error terms are not insignificant. We provide a detailed description of all the assumptions that we have made in both the formulation of the MTBF equations and the circuit simulation environment. © 2011 Elsevier Ltd. All rights reserved.
Keywords: Metastability; Mean time between failure (MTBF); Latch/flip-flop; Taylor series; Truncation error
Note: This work was carried out when the author was an intern at Sun Microsystems Laboratories.
doi:10.1016/j.mejo.2011.09.011
1. Introduction
The continuous scaling of process technology makes low-skew clock distribution increasingly difficult. As a result, the number of independent clock domains on a single chip is increasing, which leads to a growing number of synchronizers for interfacing communication signals. The purpose of a synchronizer is to capture incoming data from another clock domain, a task that is vulnerable to the metastability problem. While it is well known that no synchronizer can completely avoid metastability [5], it is vital to characterize the probabilistic performance of a synchronizer design in terms of its mean time between failures (MTBF).
There are a number of ways to characterize the MTBF of a synchronizer. Measuring the number of failure events from a synchronizer can yield an accurate characterization [3,8,2]. Numerical simulation of synchronizer circuits also provides an effective evaluation of such circuits [9] and can often generate new insights and discoveries [6].
Cascading latches can significantly improve the MTBF of a synchronizer. However, it is not trivial to derive a mathematical expression for the overall MTBF from the characteristic parameters of the individual latches. Different versions of the expression can be found in the literature, and their derivations are either omitted or ambiguous, such as in [1,4].
In this paper, we present a method to approximate the MTBF for cascaded latches. The truncation and propagation error bounds of the expression are rigorously derived and analyzed. Although alternative derivation methodologies were presented in [4,7], to our knowledge this is the first work to derive and analyze the MTBF approximation error. We found that the error contributes significantly to the overall evaluation and, as a result, the evaluation underestimates the MTBF.
This paper is organized as follows. Derivations of the MTBF for single and multiple latches are presented in Section 2. The computations of the truncation and propagation errors are presented in Section 3, and Section 4 gives the concluding remarks.
2. Basic MTBF computations
2.1. MTBF for a single latch
An estimate of the MTBF can be made from the settling time $T_s$, the timing window constant $T_w$ and the settling time constant $\tau$. Consider a simple latch. If the data input goes HIGH sufficiently far in advance of the clock edge, the synchronizer output will be driven HIGH, and if it goes HIGH significantly after the clock edge, the output will be driven LOW. If the two edges are close enough, and in the absence of noise, the output may enter the metastable state. Fig. 1 presents two data arrival waveforms: the one that arrives earlier drives the output (Q) to settle HIGH, and the one that arrives later drives the output (Q) to settle LOW. There is a balance point, where the separation between data and clock would give an exactly equal probability of a HIGH or LOW outcome. Thus, for two input arrival times separated by a time $\Delta t_{in}$ and spanning this balance point, where $\Delta t_{in} < T_w$, the settling times for the HIGH and LOW outputs are both equal to $T_s$ (see Fig. 1, Table 1). The relationship between $\Delta t_{in}$ and $T_s$ can be modeled as [4,7]
$$T_s = \tau \ln\frac{T_w}{\Delta t_{in}} \tag{1}$$
Fig. 1. An illustration of waveform of data arrival, clock and the consequences of the Q output.
Table 1. The notations used in MTBF error analysis.
$T_w$: Time window constant
$T_{w_i}$: Time window constant for the ith latch
$T_c$: Clock period
$T_d$: Average data period
$T_s$: Settling time
$T_i^s$: Settling time of the ith latch
$\Delta t_i(T_j^s)$: The time window of the ith latch that corresponds to the settling time $T_j^s$ of the jth latch
$\tau_i$: The settling time constant for the ith latch
Since the arrival time window $\Delta t_{in}$ is a function of $T_s$, we write $\Delta t_{in}$ as $\Delta t_{in}(T_s)$, and Eq. (1) can be expressed as
$$\Delta t_{in}(T_s) = T_w e^{-T_s/\tau} \tag{2}$$
Suppose the maximum allowed settling time of a latch equals $T_s$. The metastable window $\Delta t_{in}(T_s)$ is then such that, for any data arriving within this window, the settling time will be equal to or longer than $T_s$ and, thus, the latch will fail. As a result, we can compute the probability of synchronizer failure as the probability of an incoming signal arriving within this window. Assuming that the incoming data is uniformly distributed over a clock period $T_c$, we can express the probability of a synchronizer failure as follows:
$$P(\mathrm{Fail}) = \frac{\Delta t_{in}(T_s)}{T_c} = \frac{T_w e^{-T_s/\tau}}{T_c} \tag{3}$$
Further, the average incoming data period is denoted by $T_d$, which yields the probability of a synchronizer failure per second:
$$P(\mathrm{Fail})/\mathrm{second} = \frac{T_w e^{-T_s/\tau}}{T_c T_d} \tag{4}$$
Therefore, the MTBF is [7]
$$\mathrm{MTBF} = \frac{T_c T_d e^{T_s/\tau}}{T_w} \tag{5}$$
2.2. MTBF for cascaded latches
In this section, we derive a general expression for MTBF estimation for n cascaded latches. Firstly, we derive the MTBF for two latches connected in series. The MTBF expression for n latches can be obtained by generalizing the two-latch equation.
Suppose we have two latches connected in series as shown in Fig. 2. We denote by $T_2^s$ the maximum allowable settling time of the second latch; if data arrives within a timing window $\Delta t_2(T_2^s)$, the settling time of the second latch will be equal to or greater than $T_2^s$. From Eq. (2), we have
$$\Delta t_2(T_2^s) = T_{w_2} e^{-T_2^s/\tau_2} \tag{6}$$
From Fig. 2, we can see that if the output of the first latch resolves within the timing window $[T_1^s, T_1^s + \Delta t_2(T_2^s)]$, the settling time of the second latch will be equal to or greater than $T_2^s$. Therefore, we need to compute the time window of the first latch corresponding to the settling time window $[T_1^s, T_1^s + \Delta t_2(T_2^s)]$. The windows $\Delta t_1(T_1^s)$ and $\Delta t_1(T_1^s + \Delta t_2(T_2^s))$ yield settling times $T_1^s$ and $T_1^s + \Delta t_2(T_2^s)$, respectively. So we can rewrite the timing window $\Delta t_1(T_2^s)$ as
$$\Delta t_1(T_2^s) = \Delta t_1(T_1^s) - \Delta t_1(T_1^s + \Delta t_2(T_2^s)) \tag{7}$$
Using the Taylor series expansion (see footnote 1) for the term $\Delta t_1(T_1^s + \Delta t_2(T_2^s))$ by letting $f(x) = \Delta t_1(x)$ and $a = T_1^s$, we have
$$\Delta t_1(T_2^s) = \Delta t_1(T_1^s) - \left[\Delta t_1(T_1^s) + \frac{d\Delta t_1(T_1^s)}{dT_1^s}\,\Delta t_2(T_2^s) + \frac{1}{2!}\frac{d^2\Delta t_1(T_1^s)}{d(T_1^s)^2}\,(\Delta t_2(T_2^s))^2 + \cdots\right] \tag{8}$$
$$\Delta t_1(T_2^s) = -\frac{d\Delta t_1(T_1^s)}{dT_1^s}\,\Delta t_2(T_2^s) - \frac{1}{2!}\frac{d^2\Delta t_1(T_1^s)}{d(T_1^s)^2}\,(\Delta t_2(T_2^s))^2 - \cdots \tag{9}$$
From Eq. (2), we have
$$\frac{d\Delta t_1(T_1^s)}{dT_1^s} = -\frac{T_{w_1} e^{-T_1^s/\tau_1}}{\tau_1} = -\frac{\Delta t_1(T_1^s)}{\tau_1} \tag{10}$$
and similarly for the higher derivative terms. Therefore, we can substitute Eq. (10) into Eq. (9) to obtain a simplified expression:
$$\Delta t_1(T_2^s) = \frac{\Delta t_1(T_1^s)}{\tau_1}\,\Delta t_2(T_2^s) - \frac{1}{2!}\frac{\Delta t_1(T_1^s)}{\tau_1^2}\,(\Delta t_2(T_2^s))^2 + \cdots \tag{11}$$
When we truncate Eq. (11) to the first term, we obtain
$$\Delta t_1(T_2^s) \approx \frac{\Delta t_1(T_1^s)}{\tau_1}\,\Delta t_2(T_2^s) \tag{12}$$
The truncation error $e$ for the above expression becomes
$$e = \frac{1}{2!}\frac{\Delta t_1(T_1^s)}{\tau_1^2}\,(\Delta t_2(T_2^s))^2 - \frac{1}{3!}\frac{\Delta t_1(T_1^s)}{\tau_1^3}\,(\Delta t_2(T_2^s))^3 + \cdots \tag{13}$$
Note that we compute $\Delta t_1(T_2^s)$ backward from the second latch to the first latch. This approach provides a clear presentation of the relationships between time window and settling time.
Footnote 1. The Taylor series gives $f(x) = f(a) + \frac{f'(a)}{1!}(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \frac{f^{(3)}(a)}{3!}(x-a)^3 + \cdots$, where $a$ is a real number in the neighborhood of $x$.
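The single-latch MTBF of Eq. (5) and the two-latch window of Eqs. (7) and (12) can be checked numerically. The following is a minimal Python sketch, not the author's code: the function names, the clock period and the data period are hypothetical, and the latch parameters are illustrative values only. It compares the untruncated window of Eq. (7) with the first-order window of Eq. (12).

```python
import math


def window(t_w, t_s, tau):
    """Metastability window of Eq. (2): T_w * exp(-T_s / tau)."""
    return t_w * math.exp(-t_s / tau)


def mtbf_single(t_w, t_s, tau, t_c, t_d):
    """Single-latch MTBF of Eq. (5): T_c * T_d * exp(T_s / tau) / T_w."""
    return t_c * t_d / window(t_w, t_s, tau)


def two_latch_window_exact(t_w1, t_s1, tau1, dt2):
    """Eq. (7) evaluated without truncation.

    dt1(T1) - dt1(T1 + dt2) equals dt1(T1) * (1 - exp(-dt2 / tau1));
    expm1 avoids cancellation when dt2 is much smaller than tau1.
    """
    return -window(t_w1, t_s1, tau1) * math.expm1(-dt2 / tau1)


def two_latch_window_linear(t_w1, t_s1, tau1, dt2):
    """Eq. (12): only the first-order Taylor term is kept."""
    return window(t_w1, t_s1, tau1) / tau1 * dt2


# Illustrative values in seconds (assumptions, not measurements).
T_W, T_S, TAU = 25e-9, 600e-12, 83e-12
T_C, T_D = 1e-9, 2e-9  # hypothetical clock and data periods

dt2 = window(T_W, T_S, TAU)  # window of the second (identical) latch
exact = two_latch_window_exact(T_W, T_S, TAU, dt2)
linear = two_latch_window_linear(T_W, T_S, TAU, dt2)

print(f"single-latch MTBF        : {mtbf_single(T_W, T_S, TAU, T_C, T_D):.3e} s")
print(f"two-latch window, exact  : {exact:.3e} s")
print(f"two-latch window, linear : {linear:.3e} s")
print(f"relative truncation error: {linear / exact - 1:.3f}")
```

Because $1 - e^{-x} < x$ for $x > 0$, the linear window is always the larger of the two, so the first-order expression predicts more failures than the untruncated one.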
Fig. 2. Relationship between input time windows and their corresponding output between times for two cascaded latches.
In general, the metastability window of the first latch that results in an output failure of the nth latch is obtained by applying Eq. (12) stage by stage:
$$\Delta t_1(T_n^s) \approx \left(\prod_{j=1}^{n-1}\frac{T_{w_j}}{\tau_j}\right) T_{w_n}\, e^{-\sum_{i=1}^{n} T_i^s/\tau_i} \tag{14}$$
The failure probability is the probability of an incoming signal arriving within the window in Eq. (14). As in the single-latch case, given the clock period $T_c$ and the average incoming data period $T_d$, the MTBF for n latches becomes
$$\mathrm{MTBF}_n \approx T_c T_d \left(\prod_{j=1}^{n-1}\frac{\tau_j}{T_{w_j}}\right)\frac{e^{\sum_{i=1}^{n} T_i^s/\tau_i}}{T_{w_n}} \tag{15}$$
The MTBF expression obtained is the same as those presented in [4,7]. However, we noticed that the truncation error analysis was ignored in those works and, typically in Eq. (7), $\Delta t_2(T_2^s)$ was assumed to be infinitesimally small, so that a simplified linear expression could be obtained. This linear approximation introduces a truncation error, and this error accumulates throughout the computation for multiple stages of latches. We will present a thorough truncation analysis in Section 3.
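To illustrate how the linearization error accumulates over stages, the sketch below evaluates the n-latch MTBF in two ways: with the closed-form linear expression (the product of per-stage factors from Eq. (12)), and with a backward recursion that applies Eq. (7) at every stage without truncation. This is a hedged Python sketch. The stage-by-stage use of Eq. (7), the function names and all numerical values are assumptions made for illustration, not taken from the paper's simulation environment.

```python
import math


def mtbf_n_linear(t_w, t_s, tau, t_c, t_d):
    """Linearized n-latch MTBF: T_c*T_d * prod(tau_j/T_wj) * exp(sum(T_i/tau_i)) / T_wn.

    t_w, t_s and tau are per-stage lists, first latch first.
    """
    exponent = sum(ts / ta for ts, ta in zip(t_s, tau))
    prefactor = 1.0
    for j in range(len(t_w) - 1):  # stages 1 .. n-1
        prefactor *= tau[j] / t_w[j]
    return t_c * t_d * prefactor * math.exp(exponent) / t_w[-1]


def first_window_exact(t_w, t_s, tau):
    """Window of the first latch, applying Eq. (7) at each stage (assumption).

    Starts from the last latch, Eq. (6), and works backward:
    dt_i = dt_i(T_i) * (1 - exp(-dt_{i+1} / tau_i)).
    """
    dt = t_w[-1] * math.exp(-t_s[-1] / tau[-1])
    for i in range(len(t_w) - 2, -1, -1):
        own_window = t_w[i] * math.exp(-t_s[i] / tau[i])
        dt = -own_window * math.expm1(-dt / tau[i])
    return dt


def mtbf_n_exact(t_w, t_s, tau, t_c, t_d):
    """MTBF from the untruncated first-latch window."""
    return t_c * t_d / first_window_exact(t_w, t_s, tau)


# Three identical latches; all values are illustrative assumptions (seconds).
N = 3
T_W, T_S, TAU = [25e-9] * N, [600e-12] * N, [83e-12] * N
T_C, T_D = 1e-9, 2e-9  # hypothetical clock and data periods

linear = mtbf_n_linear(T_W, T_S, TAU, T_C, T_D)
exact = mtbf_n_exact(T_W, T_S, TAU, T_C, T_D)
print(f"linearized MTBF : {linear:.3e} s")
print(f"untruncated MTBF: {exact:.3e} s")
print(f"ratio exact/linear: {exact / linear:.3f}")
```

For a single stage the linear function reduces to Eq. (5). With more stages the ratio moves further from 1, since each stage overestimates its window.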
3. Error computation
3.1. Truncation error analysis
The MTBF approximation involves a truncation error when the Taylor series expansion is truncated as in Eq. (12). The approximation is illustrated in Fig. 3. The objective is to compute the time window $t(T_1) - t(T_1 + \Delta t)$. Because the function $f(x) = \Delta t_1(x)$ is exponential and $\Delta t$ is also an exponential function, the calculation involves a double exponential. A linear approximation can simplify the computation by taking the first derivative at $T_1$ and, thus, the time window calculated using this approximation ($t'(T_1 + \Delta t)$ in Fig. 3) is larger than the true window ($t(T_1 + \Delta t)$ in Fig. 3).
Fig. 3. Approximation error of window size using first derivative term in Taylor series.
We will now derive the relative error of the approximation. The error $e$ equals the summation of the remaining terms of the Taylor series as shown in Eq. (13). The relative error for the two-latch synchronizer case is as follows:
$$\left|\frac{e}{\Delta t_1(T_2^s)}\right| = \left|\frac{\sum_{i=2}^{\infty}\frac{(-1)^i}{i!}\frac{\Delta t_1(T_1^s)}{\tau_1^i}\,(\Delta t_2(T_2^s))^i}{\Delta t_1(T_2^s)}\right| \tag{16}$$
By substituting Eq. (11) into Eq. (16), we can obtain
$$\left|\frac{e}{\Delta t_1(T_2^s)}\right| = \left|\frac{\frac{\Delta t_1(T_1^s)}{\tau_1}\,\Delta t_2(T_2^s)}{\Delta t_1(T_2^s)} - 1\right| \tag{17}$$
From Eq. (11), we know that
$$\Delta t_1(T_2^s) > \frac{\Delta t_1(T_1^s)}{\tau_1}\,\Delta t_2(T_2^s) - \frac{\Delta t_1(T_1^s)}{2\tau_1^2}\,(\Delta t_2(T_2^s))^2$$
by taking the first two terms of the Taylor series of $\Delta t_1(T_2^s)$. Therefore, we have the following by substituting the inequality into Eq. (17):
$$\left|\frac{e}{\Delta t_1(T_2^s)}\right| < \left|\frac{\frac{\Delta t_1(T_1^s)}{\tau_1}\,\Delta t_2(T_2^s)}{\frac{\Delta t_1(T_1^s)}{\tau_1}\,\Delta t_2(T_2^s) - \frac{\Delta t_1(T_1^s)}{2\tau_1^2}\,(\Delta t_2(T_2^s))^2} - 1\right| \tag{18}$$
To further simplify Eq. (18), we can write the error bound for $\Delta t_1(T_2^s)$ as follows:
$$\left|\frac{e}{\Delta t_1(T_2^s)}\right| < \left|\frac{1}{\frac{2\tau_1}{\Delta t_2(T_2^s)} - 1}\right| \tag{19}$$
From the above equation, we can see that the error bound is determined by the ratio between $\tau_1$ and $\Delta t_2(T_2^s)$. When $\tau_1$ is two orders of magnitude larger than $\Delta t_2(T_2^s)$, the error will be less than 1%. However, if $\tau_1$ is small and the time window $\Delta t_2(T_2^s)$ is relatively large, as a result of a short settling time $T_2^s$, the error can be significant.
Example 1. Suppose we have two identical latches connected in series. Let the window $T_w$ of the latch be 25 ns, the maximum allowed settling time $T_s$ be 600 ps and $\tau$ equal to 83 ps. The time window $\Delta t_2(T_2^s)$ becomes 18.7 ps and the relative error becomes 0.126, which is 12.6%. □
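The bound of Eq. (19) depends only on the ratio between $\tau_1$ and $\Delta t_2(T_2^s)$, so it is straightforward to evaluate. The Python sketch below is illustrative only, with hypothetical function names. It takes the 18.7 ps window and 83 ps time constant quoted in Example 1 as inputs, and compares the bound with the relative error of the linear window against the untruncated Eq. (7).

```python
import math


def truncation_bound(tau1, dt2):
    """Upper bound of Eq. (19): 1 / (2 * tau1 / dt2 - 1).

    Only meaningful for 0 < dt2 < 2 * tau1, where the denominator is positive.
    """
    if not 0 < dt2 < 2 * tau1:
        raise ValueError("bound requires 0 < dt2 < 2 * tau1")
    return 1.0 / (2.0 * tau1 / dt2 - 1.0)


def exact_relative_error(tau1, dt2):
    """Relative error of the linear window, Eq. (17), using the untruncated Eq. (7).

    With x = dt2 / tau1 the linear window is proportional to x and the
    untruncated window to 1 - exp(-x), so the error is x / (1 - exp(-x)) - 1.
    """
    x = dt2 / tau1
    return -x / math.expm1(-x) - 1.0


# Values quoted in Example 1, in picoseconds (only the ratio matters).
TAU_PS, DT2_PS = 83.0, 18.7

bound = truncation_bound(TAU_PS, DT2_PS)
actual = exact_relative_error(TAU_PS, DT2_PS)
print(f"Eq. (19) bound        : {bound:.3f}")
print(f"relative error, exact : {actual:.3f}")

# Rule of thumb: tau1 two orders of magnitude larger than dt2.
print(f"bound at tau1 = 100 * dt2: {truncation_bound(100.0, 1.0):.4f}")
```

The bound is slightly conservative: the untruncated relative error stays below it for any ratio in the valid range, because the two-term series used in Eq. (18) underestimates the true window.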
3.2. Propagation error analysis
Furthermore, if more than two latches are connected in series, the error will propagate and accrue throughout the MTBF computation. This is because the time window of the (i+1)-th latch, $\Delta t_{i+1}$, which contains an error $e_{i+1}$, is used in computing the time window $\Delta t_i$ using Eq. (11). Therefore, the error $e_{i+1}$ of the (i+1)-th latch propagates to the error $e_i$. We derive an expression that shows the relationship between $e_{i+1}$ and $e_i$. Suppose we have n latches connected in series, and the metastable window of the first latch that causes the last latch to fail is denoted by $\Delta t_1(T_n^s)$. We denote by $e_{i+1}$ the absolute error in computing $\Delta t_{i+1}(T_n^s)$ and let
$$\Delta\hat{t}_{i+1}(T_n^s) = \Delta t_{i+1}(T_n^s) + e_{i+1} \tag{20}$$
where $\Delta\hat{t}_{i+1}(T_n^s)$ is the time window with a propagation error. By substituting Eq. (20) into Eq. (17), we can obtain
$$\left|\frac{e_i}{\Delta t_i(T_n^s)}\right| = \left|\frac{\frac{\Delta t_i(T_i^s)}{\tau_i}\left(\Delta t_{i+1}(T_n^s) + |e_{i+1}|\right)}{\Delta t_i(T_n^s)} - 1\right| \tag{21}$$
Since $e_{i+1}$ is the truncation error of the (i+1)-th stage computation, we let $|e'_{i+1}| = |e_{i+1}/\Delta t_{i+1}(T_n^s)|$, where $e'_{i+1}$ is the relative error of $\Delta t_{i+1}(T_n^s)$. We can then rewrite Eq. (21) as
$$|e'_i| = \left|\frac{\frac{\Delta t_i(T_i^s)}{\tau_i}\,\Delta t_{i+1}(T_n^s)\left(1 + |e'_{i+1}|\right)}{\Delta t_i(T_n^s)} - 1\right| \tag{22}$$
Following the steps in Eqs. (18)–(19), we can obtain the relative error bound $e'_i$ in terms of the propagation error $e'_{i+1}$:
$$|e'_i| < \left|\frac{1 + |e'_{i+1}|}{1 - \frac{\Delta t_{i+1}(T_n^s)}{2\tau_i}} - 1\right| \tag{23}$$
To further rearrange the terms, we can summarize the propagation error bound as follows:
$$|e'_i| < (\mathrm{Truncation}) + (\mathrm{Propagation}) \tag{24}$$
$$|e'_i| < \left(\frac{1}{\frac{2\tau_i}{\Delta t_{i+1}(T_n^s)} - 1}\right) + |e'_{i+1}|\left(\frac{1}{1 - \frac{\Delta t_{i+1}(T_n^s)}{2\tau_i}}\right) \tag{25}$$
From Eq. (24), we can see that the overall error is the sum of the time window truncation error and the error that propagates from the time window computation of the posterior stage. It is also interesting to note that the propagation error does not converge but is amplified according to the ratio between $\tau_i$ and $\Delta t_{i+1}(T_n^s)$. If $\tau_i$ is two orders of magnitude larger than $\Delta t_{i+1}(T_n^s)$, the propagation error will not be increased by more than 1%.
Example 2. Suppose we have three identical latches connected in series. Using the same parameters as in Example 1, we let the window $T_w = 25$ ns, the maximum allowed settling time $T_s = 600$ ps and $\tau = 83$ ps. Also, we let $n = 3$ and $i = 1$ in Eq. (24). The timing window $\Delta t_2(T_3^s)$ becomes 4.19 ps, the propagation error from $\Delta t_2(T_3^s)$ is 0.129, and the new relative truncation error is 0.026. So the overall relative error is 0.155, that is, 15.5% for the MTBF approximation of the three latches. The truncated expression overestimates the metastability timing window, as shown in Fig. 3, and thus provides a pessimistic estimate of the synchronizer reliability. The resulting errors are mainly attributed to the ratio between the synchronizer settling time constant and the maximum allowed settling time. Increasing the maximum allowed settling time can significantly reduce the impact of errors in MTBF estimation. □
4. Conclusion
Analytical expressions and derivations of the MTBF for cascaded-latch synchronizers have been discussed. The expression is the same as that reported in [4,7], but a general expression for n cascaded latches is presented in this paper. An alternative general expression is given in [6], in which the clock period is also treated as a variable. A thorough truncation error analysis for the MTBF computation is presented, and error bounds for both the truncation and propagation errors are derived. We found that the error can contribute significantly to the MTBF approximation, especially when $\tau_1$ is small and the time window $\Delta t_2(T_2^s)$ is relatively large. Based on our analysis, we obtain an MTBF approximation rule of thumb: if $\tau_1$ is two orders of magnitude larger than $\Delta t_2(T_2^s)$ for a two-latch synchronizer, the error in the MTBF evaluation will be less than 1%. These results would be valuable for MTBF-based synchronizer characterizations.
Acknowledgment
The author would like to thank Dr. Ian Jones, Dr. Suwen Yang and Prof. David Kinniment for their valuable comments on this work. The author would also like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of the paper.
References
[1] ASIC-World, What is Metastability? So How Do I Avoid Metastability? 2007, http://www.asic-world.com/tidbits/metastablity.html.
[2] A. Cantoni, J. Walker, T.-D. Tomlin, Characterization of a flip flop metastability measurement method, IEEE Trans. Circuits Syst. I: Regular Papers 54 (5) (2007) 1032–1040.
[3] E. Dike, C. Burton, Miller and noise effects in a synchronizing flip-flop, IEEE J. Solid-State Circuits 34 (1999) 849–855.
[4] T. Gabara, G. Cyr, C. Stroud, Metastability of CMOS master/slave flip-flops, IEEE Trans. Circuits Syst. II: Analog Digital Signal Process. 39 (10) (1992) 734–740.
[5] R. Ginosar, Fourteen ways to fool your synchronizer, in: Proceedings of the 9th International Symposium on Asynchronous Circuits and Systems, 2003.
[6] I.W. Jones, S. Yang, M. Greenstreet, Synchroniser behavior and analysis, in: Proceedings of the 15th IEEE International Symposium on Asynchronous Circuits and Systems, 2009.
[7] D. Kinniment, Synchronization and Arbitration in Digital Systems, Wiley, 2007.
[8] D. Kinniment, C. Dike, K. Heron, G. Russell, A. Yakovlev, Measuring deep metastability and its effect on synchronizer performance, IEEE Trans. VLSI 15 (9) (2007) 1028–1039.
[9] S. Yang, M.R. Greenstreet, Simulating improbable events, in: Proceedings of DAC, 2007.