Signal Processing 84 (2004) 1307 – 1321
www.elsevier.com/locate/sigpro
Fixed-point fast Hartley transform error analysis

G. Ghurumuruhan¹, K.M.M. Prabhu*

Department of Electrical Engineering, Indian Institute of Technology Madras, Chennai 600 036, India

Received 27 February 2003

*Corresponding author. Tel.: +44-2257-9368; fax: +44-22570509. E-mail addresses: [email protected] (G. Ghurumuruhan), [email protected] (K.M.M. Prabhu).
¹Presently with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0250, USA.
Abstract

Deterministic models are used to determine the effect of quantization errors in radix-2 DIT–FHT and DIF–FHT algorithms. The results are then supplemented assuming a particular statistical model. Overall scaling and step-by-step scaling are treated independently for both the DIT– and the DIF–FHT. The problem of twiddle factor quantization is also considered independently in both the DIT– and the DIF–FHT algorithms. The expected average error power is derived for all the above-mentioned cases, and comparisons are made with the FFT for each case. © 2004 Elsevier B.V. All rights reserved.

Keywords: FHT; Fixed-point error analysis; Overall scaling; Step-by-step scaling; Twiddle factor quantization; Deterministic models
1. Paper overview

The discrete Hartley transform (DHT) was introduced by Bracewell [2,3] as a tool for the analysis of signals, in particular real signals. The DHT has fast algorithms similar in style to the fast Fourier transform (FFT); the first of these was proposed by Bracewell [4]. Sorensen et al. [14] have put the subject of the fast Hartley transform (FHT) on a firm footing by deriving results similar to many known efficient FFT algorithms. Kwong and Shiu [9] proposed the structured radix-2 FHT algorithms and derived the decimation-in-time (DIT) FHT and decimation-in-frequency (DIF) FHT flowgraphs similar in structure to their FFT counterparts.

Fixed-point error analysis for the FFT was done in detail by Welch [19], who showed that the input signal is required to satisfy certain constraints to prevent overflow in the fixed-point processor. This is called the Overall Scaling (OS) scheme. An alternative to this scheme is the Step-by-Step scaling (SS) scheme [19], which overcomes the drawbacks of the OS scheme. Welch [19] obtained upper and lower bounds for the error power in both these cases. Floating-point error analysis for the radix-2 FFT algorithms was done by Weinstein [18]. He also treated the problem of twiddle factor quantization for these algorithms and proved that the effect of coefficient quantization on the output was much smaller than that of the arithmetic quantization. Exact expressions for the error power in radix-2 FFT algorithms employing fixed-point arithmetic were derived by Thong and Liu
[16]. They considered a variance propagation model for their analysis. Zakhor and Oppenheim [21] were the first to propose models for calculating the error variances in radix-2 DIT– and DIF–FHT algorithms. They obtained results for both fixed-point and floating-point implementations of these algorithms. However, they considered only the OS and did not consider the SS [11,19]. Prabhu and Narayanan [13] obtained fixed-point error analysis results for both DIT and DIF, considering the case of SS. They showed that the existing shifting scheme for the FFT could not be used for the FHT and hence modified it for the FHT. They also showed that, with their model, the FFT performed better than the FHT with respect to error power. The disadvantage with all the above error propagation models is that they are statistical in nature. For a long time, there was no deterministic analysis model for the radix-2 algorithms. Then, Yutai Ma [20] came up with a deterministic model for the DIT–FFT and attempted to calculate the exact error power. However, the expressions he obtained (Theorems 8–10 in [20]) were shown to be inaccurate by Adireddy and Prabhu [1]. The principal objective of this paper is to derive deterministic models to analyze the effect of arithmetic quantization errors in radix-2 FHT algorithms. In this paper, we propose accurate error analysis models for both the DIT– and the DIF–FHT algorithms. Also, for each of them, we consider the cases of OS and SS. Apart from this, the problem of coefficient quantization (see [5–7,12,17]) in radix-2 FHT algorithms is treated in detail for the first time, and the results obtained are shown to be exactly like those obtained by James [7] for the FFT. Throughout this paper, we deal with real signals only, because the DHT is primarily used for the analysis of real signals. We present a deterministic analysis of our models and then supplement our results with statistical averages, assuming a particular quantization scheme. Hence, we devote one part of this section to the statistical model we have assumed. It will also be shown that the deterministic results we present here can be used with any quantization scheme; the statistical model we have assumed in this paper is one of the possible choices. The organization of the paper is as follows: in the remaining part of this section, we first state the problem. Then we describe a particular statistical model,
which we use for most of the paper. Before considering the DIT–FHT and DIF–FHT individually, we discuss the problem of input quantization error, since its behavior does not depend upon the algorithm used. In Section 2, we study the effect of quantization errors in the DIT–FHT. Section 3 discusses the same for the DIF–FHT. Finally, in Section 4, we give simulation results, and in Section 5, the conclusions are drawn.

List of symbols:
N = 2^ν : number of sample points considered for the FHT
Δ = 2^{−B} : quantization step size in a (B + 1)-bit processor
r(i) : reverse of the bit representation of a number i
D : {(m, i, k) : 1 ≤ m ≤ ν, 0 ≤ i ≤ 2^{ν−m} − 1, 0 ≤ k ≤ 2^m − 1}
(k)_{2^m} : k (mod 2^m)
E{·} : statistical expectation operator
u(n) : 1 if n ≥ 0, zero otherwise
sgn(n) : 1 if n ≥ 0, −1 otherwise
cas(x) : cos(x) + sin(x)
(A) Statement of the problem: The DHT of an N-point sequence h(n) is defined by [2],

$$H(k)=\sum_{n=0}^{N-1}h(n)\,\mathrm{cas}\!\left(\frac{2\pi kn}{N}\right),\quad k=0,1,\ldots,(N-1) \tag{1}$$

and h(n) can be obtained from (1) as

$$h(n)=\frac{1}{N}\sum_{k=0}^{N-1}H(k)\,\mathrm{cas}\!\left(\frac{2\pi kn}{N}\right),\quad n=0,1,\ldots,(N-1). \tag{2}$$
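The transform pair (1)-(2) is easy to check numerically. The following Python sketch (ours, not part of the original paper) evaluates the DHT directly in O(N²) time, with no fast algorithm implied, and verifies both the inversion property and the Parseval relation used below for the input quantization error.

    import numpy as np

    def cas(x):
        # cas(x) = cos(x) + sin(x), the Hartley kernel
        return np.cos(x) + np.sin(x)

    def dht(h):
        # Direct O(N^2) evaluation of Eq. (1)
        N = len(h)
        n = np.arange(N)
        return (h * cas(2 * np.pi * n.reshape(-1, 1) * n / N)).sum(axis=1)

    def idht(H):
        # Eq. (2): the DHT is, up to the 1/N factor, its own inverse
        return dht(H) / len(H)

    rng = np.random.default_rng(0)
    h = rng.uniform(-0.5, 0.5, 16)
    H = dht(h)
    assert np.allclose(idht(H), h)                           # round trip
    assert np.allclose((H**2).sum() / len(h), (h**2).sum())  # Parseval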
When a radix-2 algorithm is used, the FHT requires the sequential computation of arrays. Due to the finite word length used, all the numbers are realized with finite bit accuracy, and the results must be either rounded off or truncated. This introduces errors at every stage of the computation. Our objective in this paper is to find how these errors propagate to the output of the radix-2 flowgraph. Throughout this paper, we assume a fixed (B + 1)-bit processor with B bits for the data and one for the sign bit. Also, we assume truncation of numbers unless
otherwise mentioned. As will be shown later, the results in the paper can be used for rounding as well.

(B) Statistical model: Throughout the paper, we will assume that two's complement arithmetic is used to represent the fractions in the fixed-point processor. It is well known [11] that the truncation of a real number to B bits results in an error whose mean (μ_T) is −Δ/2 and variance (σ_T²) is Δ²/12. Let q_T = μ_T² + σ_T². Also, the error due to a one-bit shift has a mean (μ₁) of −Δ/2 and a variance (σ₁²) of Δ²/4. Let q₁ = μ₁² + σ₁². It can be shown that the error due to a two-bit shift has a mean (μ₂) of −3Δ/2 and a variance (σ₂²) of 5Δ²/4. Let q₂ = μ₂² + σ₂².

(C) Input quantization error: When the numbers are to be stored in a B-bit processor, they are quantized to B bits. This causes an error h_Q(n). It can be easily seen that h_Q(n) propagates to the output as

$$H_Q(k)=\sum_{n=0}^{N-1}h_Q(n)\,\mathrm{cas}\!\left(\frac{2\pi kn}{N}\right).$$

Hence, using Parseval's theorem, we have the average error power as

$$\frac{1}{N}\sum_{k=0}^{N-1}H_Q^2(k)=\sum_{n=0}^{N-1}h_Q^2(n).$$

As mentioned before, if we assume truncation, we have the expected error power as

$$E\left\{\frac{1}{N}\sum_{k=0}^{N-1}H_Q^2(k)\right\}=Nq_T.$$
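The moments quoted in (B) are easy to reproduce empirically. The sketch below (ours; the value of B is illustrative) models two's complement truncation as rounding toward −∞ in steps of Δ, and measures each shift error after the 2^s scale factor has been compensated, which is the convention under which the stated means and variances come out.

    import numpy as np

    B = 12
    DELTA = 2.0 ** -B              # quantization step of a (B+1)-bit processor
    rng = np.random.default_rng(1)
    x = rng.uniform(-1, 1, 1_000_000)

    def trunc(v, step):
        # two's complement truncation: round toward -infinity in steps of `step`
        return np.floor(v / step) * step

    # (a) truncation to B bits: mean -DELTA/2, variance DELTA^2/12
    e_t = trunc(x, DELTA) - x
    print(e_t.mean() / DELTA, e_t.var() / DELTA ** 2)        # ~ -0.5, ~ 1/12

    # (b) s-bit right shift of a B-bit number, error measured after the 2^s
    # scale is compensated: s = 1 gives mean -DELTA/2 and variance DELTA^2/4;
    # s = 2 gives mean -3*DELTA/2 and variance 5*DELTA^2/4
    xq = trunc(x, DELTA)
    for s in (1, 2):
        e_s = (trunc(xq / 2 ** s, DELTA) - xq / 2 ** s) * 2 ** s
        print(s, e_s.mean() / DELTA, e_s.var() / DELTA ** 2)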
We now treat the cases of the DIT–FHT and the DIF–FHT separately. We also note that the above analysis of the input quantization error is independent of whether DIT or DIF is used.

2. Derivation of results for DIT–FHT

A flowgraph of the DIT–FHT of a 16-point real sequence is shown in Fig. 1(a). The stages are numbered as m = 0, 1, 2, …, ν, with m = 0 being the input stage and m = ν being the output stage. The corresponding arrays are denoted by H_m(k), 0 ≤ m ≤ ν, 0 ≤ k ≤ N − 1. We also denote the output array H_ν(k) simply by H(k). From the flowgraph, we can deduce some of its general properties. We find that the first two stages are devoid of any multiplication. Also, we find that at every stage m, the
Fig. 1. (a) DIT–FHT flowgraph. (b) DIF–FHT flowgraph (dotted lines indicate multiplication by −1). (c) Corresponding T blocks. C_i = cos(2πi/2^m), S_i = sin(2πi/2^m).
array is divided into blocks of length 2^m. The number of such blocks is 2^{ν−m}. We enumerate the blocks as 0, 1, 2, …, (2^{ν−m} − 1) from the top. To see how the errors at intermediate stages propagate to the output of the DIT–FHT flowgraph, we need to know how a single error (in the entire flowgraph) at an intermediate stage travels to the output. For that, we need the following proposition.
Proposition 2.1. In a DIT–FHT flowgraph of an N-point sequence, there exists a 2^m-point DHT relationship between any block at stage m (denoted by {H^{(m)}(i, k′)}_{k′=0}^{2^m−1}) and the corresponding input sequence h(n), given by

$$H^{(m)}(i,k')=\sum_{n=0}^{2^{m}-1}h(n2^{\nu-m}+r(i))\,\mathrm{cas}\!\left(\frac{2\pi k'n}{2^{m}}\right),$$

where H^{(m)}(i, k′) denotes H_m(i2^m + (k′)_{2^m}), (m, i, k′) ∈ D.
The above proposition relates the input sequence and the output at the ith block in the mth stage. Once we have this result, we can easily obtain how the errors propagate to the output. But the analysis differs slightly between OS and SS. Hence, we treat them separately in this paper.

2.1. Overall scaling

To see how the error at any intermediate stage propagates to the output, we require the following result:

Theorem 2.1.1. An error at point (m, i, k) ∈ D, denoted by ΔH^{(m)}(i, k), propagates to the set of points {k′ : 0 ≤ k′ ≤ N − 1} of the output as

$$\Delta H(k')=\Delta H^{(m)}(i,k)\left[\cos\!\left(\frac{2\pi k'r(i)}{N}\right)\sum_{l=0}^{2^{\nu-m}-1}\delta(k-k'-l2^{m})+\sin\!\left(\frac{2\pi k'r(i)}{N}\right)\sum_{l=1}^{2^{\nu-m}}\delta(k+k'-l2^{m})\right].$$

Proof. Let an error occur (only) at a particular point (m, i, k) ∈ D. Denote the error by ΔH^{(m)}(i, k). Using the notation of Proposition 2.1, we obtain

$$\Delta H^{(m)}(i,k')=\Delta H^{(m)}(i,k)\,\delta(k-k'). \tag{T.1}$$

We first find the corresponding input sequence due to the above error using Proposition 2.1. We then take the DHT of the resulting input sequence to get the effect of this error at the output. Using Proposition 2.1 and (2), we obtain

$$\Delta h(n2^{\nu-m}+r(i))=\frac{1}{2^{m}}\sum_{k'=0}^{2^{m}-1}\Delta H^{(m)}(i,k')\,\mathrm{cas}\!\left(\frac{2\pi k'n}{2^{m}}\right)$$

for some sequence Δh(n). Using (T.1) in the above equation, we get the input sequence due to the error as

$$\Delta h_{k}(n2^{\nu-m}+r(i))=\frac{1}{2^{m}}\Delta H^{(m)}(i,k)\,\mathrm{cas}\!\left(\frac{2\pi kn}{2^{m}}\right). \tag{T.2}$$

Hence, the corresponding output sequence is given by

$$\Delta H(k')=\sum_{n=0}^{2^{m}-1}\Delta h_{k}(n2^{\nu-m}+r(i))\,\mathrm{cas}\!\left(\frac{2\pi k'(n2^{\nu-m}+r(i))}{N}\right). \tag{T.3}$$

Using (T.2) and the fact that cas(α + β) = cas(α)cos(β) + cas(−α)sin(β), we can rewrite (T.3) as

$$\Delta H(k')=\Delta H^{(m)}(i,k)\left[S_{m}(k,k')\cos\!\left(\frac{2\pi k'r(i)}{N}\right)+S_{m}(k,-k')\sin\!\left(\frac{2\pi k'r(i)}{N}\right)\right], \tag{T.4}$$

where

$$S_{m}(k_{1},k_{2})=\frac{1}{2^{m}}\sum_{n=0}^{2^{m}-1}\mathrm{cas}\!\left(\frac{2\pi k_{1}n}{2^{m}}\right)\mathrm{cas}\!\left(\frac{2\pi k_{2}n}{2^{m}}\right).$$

It can be shown that S_m(k₁, ak₂) = Σ_{l=−∞}^{+∞} δ(k₁ + ak₂ − l2^m), if a ∈ {−1, 1}. Using the above result and the fact that 0 ≤ k′ ≤ (2^ν − 1) and 0 ≤ k ≤ (2^m − 1) in (T.4), we obtain the required result.

The above theorem states that an error at stage m propagates to 2^{ν−m+1} points of the output. From this theorem, it is also interesting to note that for a point k at the output (0 ≤ k ≤ N − 1), at most two points from every block at any stage contribute to the net error. It can be seen from the above result that if the points in a block i at stage m are enumerated as 0, 1, …, 2^m − 1 from the top, then the points contributing to the error at point k of the output are i2^m + (k)_{2^m} and i2^m + (−k)_{2^m}. In fact, we have the following corollary:
Corollary 2.1.2. Given an output point k (0 ≤ k ≤ N − 1), the contribution of the error at that point due to stage m (1 ≤ m ≤ ν) and block i (0 ≤ i ≤ 2^{ν−m} − 1) is given by

$$\Delta H(k)=\Delta H^{(m)}(i,k)\cos\!\left(\frac{2\pi kr(i)}{N}\right)+\Delta H^{(m)}(i,-k)\sin\!\left(\frac{2\pi kr(i)}{N}\right).$$

The above corollary gives the error contribution of block i at stage m to the output. One needs to just sum over all the stages and the blocks in each stage to get the net error.

Lemma 2.1.3. In OS, the net error at k, 0 ≤ k ≤ N − 1, due to all stages is given by

$$\Delta H(k)=\sum_{m=s(k)}^{\nu}\sum_{i=0}^{2^{\nu-m}-1}\left[\Delta H^{(m)}(r(i),k)\cos\!\left(\frac{2\pi ki}{N}\right)+\Delta H^{(m)}(r(i),-k)\sin\!\left(\frac{2\pi ki}{N}\right)\right],$$

where k = Σ_{i=1}^{ν} k_i 2^{i−1} and s(k) = 2 + min{j : k_j = 1}.

Tracing the flowgraph from the kth output to the input, we note that multiplications (other than ±1) are required only up to stage s(k) as defined above (see [16]). With the above lemma, we can obtain the average error power due to quantization after a slightly more involved analysis (see Appendix A).

Theorem 2.1.4. In OS, the average error power at the output due to quantization is given by

$$\frac{1}{N}\sum_{k=0}^{N-1}\Delta H^{2}(k)=\sum_{m_{1}=3}^{\nu}\sum_{m_{2}=3}^{\nu}\frac{1}{2^{m}}\sum_{i=0}^{2^{\nu-m}-1}\sum_{l=0}^{2^{m}-1}\{A_{i}+B_{i}\},$$

where

$$A_{i}=\sum_{r=0}^{2^{m-m'}-1}\Delta H^{(m_{1})}(r(i_{1}),l)\,\Delta H^{(m_{2})}(r(i_{2}),l)\cos\!\left(\frac{2\pi lr}{2^{m}}\right)$$

and

$$B_{i}=\sum_{r=1}^{2^{m-m'}}\Delta H^{(m_{1})}(r(i_{1}'),l)\,\Delta H^{(m_{2})}(r(i_{2}'),-l)\sin\!\left(\frac{2\pi lr}{2^{m}}\right),$$

with m = max(m₁, m₂), m′ = min(m₁, m₂),

i₁ = r2^{ν−m₂} u(m₂ − m₁) + i sgn(m₁ − m₂), i₁′ = r2^{ν−m₂} u(m₂ − m₁) + i,
i₂ = r2^{ν−m₁} u(m₁ − m₂) + i sgn(m₂ − m₁), and i₂′ = r2^{ν−m₁} u(m₁ − m₂) + i.

The above theorem is very general, in the sense that no assumption was made on the quantization model used for implementation. We now consider truncation of numbers and give the expected value of the average error power. To ensure that no overflows occur at any stage of the computation, it can be shown that the input signal h(n) must be constrained by (see [19,21])

$$|h(n)|<\frac{1}{N}. \tag{3}$$
In other words, (3) is required to ensure that |H_i(k)| < 1 for 1 ≤ i ≤ ν.

Theorem 2.1.5. In OS, the expected average error power for truncation is given by

$$E\left\{\frac{1}{N}\sum_{k=0}^{N-1}\Delta H_{T}^{2}(k)\right\}=\frac{q_{T}}{6}\left(N+\frac{8}{N}-6\right)+N\mu_{T}^{2}\left(\frac{25\cdot4-9}{72}-\frac{4(6\nu+5)}{2N}-\frac{2}{9N^{2}}\right).$$

From the above theorem, we note that for very large values of N, the expected error power is 41NΔ²/72 ≅ 0.57NΔ². It must also be stated that Theorem 2.1.5 was derived after certain approximations (see Appendix B for details). If one wants the results for rounding, one just needs to substitute μ_T = 0 and replace q_T by q_R = Δ²/12. From the above theorem, we find that the error power is
an O(N) multiple of Δ². However, the primary disadvantage of this scheme is that the signal magnitude is unnecessarily constrained by (3), which is very harsh. Hence, we go in for the SS [19,21], which relaxes the constraint (3) but in turn introduces more error. The SS scheme is discussed below.

2.2. Step-by-step scaling

To derive an SS scheme, we first need to know the maximum number (M_m) by which any element at the output of a stage m gets multiplied before it reaches the next stage. It can be shown [13] that M_m = 2 for the first two stages (m = 1 and 2) and, for m > 2, M_m = (1 + √2). Hence,

$$|H_{i}(k)|_{\max}\le(1+\sqrt{2})|H_{i-1}(k)|_{\max},\quad 3\le i\le\nu. \tag{4}$$

Therefore, unlike in the FFT, a shift of just one bit per stage would not suffice. To overcome this problem, Prabhu and Narayanan [13] proposed a shifting scheme in which almost all the shifting is done as a two-bit shift. The disadvantage with their scheme is that many two-bit shifts during the initial stages can cause a lot of error at the output. To lower these undesired effects, we propose a "spread" shifting scheme in which we do not do all the two-bit shifts initially. We start with one-bit shifts, and if (4) is not satisfied, we resort to a two-bit shift. This approach is seen to improve the performance of the FHT with respect to average error power. In fact, for certain blocklengths, the FHT yields a superior performance when compared to the FFT (see Section 4). In Table 1, we give the scaling factor (s_m) for each stage m, along with the maximum signal magnitude possible at that stage. From the table, we find that the maximum possible signal magnitude for stage 13 is the same as that for stage 3. Hence, our shifting scheme follows the same pattern from m = 14 onwards, i.e., s_m = s_{m−10} for m > 13. Therefore, we give the scaling factors only up to m = 13. Our shifting scheme requires that the input signal h(n) is bounded by |h(n)| < 0.5. Also, our shifting algorithm (as can be seen from Table 1) can be stated as: if (m > 1 and m ≡ 1 (mod 4)), shift (the entire array) by two bits; else, shift (it) by one bit.
Table 1
Maximum possible signal magnitudes at every stage in a DIT–FHT flowgraph

Stage (m)   Shift factor (s_m)   Max. possible magnitude
0           1/2                  0.5
1           1/2                  0.5
2           1/2                  0.5
3           1/2                  0.604
4           1/2                  0.728
5           1/4                  0.879
6           1/2                  0.640
7           1/2                  0.773
8           1/2                  0.933
9           1/4                  0.563
10          1/2                  0.68
11          1/2                  0.821
12          1/2                  0.99 ≈ 1
13          1/4                  0.604
We see that an error at stage m gets multiplied by a(m) at the output, where a(m) is defined as

$$a(m)=\begin{cases}2^{m}, & 1\le m\le 5\\ 2^{m+1}, & 6\le m\le 9\\ 2^{m+2}, & 10\le m\le 13.\end{cases} \tag{5}$$
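To make the scheme concrete, the short Python sketch below (ours) encodes the spread shifting rule and checks it against the closed form (5); reading a(m) as 2 raised to m plus the number of two-bit-shift stages before stage m is our interpretation of how the shift factors accumulate.

    # Spread shifting rule of Section 2.2: a two-bit shift at stage m only
    # when m > 1 and m % 4 == 1, a one-bit shift otherwise.
    def shift_bits(m):
        return 2 if (m > 1 and m % 4 == 1) else 1

    # a(m): factor by which an error injected at stage m appears at the
    # output, read as 2^(m + number of two-bit-shift stages before m).
    def a(m):
        return 2 ** (m + sum(shift_bits(j) - 1 for j in range(1, m)))

    # Consistency check against Eq. (5), which assumes nu <= 13.
    expected = {m: 2 ** m for m in range(1, 6)}
    expected.update({m: 2 ** (m + 1) for m in range(6, 10)})
    expected.update({m: 2 ** (m + 2) for m in range(10, 14)})
    assert all(a(m) == expected[m] for m in range(1, 14))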
In (5), we have assumed ν ≤ 13, and henceforth we continue to do so. Though for our statistical analysis we use a(m) as defined above, the advantage of our model is that it can fit any shifting scheme: one needs to define a(m) appropriately and proceed to find the average error power. If we now modify Lemma 2.1.3 to include the factor a(m), we obtain:

Lemma 2.2.1. In SS, the net error at k, 0 ≤ k ≤ N − 1, due to all stages is given by

$$\Delta H(k)=\sum_{m=s(k)}^{\nu}a(m)\sum_{i=0}^{2^{\nu-m}-1}\left[\Delta H^{(m)}(r(i),k)\cos\!\left(\frac{2\pi ki}{N}\right)+\Delta H^{(m)}(r(i),-k)\sin\!\left(\frac{2\pi ki}{N}\right)\right].$$
From the above lemma, we get an expression for average error power similar to Theorem 2.1.4.
Theorem 2.2.2. In SS, the average error power due to quantization is given by

$$\frac{1}{N}\sum_{k=0}^{N-1}\Delta H^{2}(k)=\sum_{m_{1}=3}^{\nu}\sum_{m_{2}=3}^{\nu}\frac{a(m_{1})a(m_{2})}{2^{m}}\sum_{i=0}^{2^{\nu-m}-1}\sum_{l=0}^{2^{m}-1}\{A_{i}+B_{i}\},$$
where the notations are as defined in Theorem 2.1.4. Before evaluating the expected average error power, we need to see if there is any correlation between the errors at different stages. Truncation errors are independent in the statistical sense. In the case of DIT, it can be shown that the errors caused due to bit-shifts are also uncorrelated, and this has been verified using simulations (see Section 4). We now give the result for the expected value of the average error power due to truncation, with the shifting scheme as defined by (5).

Theorem 2.2.3. In SS, for truncation, the expected average power is given by
$$E\left\{\frac{1}{N}\sum_{k=0}^{N-1}\Delta H_{ST}^{2}(k)\right\}=\begin{cases}\dfrac{\Delta^{2}}{3}\left(\dfrac{5\cdot4N^{2}}{6}-4N-4\right)+\mathrm{CROSS1}, & 3\le\nu\le 5\\[2mm]\dfrac{\Delta^{2}}{3}\left(\dfrac{5\cdot16N^{2}}{6}-2N-3.5\right)+\mathrm{CROSS2}, & 6\le\nu\le 9.\end{cases}$$
In the above expression, we have restricted ourselves to ν ≤ 9, but the expressions for ν > 9 can be obtained as well; the derivation is cumbersome but straightforward. In the above expression, CROSS1 and CROSS2 represent the contributions from the cross terms. The cross terms can be derived in a similar manner, as in Appendix B; the derivation is tedious and lends little insight. From the above result, we see that the error terms are O(N²) multiples of Δ². Hence, we expect the error power to be higher than that in the case of OS. This has been verified experimentally (Section 4).
2.3. Coefficient quantization

The problem of twiddle factor quantization, which has been treated extensively for the FFT [5–7,12,17], has not been addressed for the FHT so far. In this section, we analyze the effect of twiddle factor quantization in radix-2 FHT algorithms and compare its performance with that of the FFT. For this purpose, we assume that the signal h(n) and all the products in Fig. 1(a) are represented with infinite precision. The only error caused is due to twiddle factor quantization. Hence, instead of obtaining H(k) as defined by (1), we obtain

$$\hat{H}(k)=\sum_{n=0}^{N-1}h(n)\,\hat{\mathrm{cas}}\!\left(\frac{2\pi kn}{N}\right),$$

where caŝ(2πkn/N) denotes the net effect of quantization. To define caŝ(2πkn/N) exactly, we first define the numbers

$$k_{i}=(kn_{i}2^{i-1})\bmod N,\quad i=1,2,\ldots,\nu, \tag{6}$$

where n = Σ_{i=1}^{ν} n_i 2^{i−1}. Then, since

$$\mathrm{cas}\!\left(\frac{2\pi kn}{N}\right)=\mathrm{cas}\!\left(\frac{2\pi}{N}\sum_{i=1}^{\nu}k_{i}\right),$$

we obtain (see [10])

$$\mathrm{cas}\!\left(\frac{2\pi kn}{N}\right)=S_{0}+S_{1}-S_{2}-S_{3}+S_{4}\cdots, \tag{7}$$

where, if θ_j = 2πk_j/N, then

$$S_{0}=\prod_{j=1}^{\nu}\cos\theta_{j},\qquad S_{1}=\sum_{j=1}^{\nu}\sin\theta_{j}\prod_{\substack{i=1\\ i\ne j}}^{\nu}\cos\theta_{i},$$

$$S_{2}=\sum_{j=1}^{\nu}\sum_{\substack{k=1\\ k\ne j}}^{\nu}\sin\theta_{k}\sin\theta_{j}\prod_{\substack{i=1\\ i\ne j,\,i\ne k}}^{\nu}\cos\theta_{i},\quad\text{etc.}$$
We immediately see that (7) precisely describes the "route" taken by h(n) to reach H(k). In other words, in the ideal case, if we consider a particular value of n, the corresponding h(n), as it traverses to the output, gets multiplied by coefficients as defined by (7). Hence, if we quantize each of the coefficients in (7),
we get the effect of the twiddle factor error at point k of the output due to point n of the input. Therefore, replacing the coefficients in (7) with their quantized values, we obtain

$$\hat{\mathrm{cas}}\!\left(\frac{2\pi kn}{N}\right)=\hat{S}_{0}+\hat{S}_{1}-\hat{S}_{2}-\hat{S}_{3}+\hat{S}_{4}\cdots, \tag{8}$$

where Ŝ₀ = Π_{j=1}^{ν} [cos θ_j]_Q, etc., and [x]_Q represents the quantized value of x. So we have the twiddle factor error ΔH_TW(k) as

$$\Delta H_{TW}(k)=\hat{H}(k)-H(k)=\sum_{n=0}^{N-1}h(n)e_{k}(n),$$

where

$$e_{k}(n)=\hat{\mathrm{cas}}\!\left(\frac{2\pi kn}{N}\right)-\mathrm{cas}\!\left(\frac{2\pi kn}{N}\right). \tag{9}$$

Though the errors are not strictly statistical in nature (see [5]), the deterministic analysis of the error power is quite involved and lends little insight, so we have gone straightaway for a statistical analysis. We also assume that the signal h(n) is white and uniformly distributed in [−0.5, 0.5], and hence var(h(n)) = q_H = 1/12. Also, we assume that the coefficients [cos θ_j]_Q and [sin θ_k]_Q are rounded, and hence e_k(n) has zero mean. Regarding the variance of e_k(n), we assume the following proposition:

Proposition 2.3.1. e_k(n), as defined by (9), has the following variance:

$$\mathrm{var}(e_{k}(n))=(1+2q_{R})^{t_{n}}-1,$$

where 0 ≤ n ≤ 2^ν − 1, 0 ≤ k ≤ 2^ν − 1 and t_n is the number of non-zero bits in the last (ν − 2) places of the binary representation of n.

In the above proposition, we consider only the last (ν − 2) bits because the first two bits do not contribute to the error. Once we have Proposition 2.3.1, we can derive the expected error power.

Theorem 2.3.2. Due to coefficient rounding, the expected average error power for ν > 3 is given by

$$E\left\{\frac{1}{N}\sum_{k=0}^{N-1}\Delta H_{TW}^{2}(k)\right\}=q_{H}q_{R}N(\nu-2).$$

Proof. Squaring ΔH_TW(k) = Σ_{n=0}^{N−1} h(n)e_k(n) and taking the expectation operator on both sides, we obtain

$$E\{\Delta H_{TW}^{2}(k)\}=q_{H}\sum_{n=0}^{N-1}E\{e_{k}^{2}(n)\},$$

since the signal is assumed to be white. So,

$$E\left\{\frac{1}{N}\sum_{k=0}^{N-1}\Delta H_{TW}^{2}(k)\right\}=\frac{q_{H}}{N}\sum_{n=0}^{N-1}\sum_{k=0}^{N-1}E\{e_{k}^{2}(n)\}.$$

Using Proposition 2.3.1, the RHS of the above equation reduces to q_H Σ_{n=0}^{N−1}[(1 + 2q_R)^{t_n} − 1]. Since t_n = k for 4·(ν−2 choose k) numbers between 0 and N − 1, we obtain

$$E\left\{\frac{1}{N}\sum_{k=0}^{N-1}\Delta H_{TW}^{2}(k)\right\}=q_{H}\sum_{k=1}^{\nu-2}4\binom{\nu-2}{k}\{(1+2q_{R})^{k}-1\}=Nq_{H}[(1+q_{R})^{\nu-2}-1].$$

Since q_R ≪ 1, we use the binomial theorem to get the result.

For ν = 3, it can be easily derived that the error power is 4q_H q_R. In fact, the very same expression given in Theorem 2.3.2 was obtained by James [7] for the FFT. Hence, we conclude that the FHT and FFT algorithms perform identically as far as coefficient quantization alone is concerned. Simulation results are discussed in Section 4.
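The route error e_k(n) of (9) can be evaluated directly, since the alternating sum S₀ + S₁ − S₂ − S₃ + ⋯ is the real plus the imaginary part of the product of the factors [cos θ_j]_Q + i[sin θ_j]_Q. The Python sketch below (ours; the values of ν, B and k are illustrative) computes e_k(n) this way and illustrates Proposition 2.3.1: the two highest-order bits of n yield angles that are exact multiples of π/2, whose cosines and sines quantize without error, so only the remaining ν − 2 bits, t_n of them non-zero, contribute.

    import numpy as np

    def e_k(n, k, nu, B):
        # e_k(n) of Eq. (9): the "route" product of Eq. (7) with every
        # cos/sin rounded to B bits, minus the exact cas(2*pi*k*n/N).
        # Bits with n_i = 0 contribute the exact factor cos(0) + 1j*sin(0).
        N = 1 << nu
        Q = lambda v: np.round(v * 2.0 ** B) / 2.0 ** B      # round to B bits
        route = 1.0 + 0.0j
        for i in range(1, nu + 1):                           # bits n_1 .. n_nu
            if (n >> (i - 1)) & 1:
                t = 2 * np.pi * ((k << (i - 1)) % N) / N     # theta_i
                route *= Q(np.cos(t)) + 1j * Q(np.sin(t))
        exact = 2 * np.pi * k * n / N
        return (route.real + route.imag) - (np.cos(exact) + np.sin(exact))

    # nu = 4, B = 8, k = 5: |e_k(n)| tracks t_n, the number of ones among
    # the nu - 2 low-order bits of n; the two high-order bits are error-free.
    for n in range(16):
        print(n, abs(e_k(n, 5, 4, 8)))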
3. Derivation of results for DIF–FHT

A flowgraph of the DIF–FHT is shown in Fig. 1(b). From the flowgraph, we find that a straightforward DHT relationship exists between the points of a block at any stage and the corresponding output section. Hence, we have the following proposition.

Proposition 3.1. The relationship between any block at stage m and the corresponding output section is

$$H(i+k2^{m})=\sum_{l=0}^{2^{\nu-m}-1}H^{(m)}(i,l)\,\mathrm{cas}\!\left(\frac{2\pi l(i+k2^{m})}{N}\right),$$

where H^{(m)}(i, l) denotes H_m(r((i)_{2^m})2^{ν−m} + l) and (m, k, i) ∈ D.

Proof. From Fig. 1(b), we find that at every stage m, only one block contributes to the output at k. That block is given by the index number r((k)_{2^m}), if the blocks are enumerated as 0, 1, …, (2^m − 1) from the top. Using this and (1), we obtain the result.

Intuitively, due to the above proposition, we expect the analysis of the DIF–FHT to be simpler, since, unlike in DIT, where we find the corresponding input sequence and then take its DHT, a straightforward relationship exists here. As before, we treat OS and SS separately.

3.1. Overall scaling

From Proposition 3.1, we can easily obtain the error contribution due to a particular stage m. Adding the contributions due to all the stages, we have the following lemma:

Lemma 3.1.1. The error at a point k of the output of the DIF–FHT due to all stages is given by

$$\Delta H(k)=\sum_{m=1}^{\nu-2}\sum_{l=0}^{2^{\nu-m}-1}\Delta H^{(m)}(k,l)\,\mathrm{cas}\!\left(\frac{2\pi kl}{N}\right).$$

Note that in the above lemma, we consider stages m = 1, …, (ν − 2) only, since the last two stages have multiplications only by ±1 and hence introduce no error. From the above lemma, we get a result for the average error power.

Theorem 3.1.2. In OS, the average error power is given by

$$\frac{1}{N}\sum_{k=0}^{N-1}\Delta H^{2}(k)=\sum_{m_{1}=1}^{\nu-2}\sum_{m_{2}=1}^{\nu-2}\frac{1}{2^{m}}\sum_{l=0}^{2^{\nu-m}-1}\sum_{p=0}^{2^{m}-1}\{A_{i}+B_{i}\},$$

where

$$A_{i}=\sum_{r=0}^{2^{m-m'}-1}\Delta H^{(m_{1})}(p,l_{1})\,\Delta H^{(m_{2})}(p,l_{2})\cos\!\left(\frac{2\pi pr}{2^{m}}\right)$$

and

$$B_{i}=\sum_{r=1}^{2^{m-m'}}\Delta H^{(m_{1})}(p,l_{1}')\,\Delta H^{(m_{2})}(p,l_{2}')\sin\!\left(\frac{2\pi pr}{2^{m}}\right),$$

and l₁, l₂, l₁′ and l₂′ follow definitions identical to those of i₁, i₂, i₁′ and i₂′, respectively, as given in Theorem 2.1.4.

From this, we can obtain the expected error power for truncation. Again, if (3) is satisfied, we are ensured that no overflows occur.

Theorem 3.1.3. In OS, the expected average error power for truncation is given by

$$E\left\{\frac{1}{N}\sum_{k=0}^{N-1}\Delta H_{T}^{2}(k)\right\}=q_{T}\left(\frac{N}{2}-2\right)+2N\mu_{T}^{2}C,$$

where

$$C=\sum_{m=2}^{\nu-2}\sum_{m'=1}^{m-1}\frac{1}{4^{m}}\sum_{p=2^{m}-2^{m'-1}}^{2^{m}-1}\sum_{r=0}^{2^{m-m'}-1}\mathrm{cas}\!\left(\frac{2\pi pr}{2^{m}}\right).$$

It can be shown that, for very large N, C is approximately upper bounded by 3/2 + 1/6. In the above expression, the first term denotes the contribution of the auto-terms and the second term that of the cross components (see Appendix B). It is interesting to note that the range of the summation over p in the second term is only from 2^m − 2^{m′−1} to 2^m − 1. This can be explained as follows: it can be seen from Fig. 1(b) that at any stage m, a truncation error occurs at a block p (0 ≤ p ≤ 2^m − 1) only if p is odd. Hence,

$$E\{\Delta H_{T}^{(m_{1})}(p,l_{1})\,\Delta H_{T}^{(m_{2})}(p,l_{2})\}=\mu_{T}^{2}\,\frac{[1-(-1)^{r((p)_{2^{m}})}][1-(-1)^{r((p)_{2^{m'}})}]}{4},$$

where m = max(m₁, m₂), m′ = min(m₁, m₂) and (m₁, l₁, p) ∈ D, (m₂, l₂, p) ∈ D. But r((p)_{2^m}) and r((p)_{2^{m′}}) are simultaneously odd only if 2^m − 2^{m′−1} ≤ p ≤ 2^m − 1. From Theorem 3.1.3, we find that the error power here is again an O(N) multiple of Δ².
3.2. Step-by-step scaling

It can be shown that a number gets multiplied by at most 2√2 as it goes to the next stage [13]. Hence, we propose the following shifting scheme for 0 ≤ m ≤ (ν − 1): if (m ≡ 0 (mod 2)), shift (the entire array) by two bits; else, shift (it) by one bit. It can be shown that no overflow occurs at any stage provided |h(n)| < 1. It can also be shown that an error at stage m gets multiplied by a(m) = 2^{m+⌈m/2⌉} at the output of the DIF–FHT flowgraph. Hence, by modifying Lemma 3.1.1 and Theorem 3.1.2, we obtain:

Lemma 3.2.1. In SS, the net error at an output point k due to all stages is given by

$$\Delta H(k)=\sum_{m=1}^{\nu-1}a(m)\sum_{l=0}^{2^{\nu-m}-1}\Delta H^{(m)}(k,l)\,\mathrm{cas}\!\left(\frac{2\pi kl}{N}\right).$$
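Under the same reading as in Section 2.2, namely that a(m) is 2 raised to the total number of bits shifted at the stages before m, the closed form 2^{m+⌈m/2⌉} follows from the two-bit/one-bit alternation. A quick check (our sketch, under that assumption):

    import math

    # DIF-FHT shifting rule of Section 3.2: a two-bit shift at even stages,
    # a one-bit shift at odd stages, for 0 <= m <= nu - 1.
    def shift_bits(m):
        return 2 if m % 2 == 0 else 1

    # a(m), read as the accumulation of the shifts at stages 0 .. m-1,
    # equals 2^(m + ceil(m/2)), as stated for the DIF-FHT flowgraph.
    for m in range(1, 14):
        cumulative = sum(shift_bits(j) for j in range(m))
        assert 2 ** cumulative == 2 ** (m + math.ceil(m / 2))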
Theorem 3.2.2. In SS, the average error power due to quantization is given by

$$\frac{1}{N}\sum_{k=0}^{N-1}\Delta H^{2}(k)=\sum_{m_{1}=1}^{\nu-1}\sum_{m_{2}=1}^{\nu-1}\frac{a(m_{1})a(m_{2})}{2^{m}}\sum_{p=0}^{2^{m}-1}\sum_{l=0}^{2^{\nu-m}-1}\{A_{i}+B_{i}\},$$
where the notations used here are the same as in Theorem 2.2.2. We now consider the case of truncation to find the expected average power. Once again, we find that the errors caused due to bit-shifts are uncorrelated (see Section 4). We have directly substituted the values of the mean and variance and obtained the final result.

Theorem 3.2.3. In SS, the expected average truncation error is given by

$$E\left\{\frac{1}{N}\sum_{k=0}^{N-1}\Delta H_{ST}^{2}(k)\right\}=\frac{\Delta^{2}}{6}(AN^{3}-40N+B)+\mathrm{CROSS},$$

where

$$A=\begin{cases}8.25, & \nu\ \text{even}\\ 3.52, & \nu\ \text{odd}\end{cases},\qquad B=\begin{cases}16.25, & \nu\ \text{even}\\ 51.8, & \nu\ \text{odd},\end{cases}$$

$$\mathrm{CROSS}=\frac{2}{N}\sum_{m=2}^{\nu-1}\sum_{m'=1}^{m-1}\frac{a(m)a(m')}{4^{m}}\sum_{p=0}^{2^{m}-1}\sum_{r=0}^{2^{m-m'}-1}S_{m,m'}(p,m)\,S_{m,m'}(p,m')\,\mathrm{cas}\!\left(\frac{2\pi pr}{2^{m}}\right)$$

and

$$S_{j,k}(p,m)=\begin{cases}\mu_{1}, & 0\le p\le 2^{j}-2^{k-1}-1,\ m\ \text{even}\\ \mu_{2}, & 0\le p\le 2^{j}-2^{k-1}-1,\ m\ \text{odd}\\ \mu_{1}+\mu_{T}, & 2^{j}-2^{k-1}\le p\le 2^{k}-1,\ m\ \text{even}\\ \mu_{2}+\mu_{T}, & 2^{j}-2^{k-1}\le p\le 2^{k}-1,\ m\ \text{odd.}\end{cases}$$
From the above expressions, we find that the error power is an O(N³) multiple of Δ². Hence, we expect its performance to be worse than that of the FFT. This has been verified experimentally (see Section 4).

3.3. Coefficient quantization

The analysis is very much like that in the DIT case. The only difference here is that, instead of the numbers {k_i} defined by (6), we define numbers {n_i} as

$$n_{i}=(nk_{i}2^{i-1})\bmod N,\quad i=1,2,\ldots,\nu,$$

where k = Σ_{i=1}^{ν} k_i 2^{i−1}. Then, it can be shown that, for the ideal case, (7) again describes the "route" taken by n to "reach" k, provided we redefine θ_j as θ_j = 2πn_j/N. Since the statistical properties are the same, we get exactly the same results as in the DIT case. Hence, we find that the effect of coefficient quantization is the same for the DIT and the DIF algorithms.

4. Simulations and results

4.1. Correlation between errors at different stages
Fig. 2. (a) Part of the basic butterfly in the DIT–FHT. e_1 and e_2 are truncation errors; s_1, s_2, s_3, s_4 and s_5 are errors caused due to bit-shifts. (b) Part of the basic butterfly in the DIF–FHT.
Earlier, we stated that truncation errors at different stages are statistically independent, whereas the errors due to bit-shifts need not be [15]. Later, we also stated in different sections that, both in the DIT–FHT and the DIF–FHT, the shifting errors were found to be uncorrelated. This was verified by implementing part of the basic butterflies for the respective algorithms [21]. We gave random inputs uniformly distributed between −0.5 and 0.5 to the inputs a, b, and c and to the branches labelled cos and sin in Figs. 2(a) and (b). We repeated the experiment 100,000 times and compared the average value of s_1 s_2 with E{s_1}E{s_2}; the difference was found to be less than 1% in each case. So we concluded that the errors due to bit-shifts were uncorrelated in both the DIF–FHT and the DIT–FHT.
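A stripped-down version of this experiment is easy to reproduce. The sketch below (ours) does not rewire the full butterflies of Fig. 2; it simply generates two one-bit shift errors from independent uniform inputs and makes the same comparison between E{s₁s₂} and E{s₁}E{s₂}.

    import numpy as np

    B = 10
    DELTA = 2.0 ** -B
    rng = np.random.default_rng(2)

    def trunc(v):
        # two's complement truncation to B bits
        return np.floor(v / DELTA) * DELTA

    a = trunc(rng.uniform(-0.5, 0.5, 100_000))
    b = trunc(rng.uniform(-0.5, 0.5, 100_000))
    s1 = trunc(a / 2) - a / 2            # one-bit shift error on branch a
    s2 = trunc(b / 2) - b / 2            # one-bit shift error on branch b
    print(np.mean(s1 * s2), np.mean(s1) * np.mean(s2))   # nearly equal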
Fig. 3. Average truncation error power for (a) DIT–FHT; (b) DIF–FHT.
4.2. Evaluation of results

Computer programs that simulate the twelve cases of radix-2 fixed-point FHT algorithms and the eight cases of radix-2 fixed-point FFT algorithms were written in MATLAB. The results presented in this section are for 24-bit data (B = 24). We have used real input data which are uniformly distributed (the range of the data depends on the constraints mentioned in the respective sections). It must also be stated that our results are valid only under the assumption that the input signal is white. Kaneko and Liu [8] have derived detailed formulae for the case when the input signal is not white.
Fig. 3(a) compares the error power for OS and SS for the DIT–FHT. As expected, we find that OS introduces less error than SS. From Fig. 3(b), the same can be said about the DIF–FHT as well. This is because in SS, an error at any stage of computation gets multiplied by a certain factor depending on the stage, whereas in OS no such multiplication takes place; hence, OS introduces less error. In Fig. 4, we have plotted the error power caused due to twiddle factor quantization only. Since it was proved that the performance is the same for both
the DIF and the DIT, we have plotted only one graph. For ν = 3, the expression given by Theorem 2.3.2 is not valid, and therefore we find a sudden break, as can be seen from Fig. 4. Looking at Figs. 5(a) and (b), we find that the performance of the FHT is better than that of the FFT when we consider the case of OS. This is because the FHT is inherently real. In the case of SS, from Fig. 5(b), we find that the DIT–FHT is better than the DIT–FFT up to ν = 5. For ν = 6, the performance of the DIT–FHT and the DIT–FFT is almost the same. Beyond that, the performance of the FFT is better. This is an improvement over the previous results reported by Prabhu and Narayanan [13], where they used a "lumped" shifting scheme, which resulted in the signal going to very low values, causing a lot of error. They also mentioned that if one used a "spread" shifting scheme, the performance of the FHT could be improved. The reason for this is that by spreading the shifting scheme, we can eliminate many unwanted two-bit shifts. This was verified experimentally using our model. The shifting scheme, as defined by (5), hardly uses two-bit shifts and hence is comparable in performance to that of the FFT. Since we do a lot of two-bit shifts in the case of the DIF–FHT, we expect its performance to be worse than
Fig. 4. Average quantization error for DIT and DIF–FHT.
Fig. 5. Comparison of error power between DIT–FHT and DIT–FFT for (a) OS; (b) SS.
the performance of the DIF–FFT. This was verified experimentally (see Fig. 6(b)).

5. Conclusions

Fixed-point FHT error analysis has been treated on a deterministic basis. The results obtained showed conclusively that the FHT is better than the FFT in certain aspects.
The idea that the FHT always performs worse than the FFT when implemented using SS has been shown to be incorrect by defining the shifting scheme appropriately. As mentioned before, the statistical model discussed here is only one possible choice; one can define a suitable shifting scheme for a particular application and analyze the effect of quantization due to that scheme.

Fig. 6. Comparison of error power between DIF–FHT and DIF–FFT for (a) OS; (b) SS.

Appendix A.

In this appendix, we derive the relation presented in Theorem 2.1.4. The derivations of Theorems 2.2.2, 3.1.2 and 3.2.2 follow in a similar manner.

Proof of Theorem 2.1.4. Squaring both sides of Lemma 2.1.3 and summing over the entire range of k, we obtain

$$\sum_{k=0}^{N-1}\Delta H^{2}(k)=\sum_{k=0}^{N-1}\sum_{m_{1}=s(k)}^{\nu}\sum_{m_{2}=s(k)}^{\nu}\sum_{i_{1}=0}^{2^{\nu-m_{1}}-1}\sum_{i_{2}=0}^{2^{\nu-m_{2}}-1}(X_{1}C_{1}+Y_{1}S_{1})(X_{2}C_{2}+Y_{2}S_{2}), \tag{A.1}$$

where we have taken the temporary notations

$$X_{j}=\Delta H^{(m_{j})}(r(i_{j}),l),\quad C_{j}=\cos\!\left(\frac{2\pi ki_{j}}{N}\right),\quad Y_{j}=\Delta H^{(m_{j})}(r(i_{j}),-l),\quad S_{j}=\sin\!\left(\frac{2\pi ki_{j}}{N}\right)$$

for j = 1, 2. Let k = l + p2^m, 0 ≤ p ≤ 2^{ν−m} − 1, 0 ≤ l ≤ 2^m − 1, and m = max(m₁, m₂). Then the RHS of (A.1) can be rewritten as

$$\sum_{m_{1}=3}^{\nu}\sum_{m_{2}=3}^{\nu}\sum_{i_{1}=0}^{2^{\nu-m_{1}}-1}\sum_{i_{2}=0}^{2^{\nu-m_{2}}-1}\left[X_{1}X_{2}\sum_{p=0}^{2^{\nu-m}-1}C_{1}C_{2}+Y_{1}Y_{2}\sum_{p=0}^{2^{\nu-m}-1}S_{1}S_{2}+X_{1}Y_{2}\sum_{p=0}^{2^{\nu-m}-1}C_{1}S_{2}+X_{2}Y_{1}\sum_{p=0}^{2^{\nu-m}-1}C_{2}S_{1}\right]. \tag{A.2}$$
In the above expression, we have assumed that, for a given k, the value within the brackets is zero outside the range s(k) ≤ m₁ ≤ ν and s(k) ≤ m₂ ≤ ν. It can be shown that

$$\sum_{p=0}^{2^{\nu-m}-1}C_{1}C_{2}=2^{\nu-m-1}(C_{-}+C_{+})\qquad\text{and}\qquad\sum_{p=0}^{2^{\nu-m}-1}S_{1}S_{2}=2^{\nu-m-1}(C_{-}-C_{+}),$$
$$\sum_{p=0}^{2^{\nu-m}-1}C_{1}S_{2}=2^{\nu-m-1}(S_{+}-S_{-})\qquad\text{and}\qquad\sum_{p=0}^{2^{\nu-m}-1}C_{2}S_{1}=2^{\nu-m-1}(S_{-}+S_{+}),$$

where

$$C_{-}=\sum_{r=0}^{2^{m-m'}-1}\delta(i_{1}-i_{2}-r2^{\nu-m})\cos\!\left(\frac{2\pi lr}{2^{m}}\right)\qquad\text{and}\qquad C_{+}=\sum_{r=1}^{2^{m-m'}}\delta(i_{1}+i_{2}-r2^{\nu-m})\cos\!\left(\frac{2\pi lr}{2^{m}}\right).$$

S₋ and S₊ are obtained by replacing cos by sin in C₋ and C₊, respectively. Therefore, the term within the brackets in (A.2) can be rearranged as C₋(X₁X₂ + Y₁Y₂) + C₊(X₁X₂ − Y₁Y₂) + S₋(X₂Y₁ − X₁Y₂) + S₊(X₁Y₂ + X₂Y₁). It can be seen that

$$\sum_{l=0}^{2^{m}-1}(X_{1}X_{2}-Y_{1}Y_{2})=\sum_{l=0}^{2^{m}-1}(X_{2}Y_{1}-X_{1}Y_{2})=0,$$

$$\sum_{l=0}^{2^{m}-1}(X_{1}X_{2}+Y_{1}Y_{2})=2\sum_{l=0}^{2^{m}-1}X_{1}X_{2}\qquad\text{and}\qquad\sum_{l=0}^{2^{m}-1}(X_{1}Y_{2}+X_{2}Y_{1})=2\sum_{l=0}^{2^{m}-1}X_{1}Y_{2}.$$

Hence, the term within the brackets in (A.2) reduces to 2^{ν−m}(X₁X₂C₋ + X₁Y₂S₊). By removing the delta functions in C₋ and S₊ and dividing both sides of (A.1) by N, we obtain the result.

Appendix B.

In this appendix, we derive the expected average power expression given by Theorem 2.1.5. Taking the expectation operator on both sides of Theorem 2.1.4 gives us

$$E\left\{\frac{1}{N}\sum_{k=0}^{N-1}\Delta H^{2}(k)\right\}=\sum_{m_{1}}\sum_{m_{2}}\frac{1}{2^{m}}\sum_{i}\sum_{l}E\{A_{i}+B_{i}\}, \tag{B.1}$$

where the range of the summations is as in Theorem 2.1.4. Splitting the summation into two parts (m₁ = m₂, m₁ ≠ m₂), the RHS of (B.1) can be written as

$$\sum_{m=3}^{\nu}\frac{Nq_{T}}{4^{m}}\sum_{l=0}^{2^{m}-1}f_{m}(l)+2\sum_{m=4}^{\nu}\sum_{m'=3}^{m-1}\frac{N}{4^{m}}\sum_{l=0}^{2^{m}-1}\mu_{T}^{2}f_{m}(l)f_{m'}(l)\sum_{r=0}^{2^{m-m'}-1}\mathrm{cas}\!\left(\frac{2\pi lr}{2^{m}}\right), \tag{B.2}$$

where

$$f_{m}(l)=\begin{cases}0, & (l)_{2^{m}}=0,\ 2^{m-2},\ 2^{m-1}\ \text{or}\ 3\cdot2^{m-2},\\ 1, & \text{otherwise},\end{cases}$$

m = max(m₁, m₂) and m′ = min(m₁, m₂). The reason f_m(l) has been introduced is that, in each block at any stage m (≥ 3), the four points given above in the definition of f_m(l) do not cause any error. The first term in (B.2) reduces to (q_T/6)(N + (8/N) − 6). Now, since f_m(l)f_{m′}(l) = f_{m′}(l), the second term in (B.2) can be written as

$$2\mu_{T}^{2}N\sum_{m=4}^{\nu}\sum_{m'=3}^{m-1}\frac{1}{4^{m}}\sum_{l=0}^{2^{m}-1}f_{m'}(l)\sum_{r=0}^{2^{m-m'}-1}\mathrm{cas}\!\left(\frac{2\pi lr}{2^{m}}\right).$$

The innermost double summation has no closed-form expression. It can be approximated in the following way:

$$\sum_{l=0}^{2^{m}-1}f_{m'}(l)\sum_{r=0}^{2^{m-m'}-1}\mathrm{cas}\!\left(\frac{2\pi lr}{2^{m}}\right)=\sum_{l=0}^{2^{m}-1}f_{m'}(l)+\sum_{l=0}^{2^{m}-1}\sum_{r=1}^{2^{m-m'}-1}f_{m'}(l)\,\mathrm{cas}\!\left(\frac{2\pi lr}{2^{m}}\right)\approx(2^{m}-4\cdot2^{m-m'})+(-2^{m-m'}+4)=2^{m}-5\cdot2^{m-m'}+4.$$

Summing up the second term now gives the result.
References

[1] S. Adireddy, K.M.M. Prabhu, A note on 'An accurate error analysis model for fast Fourier transform', IEEE Trans. Signal Process. 47 (6) (June 1999) 1743–1744.
[2] R.N. Bracewell, Discrete Hartley transform, J. Opt. Soc. Amer. 73 (12) (December 1983) 1832–1835.
[3] R.N. Bracewell, Fast Hartley transform, Proc. IEEE 72 (8) (August 1984) 1010–1018.
[4] R.N. Bracewell, The Hartley Transform, Oxford University Press, Oxford, 1985.
[5] U. Heute, Results of a deterministic analysis of FFT coefficient errors, Signal Processing 3 (1981) 321–333.
[6] U. Heute, H.W. Schussler, FFT-accuracy—new insights and a new point of view, IEEE ICASSP 2 (1983) 631–634.
[7] D.V. James, Quantization errors in the fast Fourier transform, IEEE Trans. Acoust. Speech Signal Process. ASSP-23 (3) (June 1975) 277–293.
[8] T. Kaneko, B. Liu, Accumulation of roundoff error in fast Fourier transform computation, J. Assoc. Comput. Mach. 17 (October 1970) 637–654.
[9] C.P. Kwong, K.P. Shiu, Structured FHT algorithms, IEEE Trans. Acoust. Speech Signal Process. ASSP-34 (4) (August 1986) 1000–1002.
[10] S.L. Loney, Plane Trigonometry, S. Chand, 1996, pp. 131–134 (Chapter 9).
[11] A.V. Oppenheim, R.W. Schafer, Digital Signal Processing, Prentice-Hall, New York, 1975, pp. 329–330 (Chapter 6).
[12] A.V. Oppenheim, C.J. Weinstein, Effects of finite register length in digital filtering and the fast Fourier transform, Proc. IEEE 60 (8) (August 1972) 957–976.
[13] K.M.M. Prabhu, S.B. Narayanan, Fixed-point error analysis of fast Hartley transform, Signal Processing 19 (1990) 191–198.
[14] H.V. Sorensen, et al., On computing the discrete Hartley transform, IEEE Trans. Acoust. Speech Signal Process. ASSP-33 (4) (October 1985) 1231–1238.
[15] M. Sundaramurthy, V.U. Reddy, Some results in fixed-point fast Fourier transform error analysis, IEEE Trans. Comput. C-26 (3) (March 1977) 305–308.
[16] T. Thong, B. Liu, Fixed-point fast Fourier transform error analysis, IEEE Trans. Acoust. Speech Signal Process. ASSP-24 (6) (December 1976) 563–573.
[17] D.W. Tufts, et al., Effects of FFT coefficient quantization on bin frequency response, Proc. IEEE 60 (January 1972) 146–147.
[18] C.J. Weinstein, Roundoff noise in floating-point fast Fourier transform computation, IEEE Trans. Audio Electroacoust. AU-17 (September 1969) 209–215.
[19] P.D. Welch, A fixed-point fast Fourier transform error analysis, IEEE Trans. Audio Electroacoust. AU-17 (June 1969) 151–157.
[20] Yutai Ma, An accurate error analysis model for fast Fourier transform, IEEE Trans. Signal Process. 45 (6) (June 1997) 1641–1645.
[21] A. Zakhor, A.V. Oppenheim, Quantization errors in the computation of the discrete Hartley transform, IEEE Trans. Acoust. Speech Signal Process. ASSP-35 (11) (November 1987) 1592–1602.