Fixed-point error analysis and an efficient array processor design of two-dimensional sliding DFT


Signal Processing 73 (1999) 191–201

Yisheng Zhu*, Hong Zhou, Hong Gu, Zhizhong Wang

Department of Biomedical Engineering, Shanghai Jiao Tong University, 200030, People's Republic of China
Department of Electronic Engineering, University of Science & Technology of China, 230027, People's Republic of China

Received 3 July 1998; received in revised form 28 September 1998

Abstract
The two-dimensional (2-D) sliding discrete Fourier transform (DFT) algorithm enables sliding spectrum analysis and real-time signal processing. In this paper, its fixed-point error analysis is carried out to provide a theoretical basis for hardware implementation. The analysis models the error as additive white noise and then derives the signal-to-noise ratio (SNR). Next, a simplified method for the 2-D sliding DFT based on the vector radix (VR) algorithm is introduced. With this approach, the fixed-point error can be reduced to the same scale as that of the 2-D FFT. As an example, the architecture and error analysis of an 8×8 2-D sliding DFT array processor based on the VR-4×4 algorithm are presented. The idea can be extended to larger DFT sizes. Finally, some comparisons are made. © 1999 Elsevier Science B.V. All rights reserved.

Zusammenfassung (German abstract, translated)
Sliding two-dimensional (2-D) discrete Fourier transform algorithms can perform sliding spectral analysis and achieve real-time processing. In this work, a fixed-point error analysis is carried out in order to develop a theoretical basis for the hardware implementation of this algorithm. The method models the error as additive white noise and successively arrives at the signal-to-noise ratio (SNR). Subsequently, a simplified model for the 2-D sliding DFT based on the vector radix (VR) algorithm is presented. With it, the fixed-point error can be reduced to the same order of magnitude as for the 2-D FFT. As an example, the architecture and error analysis of a sliding 8×8 2-D DFT array processor based on the VR-4×4 algorithm are presented. The underlying idea can be extended to DFTs of greater length. Finally, some comparative investigations are carried out. © 1999 Elsevier Science B.V. All rights reserved.
Résumé (French abstract, translated)
A two-dimensional (2-D) sliding discrete Fourier transform (DFT) algorithm can perform sliding spectral analysis and real-time signal processing. In this article, its fixed-point error analysis is carried out to form a theoretical basis for a hardware implementation. The analysis models the error as additive white noise and successfully arrives at the signal-to-noise ratio. Next, we introduce a simplified 2-D sliding DFT method based on a vector radix (VR) algorithm. With this approach, the fixed-point error can be reduced to the same scale as that of a 2-D FFT. As an example, we present the architecture and error analysis of a processor performing a sliding 8×8 2-D DFT based on a 4×4 VR algorithm. This idea can be extended to DFTs of larger sizes. Finally, some comparisons are derived. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Sliding DFT; Error analysis; Array processor

* Corresponding author. Tel.: +8621 6281 2831; e-mail: [email protected]
0165-1684/99/$ – see front matter © 1999 Elsevier Science B.V. All rights reserved.
PII: S0165-1684(98)00193-5

1. Introduction
When digital signal processing algorithms are implemented on a digital computer or in special-purpose hardware, quantization errors arise from finite register length effects: input quantization, coefficient rounding, and arithmetic quantization. The characteristics of these errors must be known if an algorithm is to be realized with the minimum word length that gives acceptable accuracy. It is often useful to perform an approximate analysis by representing the effect as an additive error signal, referred to as roundoff noise.

Much research has been done on the quantization error of various algorithms under different implementations. Fixed-point and floating-point error analyses of FFT algorithms are well known [5,7,15–17]. The coefficient roundoff error of an FFT implemented with a logarithmic number system was presented in [12]; the results indicated that this system provides better SNR performance than fixed-point and floating-point implementations. The error caused by the digit-slicing implementation of an on-chip FFT was also presented [13]. Yutai Ma [19] proposed an error propagation model for the in-place decimation-in-time radix-2 FFT and derived an accurate error expression and error variance for the FFT computation. Becker et al. [1] discussed the errors made when approximating the Fourier transform with the FFT, and derived precise relative error formulas for functions known as canonical-k as well as asymptotic error formulas for functions known as order-k. Most of these works adopt the statistical model introduced systematically in [8], which is also used in this paper. Beraldin [2] presented the performance of the one-dimensional (1-D) sliding DFT. Here we analyze the 2-D sliding DFT.

Since fixed-point implementations are more widely used, Section 2 analyzes the fixed-point roundoff error and derives the SNR of the 2-D sliding DFT. The result shows that, despite the algorithm's outstanding advantages and significance in signal processing, its limitation is a comparatively large roundoff noise, so a hardware implementation needs a longer register length than an FFT for the same accuracy. Although the rapid development of VLSI technology makes this requirement easy to satisfy, we still want to improve its performance.

It is by now well recognized that there is a strong interaction between an algorithm and its implementation [3]. In the past, many algorithms were developed to reduce the computational cost of the DFT. More recently, with the increasing emphasis on hardware-based systems, which often operate at the highest possible speeds to achieve the maximum possible throughput, different techniques have been developed to reduce the computational complexity of particular systems [6,9,14]. These techniques control the quantization and/or coding of the fixed, intrinsic parameters of the system in a way that simplifies the arithmetic operations, in particular multiplication. On the other hand, current technology provides very fast parallel multipliers that complete a multiplication in under 100 ns [10]. However, there is a penalty that is especially relevant to hardware-based systems: the multiplier occupies a larger chip area, which increases chip cost. Furthermore, applications that require high speed need quite advanced technology. So the problems arising from multiplication are still present, if in a slightly different form. Therefore, a clear need exists for techniques that offer increased processing power without depending too heavily on technology, or, alternatively, techniques that ease the burden on technology for high-performance systems [18].

In Section 3, a simplified 2-D sliding DFT based on the VR algorithm [11] is derived. The aim is to reduce the effect of finite register length and to make the algorithm easier to implement in VLSI. The VR method, a natural extension of the Cooley–Tukey FFT, was conceived by Rivard [11] and further developed by Harris et al. [4]. It is generally more efficient in terms of complex multiplications and additions than its 1-D counterpart applied along rows and columns. Here we combine the 2-D sliding DFT with VR to obtain an improved 2-D sliding DFT algorithm. As an example of implementing the simplified algorithm in a VLSI architecture, we design an 8×8 2-D sliding DFT array processor. Its basic unit is a 4×4 processing element (PE) that contains no multiplier, which reduces the multiplication complexity as far as possible. Finally, the fixed-point roundoff error analysis of this array processor and some comparisons are presented.

2. Error analysis of 2-D sliding DFT

2.1. 2-D sliding DFT algorithm
1-D sliding spectrum analysis based on the Goertzel algorithm produces its output while the sampled signal is still being input. Following a similar idea, the 2-D sliding DFT was realized in [20]. The definition of the 2-D DFT is

$$X(k_1,k_2)=\sum_{n_1=0}^{N_1-1}\sum_{n_2=0}^{N_2-1}x(n_1,n_2)\,W_{N_1}^{k_1n_1}W_{N_2}^{k_2n_2},\qquad k_1=0,1,\ldots,N_1-1,\quad k_2=0,1,\ldots,N_2-1, \tag{2.1}$$

where $W_N=e^{-j2\pi/N}$, and $N_1$ and $N_2$ are the column and row lengths of the 2-D sequence $x(n_1,n_2)$.

We define the sequence $y_{k_1k_2}(n_1,n_2)$ as follows:

$$y_{k_1k_2}(n_1,n_2)=\sum_{r_2=0}^{n_2-1}\sum_{r_1=0}^{N_1-1}x(r_1,r_2)\,W_{N_1}^{k_1(r_1-n_1-1)}W_{N_2}^{k_2(r_2-n_2-1)}+\sum_{r_1=0}^{n_1}x(r_1,n_2)\,W_{N_1}^{k_1(r_1-n_1-1)}W_{N_2}^{-k_2},\qquad 0\le n_1\le N_1-1,\ 0\le n_2\le N_2-1. \tag{2.2}$$

Comparing Eqs. (2.1) and (2.2), it follows that

$$y_{k_1k_2}(n_1,n_2)\big|_{n_1=N_1-1,\ n_2=N_2-1}=X(k_1,k_2). \tag{2.3}$$

Defining

$$yy_{k_1k_2}(n_1,n_2)=y_{k_1k_2}(n_1,n_2)-y_{k_1k_2}(n_1,n_2-N_2), \tag{2.4}$$

$X(k_1,k_2)$ can be obtained by the following recursive formulas [20]:

$$yy_{k_1k_2}(n_1+1,n_2)=yy_{k_1k_2}(n_1,n_2)\,W_{N_1}^{-k_1}+\big[x(n_1+1,n_2)-x(n_1+1,n_2-N_2)\big]W_{N_1}^{-k_1}W_{N_2}^{-k_2}, \tag{2.5}$$

$$yy_{k_1k_2}(0,n_2+1)=\big[yy_{k_1k_2}(N_1-1,n_2)+x(0,n_2+1)-x(0,n_2+1-N_2)\big]W_{N_1}^{-k_1}W_{N_2}^{-k_2}, \tag{2.6}$$

$$yy_{k_1k_2}(n_1,n_2)\big|_{n_1=N_1-1,\ n_2=i+N_2-1}=X_i(k_1,k_2). \tag{2.7}$$

Here $X_i(k_1,k_2)$ is the DFT of the $N_1\times N_2$ window that starts at column $n_2=i$, and samples with $n_2<0$ are taken as zero. The unit structure of the array processor based on Eqs. (2.5)–(2.7) is shown in Fig. 1. When the current value that the shift register in Fig. 1 sends to the multiplier is 1, Eq. (2.5) applies; when the value is $W_{N_2}^{-k_2}$, Eq. (2.6) is used.

2.2. Equivalent unit structure

The input and output of the DFT are generally complex, but in practice real arithmetic is more widely used. In this case, the structure of Fig. 1 should be transformed into an equivalent structure suited to real computation. Define

$$\tilde{x}(n_1,n_2)=x(n_1,n_2)-x(n_1,n_2-N_2). \tag{2.8}$$


Fig. 1. The structure of 2-D sliding DFT algorithm.

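Then Eq. (2.5) can be written as

$$y(n_1,n_2)=y(n_1-1,n_2)\,W_{N_1}^{-k_1}+\tilde{x}(n_1,n_2)\,W_{N_1}^{-k_1}W_{N_2}^{-k_2}, \tag{2.9}$$

where, for convenience, $yy_{k_1k_2}$ in Eq. (2.5) is written as $y$. Define

$$W_{N_1}^{-k_1}=e^{j2\pi k_1/N_1}=\cos\alpha+j\sin\alpha,\qquad \alpha=\frac{2\pi k_1}{N_1}, \tag{2.10}$$

$$W_{N_2}^{-k_2}=e^{j2\pi k_2/N_2}=\cos\beta+j\sin\beta,\qquad \beta=\frac{2\pi k_2}{N_2}, \tag{2.11}$$

$$W_{N_1}^{-k_1}W_{N_2}^{-k_2}=e^{j2\pi(k_1/N_1+k_2/N_2)}=\cos\theta+j\sin\theta,\qquad \theta=2\pi\left(\frac{k_1}{N_1}+\frac{k_2}{N_2}\right). \tag{2.12}$$

Separating Eq. (2.9) into its real and imaginary parts gives the following equations, which are equivalent to Eq. (2.5):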





$$\mathrm{Re}[y(n_1,n_2)]=\mathrm{Re}[y(n_1-1,n_2)]\cos\alpha-\mathrm{Im}[y(n_1-1,n_2)]\sin\alpha+\mathrm{Re}[\tilde{x}(n_1,n_2)]\cos\theta-\mathrm{Im}[\tilde{x}(n_1,n_2)]\sin\theta, \tag{2.13}$$

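$$\mathrm{Im}[y(n_1,n_2)]=\mathrm{Re}[y(n_1-1,n_2)]\sin\alpha+\mathrm{Im}[y(n_1-1,n_2)]\cos\alpha+\mathrm{Re}[\tilde{x}(n_1,n_2)]\sin\theta+\mathrm{Im}[\tilde{x}(n_1,n_2)]\cos\theta. \tag{2.14}$$

For Eq. (2.6), Eqs. (2.13) and (2.14) remain valid as its equivalent form if every $\alpha$ is replaced by $\theta$ (the previous state then being $y(N_1-1,n_2-1)$).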
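The recursion can be checked numerically. The Python sketch below is an illustration, not the authors' hardware: it runs a single $(k_1,k_2)$ unit in the real arithmetic of Eqs. (2.13) and (2.14), feeds samples column by column with $n_1$ varying fastest (the order assumed by Eqs. (2.5) and (2.6)), switches the feedback rotation from $\alpha$ to $\theta$ at the first sample of each new column, and compares every windowed output of Eq. (2.7) with a direct evaluation of Eq. (2.1) on the same window. The function names and the list-of-columns data layout are hypothetical, and floating-point arithmetic is used, so no roundoff noise is modeled here.

```python
import cmath
import math
import random


def sliding_dft_2d(x, n1_len, n2_len, k1, k2):
    """One (k1, k2) unit of the 2-D sliding DFT in real arithmetic (illustrative sketch).

    x[n2][n1] holds the complex sample x(n1, n2); samples arrive with n1 varying
    fastest. Returns {i: X_i(k1, k2)}, the DFT of the window of columns
    i .. i + n2_len - 1, as produced by Eq. (2.7).
    """
    alpha = 2 * math.pi * k1 / n1_len                   # Eq. (2.10)
    theta = 2 * math.pi * (k1 / n1_len + k2 / n2_len)   # Eq. (2.12)
    ct, st = math.cos(theta), math.sin(theta)
    re = im = 0.0
    out = {}
    for n2 in range(len(x)):
        for n1 in range(n1_len):
            # Eq. (2.8): subtract the sample leaving the window (zero until it fills).
            old = x[n2 - n2_len][n1] if n2 >= n2_len else 0j
            xt = x[n2][n1] - old
            # Eq. (2.5) rotates the feedback by alpha; Eq. (2.6), used for the
            # first sample of every new column, rotates it by theta instead.
            phi = theta if n1 == 0 else alpha
            c, s = math.cos(phi), math.sin(phi)
            # Eqs. (2.13) and (2.14), with alpha replaced by phi.
            re, im = (re * c - im * s + xt.real * ct - xt.imag * st,
                      re * s + im * c + xt.real * st + xt.imag * ct)
        if n2 >= n2_len - 1:
            out[n2 - n2_len + 1] = complex(re, im)      # Eq. (2.7)
    return out


def direct_window_dft(x, n1_len, n2_len, k1, k2, i):
    """Reference value: Eq. (2.1) evaluated on columns i .. i + n2_len - 1."""
    return sum(
        x[i + m][r1] * cmath.exp(-2j * math.pi * (k1 * r1 / n1_len + k2 * m / n2_len))
        for r1 in range(n1_len)
        for m in range(n2_len)
    )


if __name__ == "__main__":
    random.seed(1)
    N1, N2, columns = 4, 8, 12          # 12 columns give 5 complete windows
    x = [[complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(N1)]
         for _ in range(columns)]
    for k1, k2 in [(0, 0), (1, 3), (3, 7)]:
        for i, value in sliding_dft_2d(x, N1, N2, k1, k2).items():
            assert abs(value - direct_window_dft(x, N1, N2, k1, k2, i)) < 1e-9
    print("sliding outputs match the direct windowed DFT")
```

Because the state is only ever rotated by unit-modulus factors, nothing in the loop decays; any error introduced into `re` or `im` persists, which is why the roundoff accumulation studied next matters.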

2.3. Noise analysis
Based on Eqs. (2.13) and (2.14), we obtain the equivalent unit structure shown in Fig. 2, where $e_1$–$e_8$ are noise sources. Because of the finite register length, the computation errors of the 2-D sliding DFT are mainly due to the accumulation of roundoff error in every multiplication. Here the effect of rounding or truncation is represented as an additive noise signal; the noise sources are uncorrelated with each other and with the input signals. In hardware implementations, fixed-point arithmetic, which is the main subject here, is much more widely used than floating-point arithmetic. In fixed-point arithmetic, roundoff errors occur only when multiplications are performed; additions are error-free provided that no overflow occurs.

Without loss of generality, we consider fixed-point numbers represented as $(b+1)$-bit binary fractions, with the binary point just to the right of the highest-order bit. We also assume that the roundoff error in multiplying two fixed-point numbers and rounding the product to $b$ bits has a uniform probability density in the interval $(-2^{-b}/2,\,2^{-b}/2)$, with variance $\sigma_e^2=2^{-2b}/12$. Furthermore, the roundoff errors are assumed to be uncorrelated with each other and with the input. Under these assumptions, we model roundoff noise by inserting additive independent noise signals into the flow graph and analyze the effect of each noise source on the output. The output SNR is given by

$$\mathrm{SNR}=\frac{\sigma_y^2}{\sigma_f^2}, \tag{2.15}$$

where $\sigma_y^2$ is the output signal variance and $\sigma_f^2$ is the output error signal variance. Based on the equivalent structure shown in Fig. 2 and the independence of the input signal and the noise sources, we obtain the unit impulse responses $h_{\tilde{x}_r}(n)$ and $h_{\tilde{x}_i}(n)$ from the inputs $\mathrm{Re}[\tilde{x}(n_1,n_2)]$ and $\mathrm{Im}[\tilde{x}(n_1,n_2)]$, respectively, where the real and imaginary parts of each impulse response are the responses at the outputs $\mathrm{Re}[y(n_1,n_2)]$ and $\mathrm{Im}[y(n_1,n_2)]$:

$$\mathrm{Re}[h_{\tilde{x}_r}(n)]=\mathrm{Im}[h_{\tilde{x}_i}(n)]=\cos(n\alpha+\theta), \tag{2.16}$$

$$\mathrm{Im}[h_{\tilde{x}_r}(n)]=\sin(n\alpha+\theta), \tag{2.17}$$

$$\mathrm{Re}[h_{\tilde{x}_i}(n)]=-\sin(n\alpha+\theta). \tag{2.18}$$

Fig. 2. The equivalent unit structure of 2-D sliding DFT algorithm.

The unit impulse responses $h_1(n)$–$h_8(n)$ from the noise sources $e_1$–$e_8$ to the same outputs are the same as those in [2]:

$$\mathrm{Re}[h_1(n)]=\mathrm{Re}[h_2(n)]=\mathrm{Re}[h_3(n)]=\mathrm{Re}[h_4(n)]=\cos n\alpha, \tag{2.19}$$

$$\mathrm{Im}[h_1(n)]=\mathrm{Im}[h_2(n)]=\mathrm{Im}[h_3(n)]=\mathrm{Im}[h_4(n)]=\sin n\alpha, \tag{2.20}$$

$$\mathrm{Re}[h_5(n)]=\mathrm{Re}[h_6(n)]=\mathrm{Re}[h_7(n)]=\mathrm{Re}[h_8(n)]=\sin n\alpha, \tag{2.21}$$

$$\mathrm{Im}[h_5(n)]=\mathrm{Im}[h_6(n)]=\mathrm{Im}[h_7(n)]=\mathrm{Im}[h_8(n)]=\cos n\alpha. \tag{2.22}$$

Since the noise generated in each multiplier is assumed to be independent, the output error signal variance is given by

$$\sigma_f^2=\sigma_e^2\sum_{n=0}^{N-1}\big\{\mathrm{Re}[h_1(n)]^2+\mathrm{Re}[h_2(n)]^2+\cdots+\mathrm{Re}[h_8(n)]^2\big\}=4N\sigma_e^2,\qquad N=N_1N_2, \tag{2.23}$$

where $\sigma_e^2=2^{-2b}/12$ with products rounded to $b$ bits, $b$ being the finite register length [8]. Then we have

$$\sigma_f^2=\frac{2^{-2b}N}{3}. \tag{2.24}$$

The input signal is assumed to be uniformly distributed in $(-1/N,\,1/N)$ with zero mean and variance $1/(3N^2)$ [8]:

$$\sigma_x^2=E\big[|x(n)|^2\big]=\frac{1}{3N^2}. \tag{2.25}$$

It can easily be shown that the magnitude of the output is less than 1 provided that the magnitudes of all input points are less than $1/N$; hence no internal overflow can occur either. Thus the output signal variance is

$$\sigma_y^2=\sigma_x^2\sum_{n=0}^{N-1}\big\{\mathrm{Re}[h_{\tilde{x}_r}(n)]^2+\mathrm{Re}[h_{\tilde{x}_i}(n)]^2\big\}=N\sigma_x^2=\frac{1}{3N}. \tag{2.26}$$

Finally, the SNR is

$$\mathrm{SNR}=\frac{\sigma_y^2}{\sigma_f^2}=\frac{2^{2b}}{N^2}. \tag{2.27}$$

Every unit in the parallel structure has the same SNR, and the total number of units is $N$. For $N=N_1\times N_2$, the SNR declines rapidly as $N$ increases. Compared with the FFT, whose SNR is of the order of $2^{2b}/N$, the 2-D sliding DFT has a lower SNR, so longer registers are needed in a 2-D sliding DFT realization than in an FFT-based processor. With the rapid development of VLSI devices and the wide use of 32-bit microprocessors today, this requirement is easy to satisfy.

3. Simplified 2-D sliding DFT algorithm based on VR algorithm
In this section we present a simplified method for the 2-D sliding DFT based on the VR algorithm. With this approach, the finite register length effect can be reduced to the same scale as that of the 2-D FFT. A 4×4 2-D sliding DFT processing unit is chosen as the basic unit of the simplified architecture, and the data size is required to be $4\nu\times4\nu$, where $\nu$ is a positive integer. Here we discuss the design of an 8×8 2-D DFT array processor as an example; architectures for larger DFT sizes can be derived similarly.

3.1. Unit structure of 4×4 2-D sliding DFT

For convenient VLSI realization and a smaller finite register length effect, a 4×4 2-D sliding DFT unit is adopted in designing the 2-D array processor. For 2-D data of size 4×4, where $N_1=N_2=4$, we have

$$W_{N_1}^{-k_1}=e^{j2\pi k_1/N_1}=e^{jk_1\pi/2}=\cos\alpha+j\sin\alpha,\qquad \alpha=\frac{\pi}{2}k_1, \tag{3.1}$$

$$W_{N_1}^{-k_1}W_{N_2}^{-k_2}=W_4^{-(k_1+k_2)}=e^{j\pi(k_1+k_2)/2}=\cos\theta+j\sin\theta,\qquad \theta=\frac{\pi}{2}(k_1+k_2),\quad k_1,k_2=0,1,2,3. \tag{3.2}$$

Therefore $\sin\alpha$, $\cos\alpha$, $\sin\theta$ and $\cos\theta$ can only be 0, 1 or −1, as shown in Table 1. Because of this, no multipliers are needed in the unit, and a basic unit containing only adders and latches makes the VLSI implementation much easier. From Table 1 and the equivalent unit of Fig. 2, we obtain all the 4×4 basic units shown in Fig. 3. The procedure for computing $X(k_1,k_2)$ is as follows: when the value in the shift register in Fig. 1 is 1, the 4×4 unit structure $(k_1,k_2)$ is used directly; when the value is $W_{N_2}^{-k_2}$, every $\alpha$ in Fig. 2 is replaced by $\theta$, and the corresponding other 4×4 unit is used instead. The switch can be controlled by a control logic circuit. In this way a 4×4 2-D sliding DFT PE is realized.


Table 1
All coefficients in 4×4 processing unit

No.   k1   k2   k1+k2   sin α   cos α   sin θ   cos θ
 1     0    0     0       0       1       0       1
 2     1    0     1       1       0       1       0
 3     2    0     2       0      -1       0      -1
 4     3    0     3      -1       0      -1       0
 5     0    1     1       0       1       1       0
 6     1    1     2       1       0       0      -1
 7     2    1     3       0      -1      -1       0
 8     3    1     4      -1       0       0       1
 9     0    2     2       0       1       0      -1
10     1    2     3       1       0      -1       0
11     2    2     4       0      -1       0       1
12     3    2     5      -1       0       1       0
13     0    3     3       0       1      -1       0
14     1    3     4       1       0       0       1
15     2    3     5       0      -1       1       0
16     3    3     6      -1       0       0      -1
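A short Python sketch can regenerate Table 1 from Eqs. (3.1) and (3.2) and show why the 4×4 unit needs no multiplier: every coefficient is 0, 1 or −1, so each product in Eqs. (2.13) and (2.14) reduces to passing, negating or dropping an operand. The helper names (`unit_coefficients`, `times_trit`, `unit_step_4x4`) and the `wrap` flag, which selects the Eq. (2.6) case where α is replaced by θ, are hypothetical illustrations rather than the circuit of Fig. 3.

```python
import math


def unit_coefficients(k1, k2, n=4):
    """(sin a, cos a, sin t, cos t) for a = 2*pi*k1/n and t = 2*pi*(k1 + k2)/n.

    For n = 4 the exact values are 0, 1 or -1 (Eqs. (3.1)-(3.2)); round() only
    strips floating-point residue such as cos(pi/2) = 6.1e-17.
    """
    a = 2 * math.pi * k1 / n
    t = 2 * math.pi * (k1 + k2) / n
    vals = (math.sin(a), math.cos(a), math.sin(t), math.cos(t))
    coeffs = tuple(int(round(v)) for v in vals)
    assert all(abs(v - c) < 1e-9 for v, c in zip(vals, coeffs)), "not multiplier-free"
    return coeffs


def times_trit(value, t):
    """Multiply by t in {-1, 0, 1} using selection and negation only (no multiplier)."""
    if t == 1:
        return value
    if t == -1:
        return -value
    return 0


def unit_step_4x4(re, im, xr, xi, k1, k2, wrap=False):
    """One update of Eqs. (2.13)-(2.14) for a 4x4 unit; wrap=True is the Eq. (2.6) case."""
    sa, ca, st, ct = unit_coefficients(k1, k2)
    if wrap:                      # alpha -> theta in the feedback path
        sa, ca = st, ct
    new_re = times_trit(re, ca) - times_trit(im, sa) + times_trit(xr, ct) - times_trit(xi, st)
    new_im = times_trit(re, sa) + times_trit(im, ca) + times_trit(xr, st) + times_trit(xi, ct)
    return new_re, new_im


if __name__ == "__main__":
    print("No. k1 k2 k1+k2 sin_a cos_a sin_t cos_t")
    row = 1
    for k2 in range(4):
        for k1 in range(4):
            print(row, k1, k2, k1 + k2, *unit_coefficients(k1, k2))
            row += 1
```

Working on integers here mirrors the point of the design: with these coefficients the update is exact, so a 4×4 unit built this way contributes no rounding noise of its own.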

3.2. 8×8 2-D sliding DFT array processor based on VR algorithm
The VR algorithm is a direct and effective solution for computing the DFT of multi-dimensional data [3]. For $N_1$ and $N_2$ both even, it reads

$$\begin{aligned}X(k_1,k_2)&=\sum_{i=0}^{N_1-1}\sum_{j=0}^{N_2-1}x(i,j)\,W_{N_1}^{k_1i}W_{N_2}^{k_2j}\\&=\sum_{i=0}^{N_1/2-1}\sum_{j=0}^{N_2/2-1}x(2i,2j)\,W_{N_1/2}^{k_1i}W_{N_2/2}^{k_2j}+W_{N_1}^{k_1}\sum_{i=0}^{N_1/2-1}\sum_{j=0}^{N_2/2-1}x(2i+1,2j)\,W_{N_1/2}^{k_1i}W_{N_2/2}^{k_2j}\\&\quad+W_{N_2}^{k_2}\sum_{i=0}^{N_1/2-1}\sum_{j=0}^{N_2/2-1}x(2i,2j+1)\,W_{N_1/2}^{k_1i}W_{N_2/2}^{k_2j}+W_{N_1}^{k_1}W_{N_2}^{k_2}\sum_{i=0}^{N_1/2-1}\sum_{j=0}^{N_2/2-1}x(2i+1,2j+1)\,W_{N_1/2}^{k_1i}W_{N_2/2}^{k_2j}\end{aligned} \tag{3.3}$$

for $k_1=0,1,\ldots,N_1-1$ and $k_2=0,1,\ldots,N_2-1$. When $N_1=N_2=8$, the VR algorithm becomes

$$\begin{aligned}X(k_1,k_2)&=\sum_{i=0}^{7}\sum_{j=0}^{7}x(i,j)\,W_8^{k_1i}W_8^{k_2j}\\&=\sum_{i=0}^{3}\sum_{j=0}^{3}x(2i,2j)\,W_4^{k_1i+k_2j}+W_8^{k_1}\sum_{i=0}^{3}\sum_{j=0}^{3}x(2i+1,2j)\,W_4^{k_1i+k_2j}\\&\quad+W_8^{k_2}\sum_{i=0}^{3}\sum_{j=0}^{3}x(2i,2j+1)\,W_4^{k_1i+k_2j}+W_8^{k_1+k_2}\sum_{i=0}^{3}\sum_{j=0}^{3}x(2i+1,2j+1)\,W_4^{k_1i+k_2j}\end{aligned} \tag{3.4}$$

for $k_1=0,1,\ldots,7$ and $k_2=0,1,\ldots,7$. With this form of the VR algorithm, the 8×8 2-D sliding DFT array processor can be built from 4×4 PEs. According to the VR algorithm, the 8×8 data are divided into four blocks of size 4×4, where each entry $(i,j)$ denotes the sample $x(i,j)$:

data block 1:
(0,0) (0,2) (0,4) (0,6)
(2,0) (2,2) (2,4) (2,6)
(4,0) (4,2) (4,4) (4,6)
(6,0) (6,2) (6,4) (6,6)

data block 2:
(1,0) (1,2) (1,4) (1,6)
(3,0) (3,2) (3,4) (3,6)
(5,0) (5,2) (5,4) (5,6)
(7,0) (7,2) (7,4) (7,6)

data block 3:
(0,1) (0,3) (0,5) (0,7)
(2,1) (2,3) (2,5) (2,7)
(4,1) (4,3) (4,5) (4,7)
(6,1) (6,3) (6,5) (6,7)

data block 4:
(1,1) (1,3) (1,5) (1,7)
(3,1) (3,3) (3,5) (3,7)
(5,1) (5,3) (5,5) (5,7)
(7,1) (7,3) (7,5) (7,7)
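Eq. (3.4) can be checked numerically. The Python sketch below splits an 8×8 frame into the four data blocks listed above, takes a 4×4 DFT of each (here by direct summation, standing in for the multiplier-free 4×4 PEs), and combines them with the three twiddle factors $W_8^{k_1}$, $W_8^{k_2}$ and $W_8^{k_1+k_2}$. It checks a single frame only; the sliding behaviour is not modeled, and the function names are hypothetical.

```python
import cmath
import math


def dft2(block):
    """Direct 2-D DFT of a square block[i][j] (Eq. (2.1) with N1 = N2)."""
    n = len(block)
    return [[sum(block[i][j] * cmath.exp(-2j * math.pi * (k1 * i + k2 * j) / n)
                 for i in range(n) for j in range(n))
             for k2 in range(n)]
            for k1 in range(n)]


def w8(k):
    """Twiddle factor W_8^k = exp(-j*2*pi*k/8)."""
    return cmath.exp(-2j * math.pi * k / 8)


def vr_8x8(x):
    """8x8 DFT assembled from four 4x4 DFTs of the VR data blocks, per Eq. (3.4)."""
    # Data blocks 1-4: x(2i,2j), x(2i+1,2j), x(2i,2j+1), x(2i+1,2j+1).
    offsets = ((0, 0), (1, 0), (0, 1), (1, 1))
    d1, d2, d3, d4 = (dft2([[x[2 * i + p][2 * j + q] for j in range(4)] for i in range(4)])
                      for p, q in offsets)
    out = [[0j] * 8 for _ in range(8)]
    for k1 in range(8):
        for k2 in range(8):
            a, b = k1 % 4, k2 % 4          # W_4^{k1 i + k2 j} has period 4 in k1 and k2
            out[k1][k2] = (d1[a][b] + w8(k1) * d2[a][b]
                           + w8(k2) * d3[a][b] + w8(k1 + k2) * d4[a][b])
    return out


if __name__ == "__main__":
    x = [[complex((3 * i + 5 * j) % 7 - 3, (i * j) % 5 - 2) for j in range(8)] for i in range(8)]
    ref = dft2(x)
    vr = vr_8x8(x)
    err = max(abs(vr[a][b] - ref[a][b]) for a in range(8) for b in range(8))
    print("max |VR - direct| =", err)
```

Only the three twiddle products involve non-trivial constants; they correspond to the three complex multiplications counted in the error analysis of the 8×8 processor.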

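Fig. 3. All of the 4×4 unit structures of simplified 2-D sliding DFT.

The 8×8 array processor is shown in Fig. 4, in which the four 4×4 PEs are identical. Each complex multiplication consists of four real multiplications. Let $x$, $W$ and $y$ be complex numbers expressed as

$$x=\mathrm{Re}[x]+j\,\mathrm{Im}[x], \tag{3.5}$$

$$W=\mathrm{Re}[W]+j\,\mathrm{Im}[W], \tag{3.6}$$

$$y=\mathrm{Re}[y]+j\,\mathrm{Im}[y], \tag{3.7}$$

with $y=xW$. Then

$$y=xW=(\mathrm{Re}[x]+j\,\mathrm{Im}[x])(\mathrm{Re}[W]+j\,\mathrm{Im}[W])=(\mathrm{Re}[x]\mathrm{Re}[W]-\mathrm{Im}[x]\mathrm{Im}[W])+j(\mathrm{Re}[x]\mathrm{Im}[W]+\mathrm{Im}[x]\mathrm{Re}[W]). \tag{3.8}$$

So

$$\mathrm{Re}[y]=\mathrm{Re}[x]\mathrm{Re}[W]-\mathrm{Im}[x]\mathrm{Im}[W], \tag{3.9}$$

$$\mathrm{Im}[y]=\mathrm{Re}[x]\mathrm{Im}[W]+\mathrm{Im}[x]\mathrm{Re}[W]. \tag{3.10}$$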
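Eqs. (3.9) and (3.10) map directly onto four real multipliers and two adders. The Python sketch below models each real product as rounded to $b$ bits, which is the noise model of Section 2.3; `round_to_bits` and `cmul_fixed` are hypothetical names, and round-half-up is an assumed rounding rule. Each output component carries two rounding errors of at most $2^{-b}/2$ each, so its total error is bounded by $2^{-b}$.

```python
import math


def round_to_bits(v, b):
    """Round v to the nearest multiple of 2**-b (round half up), modeling one product rounding."""
    step = 2.0 ** -b
    return math.floor(v / step + 0.5) * step


def cmul_fixed(x, w, b):
    """y = x*w via Eqs. (3.9)-(3.10), with each of the four real products rounded to b bits."""
    rr = round_to_bits(x.real * w.real, b)
    ii = round_to_bits(x.imag * w.imag, b)
    ri = round_to_bits(x.real * w.imag, b)
    ir = round_to_bits(x.imag * w.real, b)
    return complex(rr - ii, ri + ir)   # Re[y] from Eq. (3.9), Im[y] from Eq. (3.10)


if __name__ == "__main__":
    b = 10
    x = complex(0.3125, -0.59375)                                 # exactly representable input
    w = complex(math.cos(math.pi / 4), -math.sin(math.pi / 4))    # W_8^1
    y = cmul_fixed(x, w, b)
    exact = x * w
    print(y, abs(y.real - exact.real), abs(y.imag - exact.imag))
```

Counting four such rounded products per complex multiplication is exactly the bookkeeping used in the error analysis of the simplified processor.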

Based on Eqs. (3.9) and (3.10), the hardware realization of a complex multiplication is shown in Fig. 5. Fig. 4 shows that the 8×8 2-D sliding DFT processor has a very simple structure and can compute while sampling. Consequently, it offers real-time processing, high regularity, simple data communication and simple control, all of which suit VLSI systolic array implementation.


Fig. 4. A processing element of 8×8 array processor of simplified 2-D sliding DFT.

Fig. 5. Schematic figure of a complex multiplication.

3.3. Error analysis
Here we perform an approximate error analysis of this simplified processor. We have

$$\sigma_e^2=\frac{2^{-2b}}{12}, \tag{3.11}$$

where $\sigma_e^2$ is the variance of a single noise source. Since the 4×4 PEs contain no multipliers, only three complex multiplications are needed in the improved 8×8 processor, each corresponding to four real multiplications, so the output error signal variance $\sigma_f^2$ is

$$\sigma_f^2=3\times4\sigma_e^2=12\sigma_e^2=2^{-2b}, \tag{3.12}$$

and

$$\sigma_x^2=\frac{1}{3N^2}. \tag{3.13}$$

Then the output signal variance $\sigma_y^2$ is

$$\sigma_y^2=N\sigma_x^2=\frac{1}{3N}. \tag{3.14}$$

Here $N=8\times8=64$, and the SNR can be expressed as

$$\mathrm{SNR}=\frac{\sigma_y^2}{\sigma_f^2}=\frac{2^{2b}}{3N}=\frac{2^{2b}}{3\times64}=\frac{2^{2b-6}}{3}. \tag{3.15}$$

4. Comparisons
Here we compare the finite register length effect and the number of multiplications of the simplified 2-D sliding DFT based on the VR algorithm, the 2-D sliding DFT, and the FFT for a data size of 8×8.

4.1. Comparison of the finite register length effect
(1) For the simplified 2-D sliding DFT,

$$\mathrm{SNR}_1=\frac{2^{2b}}{3N}=\frac{2^{2b-6}}{3}. \tag{4.1}$$

(2) For the 2-D sliding DFT,

$$\mathrm{SNR}_2=\frac{2^{2b}}{N^2}=2^{2b-12}. \tag{4.2}$$

(3) For the 2-D FFT,

$$\mathrm{SNR}_3=\frac{2^{2b}}{4N}=2^{2b-8}. \tag{4.3}$$
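The 2-D sliding DFT entry (4.2) rests on the noise model of Section 2.3, which can be checked by simulation. The Python sketch below is an illustration under stated assumptions, not the authors' experiment: it runs one unit of Eqs. (2.13) and (2.14) over one frame twice, once in floating point and once with every product rounded to $b$ bits, feeds inputs whose real and imaginary parts are uniform in $(-1/N,1/N)$ and pre-quantized to $b$ bits, and compares the measured error variance and SNR with Eqs. (2.24) and (2.27). It deliberately uses a 10×10 frame with $(k_1,k_2)=(1,2)$ rather than 8×8: with $N_1=8$ and $k_1=1$, $\cos\alpha=\sin\alpha$, so pairs of products round identically and the independence assumption does not hold exactly. All names and parameter values are hypothetical.

```python
import math
import random


def q(v, b):
    """Round v to the nearest multiple of 2**-b (one rounded product)."""
    step = 2.0 ** -b
    return math.floor(v / step + 0.5) * step


def one_frame(x, n1_len, n2_len, k1, k2, b):
    """Run Eqs. (2.13)-(2.14) over one N1*N2 frame, exactly and with rounded products.

    Returns (exact Re[y], rounded Re[y]) at the end of the frame.
    """
    alpha = 2 * math.pi * k1 / n1_len
    theta = 2 * math.pi * (k1 / n1_len + k2 / n2_len)
    ct, st = math.cos(theta), math.sin(theta)
    feedback = {False: (math.cos(alpha), math.sin(alpha)), True: (ct, st)}
    re_e = im_e = re_f = im_f = 0.0
    for n, (xr, xi) in enumerate(x):
        c, s = feedback[n % n1_len == 0]       # Eq. (2.6) at the start of each column
        re_e, im_e = (re_e * c - im_e * s + xr * ct - xi * st,
                      re_e * s + im_e * c + xr * st + xi * ct)
        re_f, im_f = (q(re_f * c, b) - q(im_f * s, b) + q(xr * ct, b) - q(xi * st, b),
                      q(re_f * s, b) + q(im_f * c, b) + q(xr * st, b) + q(xi * ct, b))
    return re_e, re_f


def simulate(b=16, n1_len=10, n2_len=10, k1=1, k2=2, trials=1000, seed=0):
    """Return (measured noise var, predicted noise var, measured SNR, predicted SNR)."""
    rng = random.Random(seed)
    n = n1_len * n2_len
    sig = err = 0.0
    for _ in range(trials):
        # Inputs quantized to b bits so that only product rounding adds noise.
        x = [(q(rng.uniform(-1 / n, 1 / n), b), q(rng.uniform(-1 / n, 1 / n), b))
             for _ in range(n)]
        exact, fixed = one_frame(x, n1_len, n2_len, k1, k2, b)
        sig += exact ** 2
        err += (fixed - exact) ** 2
    noise_var, signal_var = err / trials, sig / trials
    predicted_noise = 2.0 ** (-2 * b) * n / 3          # Eq. (2.24)
    predicted_snr = 2.0 ** (2 * b) / n ** 2            # Eq. (2.27)
    return noise_var, predicted_noise, signal_var / noise_var, predicted_snr


if __name__ == "__main__":
    nv, pn, snr, psnr = simulate()
    print(f"noise variance: measured {nv:.3e}, Eq. (2.24) {pn:.3e}")
    print(f"SNR: measured {snr:.3e}, Eq. (2.27) {psnr:.3e}")
```

With $b=16$ and $N=100$ the predicted SNR is $2^{32}/10^4\approx4.3\times10^5$; we would expect the simulated figures to land within a few percent of both predictions, with the exact values depending on the seed and the number of trials.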


4.2. Comparison of the number of real multiplications (n) in computing an output $X(k_1,k_2)$
(1) For the simplified 2-D sliding DFT,

$$n=3\times4=12. \tag{4.4}$$

Since, on average, half of the factors $W_8^{k_1}$, $W_8^{k_2}$ and $W_8^{k_1+k_2}$ have real and imaginary parts equal to 0, 1 or −1, the number of multiplications can be reduced to $n/2=6$.
(2) For the 2-D sliding DFT,

$$n=8\times64=512. \tag{4.5}$$

(3) For the 2-D FFT, the number of complex multiplications needed to compute an output $X(k_1,k_2)$ is $\log_2N_1+\log_2N_2=\log_2(N_1N_2)$ [8]; for $N=8\times8$ we therefore have

$$n=4\times\log_2(8\times8)=24. \tag{4.6}$$

As in case (1), $n$ can be reduced to $n/2=12$. These comparisons indicate that, of the three methods, the simplified 2-D sliding DFT based on the VR algorithm has the lowest multiplication complexity and the highest SNR, which makes it well suited to VLSI implementation.
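The "half of the factors" argument in case (1) can be made concrete by counting, for every output $(k_1,k_2)$ of the 8×8 processor, how many of the three twiddle factors in Eq. (3.4) are non-trivial, charging four real multiplications (Eqs. (3.9) and (3.10)) for each. In this Python sketch, a factor $W_8^k$ counts as trivial when its real and imaginary parts are all 0, 1 or −1, which for $N=8$ means $k$ is even; the function names are hypothetical.

```python
import math


def is_trivial(k, n=8):
    """True if W_n^k = exp(-j*2*pi*k/n) has real and imaginary parts in {0, 1, -1}."""
    w = complex(math.cos(2 * math.pi * k / n), -math.sin(2 * math.pi * k / n))
    return all(min(abs(part - t) for t in (-1, 0, 1)) < 1e-9 for part in (w.real, w.imag))


def real_mults(k1, k2):
    """Real multiplications to combine the four 4x4 outputs for one (k1, k2) via Eq. (3.4)."""
    return 4 * sum(not is_trivial(k) for k in (k1, k2, k1 + k2))


counts = [real_mults(k1, k2) for k1 in range(8) for k2 in range(8)]
print("average:", sum(counts) / len(counts), "worst case:", max(counts))
# prints: average: 6.0 worst case: 8
```

The average matches $n/2=6$ given above; the worst case is 8 rather than 12 because $k_1$, $k_2$ and $k_1+k_2$ cannot all be odd at once.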

5. Conclusion
The effect of finite register length on the 2-D sliding DFT has been analyzed, using statistical models to predict the output SNR. The result shows that the SNR is proportional to $1/N^2$, where $N$ is the size of the 2-D data. Compared with the FFT, whose SNR is of the order of $2^{2b}/N$, this algorithm has a lower SNR. However, its performance can be improved with the help of the VR algorithm when the data size is $4\nu\times4\nu$ ($\nu$ a positive integer), using a 4×4 DFT processing unit as the building block of the array processor. This unit contains only four adders, four latches and no multipliers. As an example, an 8×8 array processor has been designed, and the idea can easily be extended to larger sizes. Finally, the finite register length effect of the simplified algorithm was analyzed and compared with those of the 2-D sliding DFT and the 2-D FFT. The comparisons show that, of the three methods, the simplified 2-D sliding DFT has the lowest roundoff noise, which is significant when choosing a suitable register length in VLSI implementation.

Further research is still needed. When the data size $N$ (here $N=N_1\times N_2$) increases, the number of processing units rises rapidly, and so does the complexity of the circuit architecture. How to reduce this complexity while keeping a satisfactory finite register length effect remains to be studied.

Acknowledgements
This project was supported by the National Natural Science Foundation of China.

References
[1] R.I. Becker, N. Morrison, The errors in FFT estimation of the Fourier transform, IEEE Trans. Signal Process. 44 (8) (August 1996) 2073–2077.
[2] J.A. Beraldin, T. Aboulnasr, W. Steenaart, Efficient one-dimension systolic array realization of the discrete Fourier transform, IEEE Trans. CAS 36 (1) (January 1989) 95–100.
[3] P. Duhamel, M. Vetterli, Fast Fourier transforms: A tutorial review and a state of the art, Signal Processing 19 (4) (1990) 259–299.
[4] D.B. Harris, J.H. McClellan, D.S.K. Chan, H.W. Schuessler, Vector radix fast Fourier transform, in: IEEE Internat. Conf. Acoust. Speech Signal Process., Hartford, Conn., 9–11 May 1977, pp. 548–551.
[5] D.V. James, Quantization errors in the fast Fourier transform, IEEE Trans. Acoust. Speech Signal Process. ASSP-23 (3) (June 1975) 277–283.
[6] B. Liu, A. Peled, A new hardware realization of high speed fast Fourier transforms, IEEE Trans. Acoust. Speech Signal Process. ASSP-23 (6) (December 1975) 543–547.
[7] A.V. Oppenheim, C.J. Weinstein, Effects of finite register length in digital filtering and fast Fourier transform, Proc. IEEE (Invited Paper) 60 (August 1972) 957–976.
[8] A.V. Oppenheim, R.W. Schafer, Digital Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1975, Chapters 6 and 9.
[9] A. Peled, On the hardware implementation of digital signal processors, IEEE Trans. Acoust. Speech Signal Process. ASSP-24 (1) (February 1976) 76–86.
[10] W.A. Perera, Architectures for multiplierless fast Fourier transform hardware implementation in VLSI, IEEE Trans. Acoust. Speech Signal Process. ASSP-35 (12) (December 1987) 1750–1760.
[11] G.E. Rivard, Direct fast Fourier transform of bivariate functions, Annual Meeting of the Optical Society of America, Boston, MA, October 1975.
[12] D.V. Satish Chandra, Accumulation of coefficient roundoff error in fast Fourier transforms implemented with logarithmic number system, IEEE Trans. Acoust. Speech Signal Process. ASSP-35 (11) (November 1987) 1633–1636.
[13] Z.A.M. Sharrif, M. Othman, T.S. Theong, Noise analysis for digit slicing FFT, IEE Proc. 138 (5) (October 1991) 509–512.
[14] D.J. Spreadbury, T.M. Rees-Roberts, VLSI gate array prime radix Fourier transform processor, in: Proc. IEEE Internat. Conf. Acoust. Speech Signal Process., Tampa, FL, 26–29 March 1985, Vol. 4, pp. 1473–1476.
[15] M. Sundaramurthy, V.U. Reddy, Some results in fixed-point fast Fourier transform error analysis, IEEE Trans. Computer C-26 (3) (March 1977) 305–308.
[16] C.J. Weinstein, Roundoff noise in floating point fast Fourier transform computations, IEEE Trans. Audio Electroacoust. AU-17 (September 1969) 209–215.
[17] P.D. Welch, A fixed point fast Fourier transform error analysis, IEEE Trans. Audio Electroacoust. AU-17 (June 1969) 151–157.
[18] H.R. Wu, F.J. Paoloni, The structure of vector radix fast Fourier transforms, IEEE Trans. Acoust. Speech Signal Process. ASSP-37 (9) (September 1989) 1415–1424.
[19] Yutai Ma, An accurate error analysis model for fast Fourier transform, IEEE Trans. Signal Process. 45 (6) (June 1997) 1641–1645.
[20] Y.S. Zhu, M. Yang, 2-D sliding spectrum analysis, IEEE Internat. Symp. on CAS, Shen Zhen, China, 20–22 May 1992, pp. 1448–1450.