Signal Processing 65 (1998) 383—390
Fast generalized DFT and DHT algorithms Guoan Bi*, Yanqiu Chen School of Electrical and Electronic Engineering, Block S1, Nanyang Technological University, Nanyang Avenue, Singapore 639798, Singapore Received 3 March 1997; revised 31 July 1997
Abstract This paper presents fast algorithms for complex and real generalized discrete Fourier transform (GDFT). It shows that the length-N GDFT can be computed by a split-radix algorithm of discrete Fourier transform (DFT) whose input and output sequences are rotated by twiddle factors. It also reveals that the odd-time, odd-frequency and odd-squared DFTs are closely related to the discrete cosine transform (DCT). Compared with recently reported split-radix GDFT algorithm, the proposed one provides a simpler computational structure and significant savings on the number of arithmetic operations. ( 1998 Elsevier Science B.V. All rights reserved. Zusammenfassung Die Arbeit stellt schnelle Algorithmen vor fu¨r die komplexe und die reelle verallgemeinerte diskrete Fouriertransformation (GDFT). Sie zeigt, da{ die GDFT der La¨nge N mit Hilfe eines Splitradix-Algorithmus der diskreten Fouriertransformation (DFT) berechnet werden kann, dessen Eingangs- und Ausgangsfolgen durch Twiddle-Faktoren gedreht wurden. Es zeigt ferner, da{ die DFTs ungerader La¨nge im Zeit- und Frequenzbereich und der La¨nge eines ungeraden Quadrates eng mit der diskreten cosinustransformation (DCT) zusammenha¨ngen. Im Vergleich mit dem ku¨rzlich vero¨ffentlichten Splitradix-GDFT-Algorithums hat der vorgeschlagene Algorithmus eine einfachere Berechnungsstruktur und erhebliche Einsparungen bei der Anzahl der aritmetischen Operationen. ( 1998 Elsevier Science B.V. All rights reserved. Re´sume´ Cet article pre´sente des algorithmes rapides pour la transformation de Fourier discre`te ge´ne´ralise´e (GDFT) complexe et re´elle. Il est montre´ que la GDFT de longueur N peut eˆtre calcule´e par un algorithme de transformation de Fourier discre`te (DFT) de type split-radix dont les se´quences d’entre´e et de sortie subissent une rotation. Ceci indique e´galement que les DFT temps-impair, fre´quence-impaire et carre´-impair sont lie´es e´troitement a` la transformation discre`te en cosinus (DCT). Si on le compare a` un algorithme de GDFT de type split-radix re´cemment de´crit, l’algorithme propose´ offre une structure de calcul plus simple et des e´conomies significatives sur les ope´rations arithme´tiques. ( 1998 Elsevier Science B.V. All rights reserved. Keywords: Fast Fourier transform; Discrete cosine transform; Discrete Hartley transform
* Corresponding author. Tel.: #65 799 4823; fax: #65 791 2687; e-mail:
[email protected]. 0165-1684/98/$19.00 ( 1998 Elsevier Science B.V. All rights reserved. PII S 0 1 6 5 - 1 6 8 4 ( 9 7 ) 0 0 2 3 4 - X
384
G. Bi, Y. Chen / Signal Processing 65 (1998) 383–390
1. Introduction The generalized discrete Fourier transform (GDFT) and discrete Hartley transform (GDHT) have many applications in digital signal processing such as filter banks, convolution and signal representation [4,5,7,9]. Fast algorithms ([6,7,9], for example) were reported to reduce the required computational complexity, i.e. principally the number of multiplications and additions. In particular, a split-radix algorithm [9] decomposes the entire computation of length-N GDFT into the computation of a length-N/2 and two length-N/4 GDFTs. Based on this algorithm, efficient computation of the odd-time and odd-frequency DFT can be achieved. However, such a method requires too many redundant operations for arbitrary time and frequency origins, i.e. n and k , and the 0 0 odd-squared DFT. Fast algorithms for one- and two-dimensional odd-time, odd-frequency and odd-squared DHT are reported in [6,7]. It can be shown that the computational complexity needed by the GDHT depends upon that of the DHT. A low count of arithmetic operations can be achieved if the DHT is obtained from DFT computation [3]. One drawback of such an arrangement is to use complex arithmetic operations, which increases implementation complexity. It is known that both real-valued DFT and DHT require about the same number of multiplications and additions [2,8]. If the DHT is computed from the real-valued DFT, each DHT transform output needs one extra addition to combine the real and imaginary parts of the real-valued DFT. However, comparisons show that the GDFT algorithm [9] for n "k "0.5 needs many more arithmetic 0 0 operations than that needed by its GDHT counterpart [7]. Therefore, it is evident that an improvement can be made on the GDFT algorithm. This paper attempts to make improvement on the GDFT algorithm reported in [9] and to reveal a relationship between the GDFT and DCT, similar to that between the DFT and DCT [10]. The next section presents fast algorithms using split-radix FFT algorithm to compute GDFT for arbitrary n and k . DCT based algorithms for the 0 0 odd-squared, odd-time and old-frequency DFTs are also reported. The computational complexity
required by the proposed algorithms for complex and real data is also reported. The computation of GDHT by modifying the GDFT algorithms and the required computational complexity are considered in Section 3. Finally, comparisons among various algorithms in terms of the required number of arithmetic operations and computational structures are made in Section 4.
2. Algorithm The GDFT of a sequence x(n), 0)n)N!1, is defined by N~1 X(k)" + x(n)¼(n`n0)(k`k0) N n/0 N~1 "¼n0(k`k0) + x(n)¼nk0¼nk, N N N n/0 0)k)N!1,
(1)
where ¼ "exp(!j2p/N), N is the sequence N length, and n and k are the time and frequency 0 0 origins, respectively. When n "k "1/2, the 0 0 GDFT is often known as the odd-squared DFT. When n "1/2 and k "0 (or n "0 and k "0 0 0 0 0 1/2), the GDFT is often termed as the odd-time (or odd-frequency) DFT.
2.1. Generalized DFT (arbitrary n and k ) 0 0 The GDFT defined by (1) is basically a length-N DFT whose input sequence x(n) and output sequence X(k) are rotated by twiddle factors. The even-indexed GDFT outputs can be computed by N~1 X(2k)"¼n0k + x(n)¼nk0`n0k0¼nk N@2 N@2 N n/0 N@2~1 "¼n0k + [x(n)#x(n#N/2) N@2 n/0 ]e~+pk0]¼nk0`n0k0¼nk , N@2 N 0)k)N/2!1,
(2)
which is a length-N/2 DFT whose input and output sequences are rotated. The odd-indexed outputs
G. Bi, Y. Chen / Signal Processing 65 (1998) 383—390
can be computed by
2.2. Old-squared DF¹ (n "k "0.5) 0 0
N~1 X(4k#1)"¼n0k + x(n)¼nk0¼n(4k`1) N@4 N N n/0
When n " k "0.5, Eq. (1) becomes 0 0 X(k)"A(k)!jB(k), k"0,2, N/2!1,
N@4~1 "¼n0(4k`k0`1) + M[x(n)!x(n#N/2)e~+nk0] N n/0 !j[x(n#N/4)!x(n#3N/4)e~+nk0]e~+nk0@2N ]¼n(k0`1)¼nk N@4 N
(3)
and X(4k#3) N@4~1 "¼n0(4k`k0`3 + M[x(n)!x(n#N/2)e~+nk0] N@4 n/0 #j[x(n#N/4)!x(n#3N/4)e~+nk0] ]e~+nk0@2N¼n(k0`3)¼nk , N@4 N
(4)
where for Eqs. (3) and (4), k"0, 2, N/4!1. In summary, the length-N GDFT for arbitrary n and 0 k can be decomposed into one length-N/2 and two 0 length-N/4 DFTs whose input and output sequences are rotated by twiddle factors. In contrast, Pei’s split-radix GDFT algorithm [9] decomposes the length-N GDFT into one length-N/2 and two length-N/4 GDFTs. The two algorithms have no fundamental difference since the proposed one can be easily converted into Pei’s algorithm by manipulating twiddle factors in Eqs. (2)—(4). However, our algorithm allows trivial rotation factors to be exploited to significantly reduce the required number of arithmetic operations. For computation of the odd-time, odd-frequency and odd-squared DFTs, it may be necessary to combine some of twiddle factors in Eqs. (2)—(4) to minimize the overall computational complexity. Pei reported the use of split-radix GDFT algorithm to compute complexand real-valued odd-time, odd-frequency and oddsquared DFT, which will not be repeated here. The rest of this section presents DCT-based algorithms for the computation of the odd-time, odd-frequency and odd-squared DFT.
385
(5)
where N/1 p(2n#1)(2k#1) A(k)" + x(n) cos , 2N n/0 k"0, 2, N/2!1,
(6)
is the real part of Eq. (1), and N~1 p(2n#1)(2k#1) , B(k)" + x(n) sin 2N n/0 k"0, 2N/2!1, or
A
B
N !1!k 2
(7a)
B
N~1 p(2n#1)(2k#1) " + x(n)(!1)n cos , 2N n/0 k"0,2N!1,
(7b)
is the imaginary part of Eq. (1). For k"0, 2, N/2!1, A(k)"!A(N!1!k) and B(k)" B(N!1!k). Therefore, Eq. (5) becomes X(N!k!1)"![A(k)#jB(k)], k"0, 2, N/2!1.
(8)
We further define an auxiliary sequence A(k)#A(k!1) F(k)" 2 N~1 p(2n#1) p(2n#1)k " + x(n) cos cos , 2N 2(N/2) n/0 N@2~1 p(2n#1) " + [x(n)!x(N!1!n)]cos 2N n/0 p(2n#1)k ]cos , k"0, 2, N/2!1, (9) 2(N/2) which is a length-N/2 DCT of complex data. The sequence A(k) can be achieved by A(k)"2F(k)!A(k!1), k"1, 2, N/2!1, A(0)"F(0).
(10)
386
G. Bi, Y. Chen / Signal Processing 65 (1998) 383–390
In the same way, Eq. (7b) can be obtained by a length-N/2 DCT computation. Therefore, the computation of the length-N odd-squared DFT can be achieved by the following steps: 1. compute A(k) and B(k) according to Eqs. (9) and (10). 2. compute the final transform outputs by Eqs. (5) and (8).
which is a length-N/2 odd-frequency DFT, and N@2~1 D(k)" + x(2n#1)¼(n`1@2)(k`1@2), n@2 n/0 0)k)N/2!1,
(15)
which is a length-N/2 odd-squared DFT. Similarly, Eq. (14) can be further decomposed and Eq. (15) is computed by the algorithm proposed in Section 2.2.
2.3. The odd-time DF¹ (n "0.5, k "0) 0 0 When n "0.5 and k "0, Eq. (1) can be decom0 0 posed into even- and odd- indexed transform outputs. For example, N@2~1 X(2k)" + [x(n)#x(n#N/2)]¼(n`1@2)k, N@2 n/0 0)k)N/2!1,
(11)
which is a length-N/2 odd-time DFT, and N@1 X(2k#1)" + x(n)¼(n`1@2)(k`1@2) N@2 n/0 N@2~1 " + [x(n)!x(n#N/2)]¼(n`1@2)(k`1@2), N@2 n/0 0)k)N/2!1, (12) which is a length-N/2 odd-squared DFT which can be computed by the algorithm shown in Section 2.2. The length-N/2 odd-time DFT can be further decomposed in the same way.
2.4. The odd-frequency DF¹ (n "0, k "0.5) 0 0 When n "0, and k "0.5, Eq. (1) can be de0 0 composed into X(k)"C(k)#D(k),
The algorithm for real-valued odd-squared GDFT can be easily derived from Eqs. (5)—(10). A(k) in Eq. (6) and B(k) in Eq. (7) directly deal with real data and become the real and the imaginary parts of the final transform outputs, respectively. The additions and subtractions in Eqs. (5) and (8) can be eliminated. Eq. (9) becomes a length-N/2 DCT of real data. For the odd-time and oddfrequency DFTs, the computation for real data can also be easily obtained from their complex counterparts. In this case, the length-N/2 odd-time (or odd-frequency) DFT directly deals with real data without any modification.
2.6. Computational complexity If a complex multiplication is implemented by four real multiplications and two real additions, the number of real multiplications required by the proposed split-radix GDFT algorithm is M (N)"M (N/2)#2M (N/4) GDFT DFT DFT #11N!12,
(16)
and the number of real additions is
0)k)N/2!1,
X(k#N/2)"C(k)!D(k), 0)k)N/2!1, (13) where N@2~1 C(k)" + x(2n)¼n(k`1@2), n@2 n/0
2.5. Real-valued GDF¹
0)k)N/2!1, (14)
A (N)"A (N/2)#2A (N/4)#8.5N!6, GDFT DFT DFT (17) where, in both Eqs. (16) and (17), the subscript GDFT means the computation for GDFT with arbitrary n and k , and the subscript DFT indi0 0 cates the computation for DFT.
G. Bi, Y. Chen / Signal Processing 65 (1998) 383—390
For n "k "0.5, the number of multiplications 0 0 needed by a length-N odd-squared DFT of complex data is (N)"4M (N/2)#2N MOS CGDFT DCT "N(log N#1), N'2, 2 and the number of additions is
(18)
AOS (N)"N(3 log N#1), N'2, (19) CGDFT 2 where the superscript OS indicates the odd-squared and subscript CGDFT means the complex GDFT. The DCT computation can be performed by a fast DCT algorithm [1]. When N is a power of two, the number of multiplications needed by a length-N DCT is M (N)"N/2 log N, DCT 2 and the number of additions is
complex data. The number of additions needed by real-valued odd-time DFT is AOT (N)"AOT (N/2)#0.75N log N, RGDFT 2 RGDFT N'4.
(25)
The same additive complexity as that given in Eq. (25) is needed by the odd-frequency DFT. In deriving Eq. (25), we consider the property of realvalued odd-frequency DFT, X(N!1!k)"X*(k), k"0, 2, N/2!1,
(26)
and that of the odd-squared DFT, X(N!1!k)"!X*(k), k"0, 2, N/2!1, (27)
(20) 3. Algorithm of GDHT
A (N)"3N/2 log N!N#1. (21) DCT 2 For real data, the decomposition cost is halved and the additions/subtraction for Eq. (5) and (8) are redundant. Therefore, the number of multiplications for a length-N odd-squared DFT of real data is exactly the half of that for complex data, and the number of additions is (N)"0.5N(3 log N!1), N'2, (22) AOS RGDFT 2 where RGDFT indicates the real-valued GDFT. From Sections 2.3 and 2.4, it can be easily seen that the computational complexity of both old-time and odd-frequency DFTs is additions and one length-N/2 odd-time (or odd-frequency) DFT and one length-N/2 odd-squared DFT. Therefore, the computational complexity for both odd-time and odd-frequency DFTs is the same. For example, the number of multiplications needed by the complex odd-time DFT is MOT (N)"MOT (N/2)#0.5N log N, CGDFT 2 CGDFT N'4,
387
(23)
and the number of additions is
The generalized DHT of a real sequence x(n) can be defined by N~1 2p(n#n )(k#k ) 0 0, H(k)" + x(n) cos N n/0 0)k)N!1,
(28)
where cas( )"cos( ) #sin( ). When n "k "1/2, 0 0 the GDHT is often known as the odd-squared DHT. When n "1/2 and k "0 (or n "0 and 0 0 0 k "0, 1/2), the GDHT is often termed as the 0 odd-time (or odd-frequency) DHT. Fast algorithms for one and two-dimensional odd-time, odd-frequency and odd-squared DHT were reported in [6—8]. It is known that the computation of DHT can be achieved by real-valued DFT [3]. Similarly, the GDHT can also be obtained by a real-valued GDFT. For arbitrary values of n and k , it is 0 0 difficult to derive an algorithm, which requires a computational complexity similar to that of a real-valued GDFT. For odd-time, odd-frequency and odd-squared GDHT, however, such a computational complexity can be achieved.
(N)"AOT (N/2)#1.5N log N#N, AOT CGDFT 2 GGDFT N'4, (24)
3.1. Odd-squared DH¹
For real data, the number of multiplications for the odd-time or odd-frequency DFT is half of that for
The algorithm for the odd-squared DHT can be easily obtained by converting j in Eqs. (5) and (8)
388
G. Bi, Y. Chen / Signal Processing 65 (1998) 383–390
When n "0 and k "0.5 in Eq. (28), the odd0 0 frequency DHT can be decomposed into
into (!1) such as HOS(k)"A(k)#B(k),
HOF(k)"A(k)#B(k), 0)k)N/2!1,
HOS(N!k!1)"![A(k)!B(k)], k"0, 2, N/2!1,
(29)
where A(k) and B(k), defined in Eqs. (6) and (7), can be computed as shown in Eqs. (9) and (10). The computational complexity of the odd-squared DHT is also closely related to that of the real-valued oddsquared DFT. For example, both algorithms use the same number of multiplications. However, Eq. (29) needs N extra additions compared to the realvalued odd-squared DFT. Therefore, the number of additions needed by the odd-squared DHT is AOS (N)"0.5N(3 log N#1), N'2. GDHT 2
(30)
3.2. Odd-time and odd-frequency DH¹ Fast algorithms for odd-time and odd-frequency DHT can be similarly derived from their GDFT counterparts. When n "0.5 and k "0 in 0 0 Eq. (28), for example, the odd-time DHT can be decomposed into N~1 2p(n#1/2)k HOT(2k)" + x(N) cas N/2 n/0 N@2~1 " + [x(n)#x(n#N/2)] n/0 2p(n#1/2)k ]cas , N/2 0)k)N/2!1,
(31)
HOF(k#N/2)"A(k)!B(k), 0)k)N/2!1,
(33)
where N@2~1 2pn(k#1/2) , A(k)" + x(2n) cas N/2 n/0 0)k)N/2!1,
(34)
which is a length-N/2 odd-frequency DHT, and N@2~1 2p(n#1/2)(k#1/2) , B(k)" + x(2n#1) cas N/2 n/0 0)k)N/2!1, (35) which is a length-N/2 odd-squared DHT. Eq. (34) can be further decomposed in the same way. From Eqs. (34) and (35), the number of additions needed by the odd-time DHT is AOT (N)"AOT (N/2)#0.75N log N#0.5N, GDHT GDHT 2 N'2, (36) where AOT (2)"2. The number of additions GDHT for the odd-frequency DHT is the same as that required by the odd-time DHT. The multiplicative complexity of both odd-time and -frequency DHT is the same as that of the real-valued oddtime or -frequency DFT, i.e. half of that given in Eq. (23).
which is a length-N/2 odd-time DHT, and 4. Comparisons
HOT(2k#1) N~1 2p(n#1/2)(k#1/2) " + x(n) cas N/2 n/0 N@2~1 " + [x(n)!x(n#N/2)] n/0 2p(n#1/2)(k#1/2) , ]cas N/2 0)k)N/2!1, which is a length-N/2 odd-squared DHT.
(32)
In this section, the proposed and other reported algorithms are compared in terms of their computational complexity and computational structures. Pei’s split-radix GDFT algorithm decomposed the length-N GDHT into one length-N/2 and two length-N/4 GDHTs [9]. When n and k are arbit0 0 rary, the algorithm needs many more multiplications and additions than necessary because trivial twiddle factors are not exploited to reduce the number of arithmetic operations. Table 1 list the
G. Bi, Y. Chen / Signal Processing 65 (1998) 383—390
389
Table 1 Computational complexity for arbitrary n and k 0 0 DFT!
Pei’s algorithm
Proposed
N
Mul
Add
Mul
Add
Mul
Add
8 16 32 64 128 256
4 24 84 248 660 1656
52 144 372 912 2164 5008
128 336 816 1936 4464 10128
112 296 728 1736 4024 9160
76 168 372 824 1812 3960
86 214 514 1198 2738 6158
!Based on Duhamel’ algorithm using 2/4 scheme. Table 2 Computational complexity for odd-squared DFT Complex data Pei’s
Real data
Proposed
Pei’s
Proposed
N
Mul
Add
Mul
Add
Mul
Add
Mul
Add
4 8 16 32 64 128 256
18 54 146 366 882 2062 4722
34 102 274 686 1650 3854 8818
12 32 80 192 448 1024 2304
28 80 208 512 1216 2816 6400
9 27 73 183 441 1031 2361
11 35 97 247 601 1415 3257
6 16 40 96 224 512 1152
10 32 88 224 544 1280 2944
number of additions and multiplications required by both algorithms. For odd-squared DFT, the proposed algorithm based on DCT computation is much more efficient than Pei’s algorithm, as shown in Table 2. For the odd-time and odd-frequency DFT, however, Pei’s algorithm and the proposed DCT based one require the same number of additions and multiplications. Hu reported a fast algorithm which decomposes a length-N odd-time (or odd-frequency) GDHT into two length-N/2 DHTs, and a length-N oddsquared DHT is decomposed into two length-N/2 odd-frequency DHTs [7]. Because the DHT computation in Hu’s algorithm uses the real valued DFT [3], his algorithm needs one addition per point less than that needed by the proposed DCT-based algorithm. Both algorithms need the same number of multiplications for the oddsquared, odd-time and odd-frequency DHT.
Table 3 Computational complexity for GDHT Odd-squared DHT
Odd-time or freq. DHT
N
Mul
Add
Mul
Add
4 8 16 32 64 128 256
6 16 40 96 224 512 1152
14 40 104 256 608 1408 3200
2 8 24 64 160 384 896
10 32 88 224 544 1280 2944
Table 3 lists the number of additions and multiplications required by the DCT based algorithm for the odd-squared, odd-time and odd-frequency DHT.
390
G. Bi, Y. Chen / Signal Processing 65 (1998) 383–390
One further advantage of the DCT-based algorithm is that the sequence length N can be q*2/, where q is an odd integer, by using the DCT algorithm given in [1]. It provides a wider range of choices on sequence lengths and a further reduction of the number of arithmetic operations. When q"3, for example, the number of arithmetic operations for the odd-squared DFT is smaller than that for N"2n. For GDFT computation, the computational structure of the split-radix algorithm is generally more complicated than that of the DCT based algorithm. In particular, complex operations have to be used in the split-radix algorithm when the input data is real. The decomposition of Hu’s GDHT algorithm is not straightforward and the computation relies upon the computation of real valued DFT. It has the same drawback that complex operations are needed even for real data. This problem can be avoided if other fast DHT algorithms, which require more additions, are used. In contrast, the DCT based one uses simple decomposition procedures and requires only real operations in the entire computation, which enables an easy implementation.
5. Conclusion Fast algorithms for generalized DFT and DHT are presented. It reveals the relationship between the generalized DFT and DHT, and the DCT. Compared to Pei’s split-radix GDFT algorithm, the proposed algorithm achieves a significant reduction of the number of arithmetic operations for arbitrary n and k . The proposed DCT-based al0 0 gorithm also achieves savings on computational complexity for the odd-squared DFT. The DCT
based algorithm has a simpler computational structure than the split-radix GDFT algorithm [9] and the GDHT algorithm [7].
Acknowledgements The authors would like to thank the reviewers for their valuable comments on the previous manuscript.
References [1] G. Bi, L.W. Yu, DCT Algorithms for composite sequence lengths, IEEE Trans. Signal Process. 1998, to be published. [2] P. Duhamel, Implementation of ‘split-radix’ FFT algorithms for complex, real, and real-symmetric data, IEEE Trans. Acoust. Speech Signal Process. 34 (2) (1986) 285—295. [3] P. Duhamel, M. Vetterli, Improved Fourier and Hartley transform algorithm: Application to cyclic convolution of real data, IEEE Trans. Acoust. Speech Signal Process. 35 (6) (1987) 818—824. [4] O.K. Ersoy, Semisystolic array implementation of circular, skew-circular, and linear convolutions, IEEE Trans. Comput. 34 (2) (1985) 190—194. [5] G. Bonnerot, M. Bellanger, Odd-time odd-frequency discrete Fourier transform for symmetric real valued series, Proc. IEEE 64 (March 1976) 392—393. [6] N.-C. Hu, F.F. Liu, Fast computation of the two-dimensional generalized Hartley transforms, IEE Proc. Visual Image Signal Process. 142 (1) (1995) 35—39. [7] N.-C. Hu, H.-I. Chang, O.K. Ersoy, Generalized discrete hartley transforms, IEEE Trans. Signal Process. 40 (12) (December 1992) 2931—2940. [8] Malvar, Fast computation of the discrete cosine transform and the discrete Hartley transform, IEEE Trans. Acoust. Speech Signal Process. 35 (10) (1987) 1484—1485. [9] S.-C. Pei, T.-L. Luo, Split-radix generalized fast Fourier transform, Signal Processing 54 (2) (1996) 137—151. [10] M. Vetterli, H. Nussbaumer, Simple FFT and DCT algorithms with reduced number of operations, Signal Processing 6 (4) (1984) 267—278.