Computer Physics Communications 47 (1987) 113—127 North-Holland, Amsterdam
113
UNLDFF: A PACKAGE OF OPTIMIZED DISCRETE FOURIER TRANSFORMS G. CABRAS Istituto di Fisica, Università di Udine, Italy
V. ROBERTO Diparlimento di Matematica e Inforniatica, Università di Udine, Italy
and 0. SALEM! Istituto di Scienze della Terra, Università di Udine, Italy Received 5 April 1987
We present a package of FORTRAN 77 subprograms to compute the discrete Fourier transform of one-dimensional data sets. A large collection of specialized modules, assembled in a top-down scheme, guarantees high flexibility and efficiency. Comparative timings are given.
PROGRAM SUMMARY Title of program: UNIDFT
point DFT requires some 4 * NN extra bytes as compared to their complex counterparts
Catalogue number: AAXR Program obtainable from: CPC Program Library, Queen’s University of Belfast, N. Ireland (see application form in this issue) Computer: VAX 11-780; Installation: Centro Calcolo versità di Udine (Italy)
—
u~u-
Keywords: Fast Fourier Transform, Mixed-Radix FVF, Chirp—Zeta Transform Method of solution The discrete Fourier transform is computed via mixed-radix Fast Fourier Transform and Chirp—Zeta Transform algorithms, implemented in place, with radix-2, -3, -4, -5 and radix-8 decimation-in-frequency and decimation-in-time basic modules.
Operating system: VMS 4.4 Programming language: FORTRAN 77 High speed storage required: the code requires roughly 37 Kwords when all the subprograms are loaded; the complex NN-pomt DFT requires additional 40 * NN bytes when using FFrs and 64 * NN bytes when using CZTs; the real 2 * NN
Typical running times Top performances are reached for input data lengths which are powers of 8. The CPU-time required by a single-precision complex DFT of 4096 points is roughly 0.66 5; the real DFT on 4096 points requires 0.36 5; typical running times are up to 2 times as much when input data length does not contain powers of 8.
OO1O-4655/87/$03.50 © Elsevier Science Publishers B.V. (North-Holland Physics Publishing Division)
114
G. Cabras el al.
/
UNIDFT: optimized discreteFourier transforms
LONG WRITE-UP 1. Introduction
2. Basic DFT algorithms
The need of efficient, flexible and portable packages implementing the Discrete Fourier Transforms (DFT) is growing considerably in all domains of pure and applied physics. In this paper we present the design, implementation and long write-up of UNIDFT, a package of FORTRAN 77 subprograms to calculate the DFT of one-dimensional data sets. The package has been designed in a modular fashion, integrating in a top-down scheme all the specialized modules which perform the basic computations. No restrictions are imposed on the input data length NN (e.g., it is not necessary for NN to be an integer power of two), the only limitation being due to the storage capacity of the hardware support. The choice of appropriate arithmetic modules is directly accomplished by UN!DFT according to the value of NN, thus combining high flexibility with minimal computational complexity. As far as the implementation is concerned, two versions of the package one in single precision arithmetic, one in double have been implemented and tested on a DEC VAX/VMS 11-780. Each version contains specific solutions to achieve highest computing speed; however, no prescriptions relying explicitly on the operating system have been adopted, in order to ensure full portability. In particular, a number of solutions ensures a very efficient treatment of discrete convolutions, as requested in several physical applications. In the following, section 2 reports an outline of the basic algorithms employed by UNIDFT; section 3 introduces all the modules with some details; section 4 provides examples of use; section 5 contains performance evaluations, i.e. CPU times and error estimates, as well as a comparison with an already existing package; conclusions and an outline on future perspectives are reported in section 6.
UNIDFT implements standard algorithms to compute the direct and inverse DFT, which we write, respectively
—
—
NN -1
X(k)= ~
x(n)W~,
k=0,...,NN— 1,
n=0 .~
(1.1)
( n~
=
NN-
~
NN
x( k) w~,
k=
1 2 NN 1, where WNN exp(i2rr/NN). Two classes of algorithms have been used in the complex DFT’s. (a) Fast Fourier Transform (FFT) algorithms [2], for data sets whose length can be factorized in integer powers of 2, 3, 5. (b) Chirp—Zeta Transforms (CZT) algorithms, for NN any positive integer. Class (a) algorithms, in particular, belong to the so-called “Mixed-Radix FFT” [2,7]. According to the factor (i.e. the “radix”) decomposition of NN, the DFT operator itself is decomposed into a direct product of sub-matrices, each performing a sub-transform of order equal to a radix: so, the DFT is written as a combination of elementary transforms. UNIDFT embodies radix-2, -3, -4, -5 and radix-8 basic modules [2,3,6,7], each of which is implemented in a two-fold way: decimation-infrequency [5] to perform the direct DFT, and decimation-in-time [4] to perform the inverse one. All the modules involve in-place computations (i.e., they assign the same memory allocations to the input, intermediate and output data), thus optimizing memory occupation. The price to pay for this [2,3] is that a reordering phase is requested to restore natural ordering in the transformed data. However, it can be shown [2,3] that no reordering is needed when a direct DFT is followed by an inverse one, and the transforms are implemented via the complementary algorithms, one using deci=
0
—
=
G. Cabras et al.
/
UNIDFT: optimized discrete Fourier transforms
mation-in-time, the other one using decimationin-frequency. Therefore our choice of a two-fold implementation of the basic modules ensures a very efficient way of doing convolutions. Class (b) algorithms [3] implement the DFT itself as a convolution; actually the Fourier coefficients in eq. (1.1) can be expressed as
=
2/2
W~
~
~
NN
NN
n =0
n=0,...,NN—1,
(1.3)
(1.4)
No reordering is requested, but the number of operations is roughly a factor of six higher than in FFT’s, since the convolution requires three DFT’s on 2 * NN points at least. The effective length of convolutions is automatically chosen to be the integer M, greater or equal to 2* NN, for which the basic FFT modules assure the highest computing speed. A further class of algorithms, the so-called Winograd Fourier Transform (WFT) [8], has not been included in UNIDFT, since it has been shown [9] that it offers no advantages over highradix FFT’s when running on general-purpose computers. Transforms of real input sequences are also handled using two distinct techniques: One technique [2] merges two NN-point real sequences, say x ( n), y ( n), into an NN-point complex one ~= 0
NN
—
1.
Then the DFT coefficients of z(k), say Z(k) for k 0,.. NN 1 are combined to give the coefficients of x ( n), y( n), as follows: =
. ,
—
—
1,
Z * (NN
(1.5) —
k)
2i k = 0,..., NN/2 1.
(1.6)
—
z(n)=x(2n)+ix(2n+1),
. . .
x (n) + iy ( n),
—
n=0,...,NN—1.
Z(k) for k = 0,
,NN 1, are combined according to (1.5) and (1.6) to give G(k), H(k) for k=0,...,NN— 1. The output coefficients are given by
X(k)w2/2wT~7)2/2,
k=0,...,NN—1.
=
NN/2
z( k)
(1.7)
k=0
z ( n)
. . . ,
The coefficients of z(k), say
NN—1
W~2/’2 ~
—
2
k =0, Y( k) =
with NN any positive integer. The inverse CZT writes
x(n)=
z( k) + z * (NN k)
x( k) =
It is also possible [2] to transform a real 2 * NN-point sequence as a complex NN-point DFT; the latter is constructed as
NN—1
x(k)
115
—
X(k)=G(k)+W~.”NNH(k),
k
=
0,..
. ,
NN
—
1.
(1.8)
3. Architecture of UNIDFT The package has been designed according to a top-down scheme, which is graphically reported in figs. la, b (FFT subprograms) and fig. 2 (CZT subprograms). A three-level structure can be identified, each level grouping functionally homogeneous modules. High level: user-interface subroutines. Intermediate level: driving modules for arithmetic, reordering, pre- and post-processing phases. They are not accessible to the user. Low level: basic “data-handling” modules. All the numerical and sorting operations are accompushed at this level, which is also not accessible to the user. All the modules have been implemented both in single and in double precision arithmetic, the latter ones being identified by the “D” prefix. In the following we give an account of all the modules, adding more details when dealing with the user-interface ones.
116
G. Cabras et al.
/
UNIDFT: optimized discrete Fourier transforms > IL. U.
Ui
0 Iii
IU-
C,
0 I-
.0 r-
IL.
Ui
U.
0, .0
H
Li~ Li~
IU.
~ C)
IU.
~
E
C)
I~
•~
IL.
E
IL
U
H 0, 0,
I-. IJ
I— LL
LA.
Li.
()
I,-
LL
.0
c~.I
,..
E
o
0
I-.
I—
LI. U.
~‘
U.
0
I-.
~.
U.
U.
U.
0
0
Cl)
U)
_I
_I
Cl, Ui
o
t
CD
Ui
Ui
4 I—
4 I—
_J
—
4 I—
~
H 0,
G. Cabras et a!.
/
UNIDFT: optimized discrete Fourier transforms
>
117
C..1
z
IU. U.
C, —
0
Ui
Ui
I;
E
E
I-
I-
U.
U. U.
U. U.
0
0
0
I— N
I—
C.) 0
C.)
a
I—
N
N C.)
,0
c~J at
o I-
0 I-
0 I-
U.
U.
U.
0
0
0
E
m 4 IN
0
I.
4 IN
0
~
4 IN
0
118
/
G. Cabras et a!.
UNIDFT: optimized discrete Fourier transforms
3.1. User interface (D)CFFT Usage Arguments Input NN C WGT
: DFT of NN complex input data : CALL CFFT(NN, C, WGT, ISIG, lORD, INDEXI, INDEX2, IERR)
: input data length; it has to be a product of integer powers of 2, 3, 5 : (double) complex input data vector; to be dimensioned (0: NN 1) : (double) complex vector, used to host tables of exponentials for the DFT; to be dimensioned WGT(0 : NN 1) in the calling program : task flag: ISIG = 1 : direct DFT ISIG = 1 : inverse DFT : order flag: lORD = 1 : output of direct DFT requested in natural order; input to inverse DFT given in natural order lORD = 0 : output of direct DFT in permuted order; input to inverse DFT in permuted order : integer vector containing indices of C in permuted order; to be dimensioned INDEX1(0: 2* NN 1) in the calling program INDEX1(0 : NN 1) : used for reordering output from direct DFT INDEX1(NN : 2* NN 1) : used for reordering input to inverse DFT : integer vector used for temporary buffer; to be dimensioned INDEX2(0 : NN 1) —
—
ISIG lORD
INDEX1
—
—
—
—
INDEX2 Output C WGT INDEX1 INDEX2 IERR
—
: transformed vector; input data are overwritten : tables are filled and left unchanged; may be used for further transforms of the same length : filled with appropriate indices and left unchanged : undefined : error flag: IERR = 0: no detected errors IERR = 1: NN is not factorizable into allowed radices —
(D)RFFT Usage
: DFT of NN real input data, via a complex DFT of NN/2 data : CALL (D)RFFT (NN, C, WGT, RW, ISIG, lORD, INRE, INDEX1, INDEX2, IERR)
Arguments Input NN
: integer, it has to be an even number; NN/2 has to be a product of integer powers of 2,
3, 5 C WGT RW
: (double) real vector of input data, to be dimensioned C(0 : NN + 1) : (double) complex vector, to be dimensioned WGT(0 : NN/2) : (double) complex vector, to host phase factors in post-processing real transforms; to be dimensioned RW(0 : NN/4 1) : same use as in CFFT routine : as in CFFT routine : data flag: INRE = 1: real input sequence INRE = 0: real output sequence : same use as in CFFT; to be dimensioned INDEX1(0 : NN 1) : same use as in CFFT; to be dimensioned INDEX2(0 : NN/2 1) —
ISIG lORD INRE INDEX1 INDEX2
—
—
G. Cabras et at
/
UNIDFT: optimized discrete Fourier transforms
119
Output C WGT INDEX1 INDEX2 IERR
: : : : :
(D)R2FFT Usage
: DFT of two real input sequences of length NN : CALL R2FFT (NN, C, B, WGT, ISIG, lORD, INRE, INDEX1, INDEX2, IERR)
Arguments Input C B WGT ISIG, lORD INRE INDEX1 INDEX2
: : : : : : : :
input data length; not necessarily an even number, but same limitations as in CFFT (double) real vector with first input data sequence; to be dimensioned C(0: 2 * NN 1) (double) real vector with second input data sequence; to be dimensioned B (0: NN) same use as in CFFT; to be dimensioned WGT(0 : NN 1) see comments to CFFT same use as in RFFT routine same use as in CFFT; to be dimensioned INDEX1(0 :2 * NN 1) same use as in CFFT; to be dimensioned INDEX2(0 : NN 1)
Output C B WGT INDEX1 INDEX2 IERR
: : : : : :
first transformed (double) complex vector; output data are stored in C(0 : NN/2) second transformed (double) complex vector; output data are stored in B(0: NN/2) see comments to CFFT routine see comments to CFFT routine see comments to CFFT routine see comments to CFFT routine
(D)CCZT Usage
: DFT of NN complex input data via the CZT : CALL CCZT(NN, C, WGT, CH, CWNK, ISIG, IERR)
(double) complex vector containing output data same use as in CFFT same use as in CFFT same use as in CFFT error flag IERR = 0: no detected errors IERR = —1: NN is not factorizable into allowed radices IERR = —2: NN is odd, it must be even
—
—
—
—
Arguments Input
NN C WGT
: input data length; NN is any positive integer : (double) complex vector of input data, to be dimensioned C(0 :4 * NN 1) : (double) complex vector with tables of exponentials for the DFT; to be dimensioned WGT(0 :4 * NN 1) : (double) complex vector containing the sequence to convolve with C as in 1.3, 1.4; to be dimensioned CH(0 :4 * NN 1) : (double) complex vector containing the square weights for the CZT as in 1.3, 1.4; to be dimensioned CWNK (0:4 * NN 1) : see comments to CFFT routine —
—
CH
—
CWNK
—
ISIG Output C WGT
: transformed (double) complex vector; input data are overwritten : filled with appropriate factors and left unchanged
120
G. Cabras et at
/
UNIDFT: optimized discrete Fourier transforms
CWNK IERR
: filled with appropriate factors and left unchanged : error flag: IERR = 0: No detected errors IERR = 1: Warning: NN can be factorized into allowed radices
(D)RCZT Usage Arguments Input
: DFT of NN real input data via the CZT : CALL RCZT(NN, C, WGT, RW, CH, CWNK, ISIG, INRE, IERR)
NN
: Input data length; it has to be an even number : (double) real vector with input data set; to be dimensioned C(0 : 4 * NN + 1) : (double) complex vector with tables of exponentials; to be dimensioned WGT(0:2*NN —1) : (double) complex vector with phase factors for post processing real transforms; to be dimensioned RW(0 : NN 1) : same use as in CCZT; to be dimensioned CH(0 : 2 * NN 1), CWNK(0 :2 * NN 1) : see comments to CFFT routine : see comments to RFFT routine
C WGT RW
—
CH, CWNK ISIG INRE
—
—
Output C : transformed vector; input data are overwritten WGT, CWNK : see comments to CCZT routine IERR : error flag: IERR = 0: no detected errors IERR = —2: NN is odd, it must be even IERR = 1: Warning: NN can be factorized into allowed radices (D)R2CZT Usage
: DFT of two real input sequences of NN data via the CZT : CALL R2CZT(NN, C, B, WGT, CH, CWNK, ISIG, INRE, IERR)
Arguments Input NN C B WGT, CH CWNK, ISIG INRE
: : : : : :
input data length; NN is any positive integer (double) real vector with first input sequence; to be dimensioned C(0: 8* NN 1) (double) real vector with second input sequence; to be dimensioned B(0 : 4 * NN) see comments to CZZT routine see comments to CCZT routine see comments to RFFT routine
Output C B WGT CH, CWNK IERR
: : : : :
first transformed vector; input data are overwritten second transformed vector; input data are overwritten see comments to CZZT routine see comments to CCZT routine error flag: IERR = 0: no detected errors IERR = 1: Warning: NN can be factorized into allowed radices
—
3.2. Intermediate level modules (D)TABLE
: prepares tables of exponentials, radix factorization and index permutation tables for the DFT
(D)CFFTD : computes direct DFT’s, driving the decimation-in-frequency basic modules (D)CFFTI : computes inverse DFT’s, driving the decimation-in-time modules
G. Cabras et at
/
UNIDFT: optimized discrete Fourier transforms
(D)FFTOR
: reordering phase after a direct DFT or before an inverse one (D)REDIR : post-processing phase in direct real DFT (D)REINV : pre-processing phase in inverse real DFT (D)RMERG : merge two real NN-point DFT’s into one complex NN-point DFT, or the reverse (D)RRFT2 : post-processing phase in doublereal DFT (D)CZTAB : prepares further tables of exponentials in CZT
(D)FFT8F (D)FFT5F (D)FFT4F (D)FFT3F (D)FFT2F
: : : : :
radix-8 radix-S radix-4 radix-3 radix-2
121
FFT FFT FFT FFT FFT
3.3.3. Basic inverse transforms All the following modules implement decimation-in-time algorithms (D)FFT8T : radix-8 FFT (D)FFT5T : radix-S FFT (D)FFT4T : radix-4 FFT (D)FFT3T : radix-3 FFT (D)FFT2T : radix-2 FFT
3.3. Low-level modules 4. Examples 3.3.1. Table FACTOR (D)WGHT PERMUT
setup : factor decomposition of NN : computes twiddle factor table : computes the index vectors INDEX1, INDEX2 for reordering phase CHANGE : changes the order of elements in INDEX vectors PROLON : finds the most efficient FFT radix to be used in CZT (D)SWGHT : tables of square weights for CZT 3.3.2. Basic direct transforms All the following modules implement decimation-in-frequency algorithms ___________________________ ____________
In the following we give some examples and hints to the user. A few general remarks have to be raised: Transformed data need to be rescaled by a factor 1/NN in complex inverse DFT. In real transforms, direct DFT’s have to be scaled down by a factor 2; inverse DFT’s by a factor NN. The user must provide an overall data size NNMAX in the calling program, using, for example, the PARAMETER instruction. Obviously, NNMAX should be given a value at least equal to the maximum data length; if CZT modules are used, NNMAX should be at least 4 * NN. —
—
—
4.1. Example 1 DFT of NN C C C C
=
1024 real input data. Output in sequential form
DIMENSIONING REAL INPUT, COMPLEX OUTPUT DATA AND AUXILIARY VECTORS
PARAMETER(NNMAX
= 1024) REAL RIN(0 : NNMAX +1) COMPLEX COUT(0 : NNMAX/2) COMPLEX WGT(0 : NNMAX/2 1), RW(0 : NNMAX/4 —1) INTEGER INDEX1(0 : NNMAX 1), INDEX2(0 : NNMAX/2 —1) EQUIVALENCE (RIN(0), COUT(0)) -
—
C C
CALLING DIRECT DFT
122
G. Cabras ci a!.
/
UNIDFT: optimized discrete Fourier transforms
C CALL RFFT(1024, RIN, WGT, RW, —1, 1, 1, INDEX1, INDEX2, IERR) C C C C
OUTPUT IS IN COUT. TO BE RESCALED BYAFACTORO.5
4.2. Example 2 DFT of NN
=
777 complex data in double-precision
C PARAMETER(NNMAX = 777) DOUBLE COMPLEX C(0: 2 * NNMAX 1), CH(0: 2 * NNMAX 1) DOUBLE COMPLEX WGT(0 : NNMAX 1), CWNK(0 : NNMAX -1) -
-
C C C
CALLING CHIRP-ZETA TRANSFORM CALL DCCZT(777, C, WGT, CH, CWNK,
C C C C
-
I, IERR)
OUTPUT DATA ARE IN C, TO BE SCALED DOWN BY NN
4.3. Example 3 Filtering NN
=
2048 real data
C PARAMETER(NNMAX = 2048) REAL RIN(0 : NNMAX +1) COMPLEX COUT(0 : NNMAX/2) COMPLEX WGT(0: NNMAX/2 1), RW(0: NNMAX/4 -1) INTEGER INDEX1(0: NNMAX 1), INDEX2(0: NNMAX/2 EQUIVALENCE (RIN(0), COUT(0)) -
C C C C
1)
FILTER IMPULSE RESPONSE IS HT FILTER TRANSFER FUNCTION IS HF REAL HT(0 : NNMAX +1) COMPLEX HF(0 : NNMAX/2) EQUIVALENCE (HT(0), HF(0))
C C C C
CALL DIRECT TRANSFORMS; REORDERING IS NOT NEEDED CALL RFFT(2048, HT, WGT, RW, —1, 0, 1, INDEX1, INDEX2, IERR) CALL RFFT(2048, RIN, WGT, RW, —1, 0, 1, INDEXI, INDEX2, IERR)
C C
MULTIPLY IN THE FREQUENCY DOMAIN
G. Cabras et a!.
C C C
/
UNIDFT: optimized discrete Fourier transforms
COUT(I) = COUT(I) * HF(I), I = 0,.. THEN TRANSFORM BACK
. ,
123
1024
CALL RFFT(2048, COUT, WGT, 1, 0, 0, INDEX1, INDEX2, IERR) C C C
OUTPUT DATA ARE IN RIN, TO BE RESCALED BY A FACTOR 1/4./2048
4.4. Example 4 Simultaneous filtering of two real data sets of length NN
=
1215
C PARAMETER (NNMAX = 1215) REAL RIN1(0: 2 * NNMAX 1), RIN2(0 : NNMAX) COMPLEX COUT1(0 : NNMAX 1), COUT2(0 : NNMAX/2) COMPLEX WGT(0 : NNMAX -1) INTEGER INDEX1(0: 2*NNMAX 1), INDEX2(0 : NNMAX -1) EQUIVALENCE (RIN1(0), COUT1(0)) EQUIVALENCE (RIN2(0), COUT2(0)) -
—
-
C C C
FILTER IS GIVEN IN THE FREQUENCY DOMAIN COMPLEX HF(0 : NNMAX/2)
C C C
CALLING REAL DFT CALL R2FFT(1215, RIN1, RIN2, WGT, —1, 0, 1, INDEX1, INDEX2, IERR)
C C C C C
MULTIPLY IN THE FREQUENCY DOMAIN COUT1 AND COUT2 BY HF AS IN EXAMPLE 4.3 THEN TRANSFORM BACK CALL R2FFT(1215, COUT1, COUT2, WGT, 1, 0, 0, INDEX1, INDEX2, IERR)
C C C
RESCALE RIN1, RIN2 BY FACTOR 1/4/1215
5. Perfonnance evaluations As a test sequence, a monochromatic signal with integer angular frequency has been used f(n)
=
e
1
fl ,
n
=
0,... ,NN —1,
where us the imaginary unit and c~is an integer The DFT of
f( n) is a
delta function placed at
The real transforms have been tested using the real exponential sequence g(n)=b
,,
n=0...
NN—1
where b isDFT a realofconstant, set toh~own 0.3 and tests; the g(n) is also to 0.9 be in our (1— bNN)/(l
—
bW~N),
k = 0,.. .,NN —1.
Each CPU-time measurement is an average over 100 evaluations, to reduce possible errors due to
124
G. Cabras et a!.
/
UNIDFT: optimized discrete Fourier transforms
Table 1 Complex transforms
Table 2 Real transforms
NN
User interface
Basic modules
CPU-time (ms)
Error
NN
User interface
Basic modules
CPU-time (ms)
Error
32 33 64 80 81 193 255 321 383 385 511 512 513 1000 1023 1024 1025 2000 2047 2048 2048 2049 2187 2400 2500 2550 2560 2561 3000 3071 3072 3073 3125 3600 3645 3750 4000 4095 4096 4096 4096 4096
CFFT CCZT CFFT CFFT CFFT CCZT CCZT CCZT CCZT CCZT CCZT CFFT CCZT CFFT CCZT CFFT CCZT CFFT CCZT CFFT CCZT CCZT CFFT CFF’T CFFT CCZT CFFT CCZT CFFT CCZT CFFT CCZT CFFT CFFT CFFT CFFT CFFT CCZT CFFT * CCZT DCFFT DCFFT CFFT CFFT
48 2-5-8 8 2-5-8 3 8 8 2-8 2-8 2-8 2-8 8 38 5-8 4-8 2-8 58 258 8 4-8 8 258 3 3•4-5-8 4-5 2-5-8 5-8 2-8 3-58 28 23-8 2-8 5 2-3-5-8 3-5 2-3-5 4-5-8 2-8 8 2-8 8 8 3-5-8 5-8
4 39 5 12 16 203 199 407 412 430 422 62 751 251 927 147 1445 540 2016 331 2022 3193 616 665 850 3192 474 4553 897 4692 619 4594 1125 1081 1242 1269 1141 4732 601 4702 1191 1090 1263 1605
0.68704E—7 0.52896E—6 0.13811E—6 0.12667E—6 0.14136E—6 0.25212E—5 0.40163E—5 0.48473E—5 0.70951E—5 0.51740E—5 0.49959E—5 0.14155E—6 0.73485E—5 0.90029E—7 0.10971E—4 0.14200E—6 0.10830E—4 0.89658E~—7 0.38568E—4 0.14187E—6 0.45260E—4 0.34407E—4 0.22334E—6 0.10291E—6 0.10719E—6 0.39012E—4 0.12374E—6 0.52019E—6 022445E—6 0.39892E—4 0.17758E—6 0.57064E—4 0.84942E—7 0.23355E—6 0.24547E—6 0.15370E—6 0.89481E—7 0.66543E—4
64 64 500 500 512 512 625 729 900 900 1000 1000 1024 1024 1215 1250 1250 1458 1458 1500 1500 1536 1536 1800 1800 1875 2000 2000 2046 2047 2048 2048 3072 3072 3125 3600 3600 3645 3750 3750 4000 4000 4096 4096 4096 4096
R2FFT RFFT R2FFT RFFT R2FFT RFFT R2FFT R2FFT R2FFT RFFT R2FFT RFFT R2FFT RFFT R2FFT R2FFT RFFT R2FFT RFFT R2FFT RFFT R2FFT RFFT R2FFT RFFT R2FFT R2FFT RFFT RCZT R2CZT R2FFT RFFT R2FFT RFFT R2FFT R2FFT RFFT R2FFT R2FFT RFFT R2FFT RFFT R2FFT RFFT R2CZT RCZT R2FFT RFFT R2FFT RFFT
8 4-8 45 2-5 8 4-8 5 3 3-4-5 2-3-5 5-8 4-5 2-8 8 3-5 2-5 5 2-3 3 3-4-5 2-3-5 3-8 3-4-8 3-5-8 3-4-5 3-5 2-5-8 5-8 4-8 8 4-8 2-8 2-3-8 3-8 5 2-3-5-8 3-5-8 3-5 23-5 3-5 4-5-8 2-58 8 4-8 2-8 8 3-5-8 3-4-5-8 5-8 4-5
9 6 134 64 76 36 188 194 250 120 269 134 169 73 365 402 189 415 188 465 229 294 136 513 260 652 576 277 966 2041 369 167 637 288 1132 1088 512 1240 1366 652 1224 565 760 362 4702 2046 1396 669 1702 847
0.23645E~-7 0.43947E—7 0.17540E—7 0.32685E—7 0.36775E—7 0.65045E—7 0.19020E—7 0.11390E—7 0.20519E—7 0.38297E—7 0.23574E—7 0.41878E—7 0.13782E—7 0.24100E—7 0.60173E—8 0A0394E—7 0.18408E—7 0.24246E~-7 0.43353E—7 0.19695E~-7 0.35494E—7 0.13262E—7 0.23350E—7 0.16716E—7 0.30124E—7 0.15276E—7 0.16001E—7 0.29074E—7 0.92022E—6 0.18927E—5 0.69005E—8 0.13078E—7 0.65519E—8 0.12047E—7 0.64606E—8 0.19093E—7 0.34445E—7 0.91796E—8 0.19447E—7 0.34577E—7 0.87083E—8 0A5612E—7 0.38679E—8 0.72405E—8 0.23441E—5 0.10733E—5 0.12665E—7 0.22943E—7 0.23397E —7 0.41798E—7
4800
5000 *
*
—
0.87655E—4 0.29220E—16 -~
0.10302E—6 0.10411E—6
Unreordered output.
machine-dependent effects. In each case we also evaluate an error in cornputing the DFT, as the maximum norm of difference between exact and calculated data. Our
4800 4800 5000
5000
G. Cabras et at
/
UNIDFT: optimized discrete Fourier transforms
we observe that our module runs faster than its counterpart in ref. [1], by some 8% in single preci-
Table 3 Comparative table NN
s.p. 64 d.p. 64 ~.p. 512 d.p. 512 s_p. 4096 d.p. 4096
125
(D)FFT8F UNIDFT
(D)R8TX ref. [11
time in ms
error
time in ms
error
6.8 10.8 54.5 98.9 593.0 1064.7
0.12014E—6 0.29643E—16 0.11963E—6 0.29143E—16 0.11986E—6 0.29282E—16
7.4 12.0 63.0 115.7 631.1 1244.5
0.13811E—6 0.29121E—16 0.13817E—6 O.29304E—16 0.13970E—6 0.29376E—16
sion, and 20% in double precision calculations. This fact is largely due to the computation of exponentials, which is done via a pre-computed table in our case, whereas is done in the numerical module itself in ref. [1]. Our choice is motivated to only by higher efficiency, but also by the need to optimize convolutions, and in general any repeated usage of equal length DFT’s.
6. Conclusions and perspectives results have been reported in tables 1, 2 for complex and real transforms respectively. Looking at the tables we observe that: The top performances are achieved when NN is an integer power of eight, which is expected, since radix-8 transforms require the minimum number of complex multiplications and additions among our modules [6]. Radix-3 and -5 modules are comparatively slower. We can say that the higher log 2(NN), —
—
the higher the efficiency of the corresponding DFT. The ordering phase causes an overhead of about 10% in radix-8 DFT on 4096 points. The CZT modules require some 5—6 times as much CPU-time as their FFT counterparts. The NN real transforms require the CPU-time of the NN/2 complex ones, plus an overhead of 10—15% due to post-processing. Double-precision transforms need roughly twice as much time as the single precision ones. We also compared the performances of UNIDFT to those of an existing package [1] with similar properties of flexibility and portability. In particular, the most efficient modules in both packages, the afore-mentioned radix-8 FFT, have been directly compared. Our results are reported in table 3, from which —
—
—
—
We believe that at present UNIDFT meets the requirements of computational efficiency, flexibility and portability, which are of interest in all applications of the discrete Fourier transforms on general-purpose computers. Further developments can be suggested, like designing specialized versions of UNIDFT for personal computers or programmable micro. processors, as requested in real-time applications.
References [1] Programs for Digital Signal Processing, ed. DSP Committee of IEEE-ASSP Society, IEEE Press (1979). [2] E.O. Brigham, The Fast Fourier Transform (Prentice Hall, Englewood Cliffs, NJ, 1974). [3] A.V. Oppenheini and R. Schafer, Digital Signal Processing (Prentice Hall, Englewood Cliffs, NJ, 1975). [4] J.W. Cooley and J.W. Tukey, Math. Comput. 19 (1965) 297. [5] W.M. Gentleman and 0. Sande, The Fast Fourier Transform for Fun and Profit, AFIPS Conf. Proc., Fall Joint Comp. Conf. 29 (1966) 563. [6] GD. Bergland, Math. Comput. 22 (1968) 275. [7] R.C. Singleton, IEEE Trans. Audio Electroacoustics 2 (1969) 93. [8] S. Winograd, Math. Comput. 32 (1978) 175. [9] L.R. Morris, IEEE Trans. Acoustics, Speech and Signal Processing ASSP 26 (1978) 141.
126
G Cabras ci a!. / UNIDFT.- opiimi:ed discrete Fourier iransform.s
TEST RUN OUTPUT No 12 COMPLEX FFT COMPLEX INPUT DATA C 0) 0.100000001+01 C 2) 0.109017001.00 C 4) 0.309016971.00 C 6) —0.30901703(.00 C 8) —0.109017061.00 (10) —0.I0000000E,0t (12) —0.109016968.00 (14) —0.309016641.00 (16) 0.309017121.00 (II) 0.109017248.00
0.00000000E+00 O.51771S248.O0 0.951056548.00 0.951056418,00 0.5S77151SE*0O —0.I?422777E—07 —0.517705368.00 —0.951056601,00 —O.9S105640E.0O —O.517714951.0O
C C C C C
1) 0.951056548*00 3) 0.5l7715241.0O 5) —0.431113011—07 7) —0.5$7715421.0O 9) —0.951056601.00 Cli) —0.951056421.00 (13) —0.517785078.00 (15) 0.119241118—07 (17) 0.581715401.0O (19) 0.951056541+00
O.309017001.00 O.S09017101*OO 0.10000000E.01 0.109016111.00 0.309016791.00 —0.309017218.00 —0.809017128.00 —0.100000001.01 —0.809016121.00 —0.30901694E.OO
CFFT C 0) C 2) C 4) C 6) C I) (10) (12) (14) (16) (19)
DIRECT TRANSFORM 0.134465031—06 —0.357627S78—06 —0.111422621—05 —0.109909471—06 —0.27O1295oE—06 —0.791617151—07 0.119209291—06 —0.566244131—06 —0.142717251—06 0.199657171—07 0.357627171—06 0.238418581—06 —0.449393991—06 0.320321221—06 —0.346939291—07 0.421241571—06 —0.10721S36E—O5 —0.566244131—06 0.441711361—06 0.727755141—07
C 1) 0.200000028.02 0.241398l11—O5 C 3) —0.333716661—06 —0.10810911(—06 C 5) —O.29802322E—06 —O.536441S01—06 C 1) 0.357197611—06 0.129376591—06 C 9) 0.491179541—09 —0.246199201—06 (11) 0.000000001.00 0.261220901—06 (13) 0.347925351—07 —0.412259941—06 (15) —0.655651091—06 0.171113931—06 Ci?) 0.490466731—06 —0.712671911—06 (19) 0.711169381—06 —0.958745761—07
CPFT C 0) C 2) C 4) 6) 8) (10) (12) (14) (16) (18)
INVERSE TRANSFORM 0.100000011.01 —0.178813991—07 0.809017101.00 0.517795361+00 0.309017031.00 0.951056661.00 —0.30901706E.00 0.951056901*00 —O.IO9O17ISE,00 0.587785241+00 —0.100000021*01 —0.700030041—07 —0.109011008i0O —0.53771542E.OO —O.30901679E.00 —0.951056661.00 O.30901715E.O0 —0.951056661.00 0.809017361.00 —0.587785121.00
C 1) C 3) C 5) C 7) C 9) (11) (13) (15) (11) (19)
0.951056701+00 0.51771530E,00 —0.621352431—07 —0.517715481.00 —0.951056901.00 —0.951056601100 —0.51778512E.0O —0.248345421—07 0.537715561.00 0.951056661.00
0.309017061.00 0.109017001.00 0.100000021.01 0.809017008+00 0.309016851.00 —0.309017241.00 —O.10901?30Ee00 —0.100000021.01 —Q.809016821.OO —0.309017001.00
Ne 22 COMPLEX CiT COMPLEX INPUT DATA 0) 0.100000001+01 C 2) 0.789140528+00 4) 0.265685508*00 C 6) —0.401695401.00 C 8) —0.179473751*00 (10) —0.91638132E.00 (12) —0.677281621+00 (14) —0.825193898—01 (16) 0.546941131*00 (18) 0.945817238+00
C 1) C 3) C 5) C 7) C 9) (11) (13) (15) (17)
0.945817231.00 0.546948191.00 —0.825793301—01 —0.677281561*00 —0.986361321.00 —0.879473151+00 —0.401695251+00 0.245485661+00 0.789140648+00
DIRECT TRANSFORM —0.762939468—06 —0.190734871—06 —0.129312471—06 0.765376001—06 —0.157217001—05 —0.150181641—05 0.107296191—05 0.142707091—05 0.936684561—06 —0.890065641—06 —0.18509118E—05 —0.150951148—05 0.148486831—05 0.149315821—OS —0.487014971—06 —0.117255968—05 —0.15654578E—05 —027613110E—05 0.470276261—06 —0.368533538—06
C 1) C 3) ( 5) C 7) C 9) (11) (13) (15)
0.190000081.02 0.~536T432E—06 —0.315142191—OS 0.143743918—OS 0.276713171—07 0.131999461—06 —0.140743631—05 —0.750339591—06 —0.269609181—05 0.849621808—06 0.23995403E—05 —0.238404071—05 —0.115646851—05 0.262859170—05 —0.398148131—06 0.435154361—06 0.12243282E—0S —0.300731071—OS
CCZT INVERSE TRANSFORM 0) 0.100000021.01 —0.296140971—06 2) 0.119140108.00 0.614212071+00 4) 0.245485341*00 0.969401008*00 6) —0.401695581*00 0.915714291+00 C I) —0.879474941+00 0.475947651+00 (10) —0.916361681*00 —0.164594808+00 (12) —0.677282211.00 —0.735724818.00 (16) —0.825101421—01 —0.996515438*00 (16) 0.546948498+00 —0.837166978.00 (18) 0.945817838*00 —O.32410003E.O0
C 1) C 3) C 5) C 7) C 9) Cli) (13) (15) (17) C
CCZT C 0) 2) 4) C 6) C 8) (10) (12) (14) (16) (18)
0.000000101+00 0.614212691.00 0.969400291.00 0.915713331+00 0.475947411.00 —0.164594561+00 —0.735723851+00 —O.99650448E’OO —0.837166491+00 —0.324699521.00
(11)
0.965S1765E+00 0.546948611+00 —0.825794641—01 —0.677212271.00 —0.916362161+00 —0.879474468*00 —0.401695198*00 0.245485938*00 0.789142198.00
0.324699461*00 0.037166498.00 0.99658448(+00 0.735723918.00 0.164594621.00 —0.475947358*00 —0.915773338400 —0.969400231*00 —0.614212511+00
0.324699341.00 0.837167321.00 0.996515251.00 0.735724278*00 0.164594518.00 —0.475947741.00 —0.915774291.00 —0.969400828.00 —0.614212691+00
G. Cabras et at
/
UNIDFT.- optimized discrete Fourier transforms
No 31 REAL PFT REAL INPUT DATA 0) 0.100000001.01 C 2) 0.809999948,00 C 6) 0.656099928+00 C 6) 0.531440918+00 C 8) 0.430467101+00 (10) 0.348678328.00 (12) 0.282429438.00 (14) 0.228767131.00 (16) 0.185301931.00 (18) 0.150094558.00
C 1) C 3) C 5) C 7) C 9) Cli) (13) (15) Cl?) (19)
0.899999988+00 0.728999911.00 0.590489921.00 0.478296791.00 0.387420398.00 0.313110470.00 0.254186488+00 0.20519103E.00 0.166771748.00 0.135085098+00
RFFT DIRECT TRANSFORM 0) 0.171423218*0l 0.000000008.00 C 2) 0.6751004S0.00 —0.131354488,01 4) 0.505771600.00 —0.599701231.00 C 6) 0.474479028.00 —0.311757198.00 8) 0.464761148.00 —0.142211648.00 C10) 0.462328438.00 0.000000001*00 RPPT INVERSE TRANSFORM C 0) 0.100000018*01 C 1) C 2) 0.110000061.00 C 3) C 4) 0.656100038.00 C 5) C 4) 0.531441098+00 C 7) C 8) 0.430447198.00 C 9) (10) 0.34$671418,00 Cli) (12) 0.282429558*00 (13) (14) 0.228767920.00 (15) (16) 0.185302008.00 Cl?) (18) 0.150094548+00 (19)
C C C C C
1) 3) 5) 7) 9)
0.1289S9200,0l 0.550184798,00 0.415316750.00 0.468308758.00 0.462906211.00
—0.26903~0SE.0l —0.150531798.00 —0.436785108.00 —0.223009418.00 —0.693670518—01
0.900000101+00 0.729000218+00 0.S904S9918.00 0.471296858.00 0.387420548+00 0.313110448.00 0.254186518.00 0.205191131,00 0.166771101.00 0.i3S0lS12E.00 No 6* REAL CZT
REAL INPUT DATA C 0) 0.100000001+01 2) 0.809999948+00 C 6) 0.656099928.00 C 6) 0.531440918+00 C 8) 0.430467108.00 (10) 0.348678328+00 C12) 0.282429438.00 (14) 0.228767838.00 (16) 0.185301938+00 (18) 0.150094558.00 (20) 0.121576S12.00
C 1) C 3) C 5) C 7) C 9) (11) (13) (15) Cl?) (19) (21)
0.899999988+00 0.728999918+00 0.590489928.00 0.471296798+00 0.307420398+00 0.313S10471+00 0.2S4116481*O0 0.205191031*00 0.166771748.00 0.135085098+00 0.109411128.00
RPFT DIRECT TRANSFORM 0) 0.901522928.01 0.000000008.00 C 2) 0.74035311E.O0 —0.148324398.01 C 4) 0.531317698.00 —0.694795251.00 C 6) 0.492212300*00 —0.38S617128*00 C 0) 0.479417320+00 —0.205167411+00 (10) 0.474976111*00 —0.644258598—01 IFFY INVERSE TRANSFORM C 0) 0.100000078+01 C 1) C 2) 0.110000361.00 C 3) C 4) 0.656099981+00 C 5) C 6) 0.531441091.00 C 7) C 8) 0.630466988.00 C 9) (10) 0.348611328.00 Cli) (12) 0.212429618*00 (13) (14) 0.221767538.00 (15) (16) 0.115301861*00 (17) Cli) 0.150096301.00 (19) (20) 0.1Z1S71Z11.00 C2i)
C 1) C 3) C 5) C 7) ( 9) Cli)
0.900000108.00 0.?2900009E.00 0.590490161*00 0.478297008.00 0.387620658.00 0.313810598+00 0.254186398.00 0.Z05191510+00 0.166772118+00 0.13508S938.00 0.109419670.00
0.148371398.01 —0.275699198’Ol 0.584436398*00 —0.9?13~33SE.00 0.505379108*00 —0.51685170E.00 0.4I424~64E+00—0.218S5316E400 0.476526050.00 —0.131954760400 0.474485171+00 0.000000008400
127