Aerospace
Science
and Technology,
1997,
no 7, 489-501
Role of Parallelism
in Wavelet Analysis of Radar Imaging M. Roisin, A. Cosnuau ONERA, Manuscript
Roisin M., Cosnuau A., Aerospace Abstract
11, 1994; accepted October
3, 1996.
1997, no 7, 489-501.
Radar imaging
- Wavelet analysis - Parallel computing.
R6le du parallClisme dans l’exploitation de l’imagerie radar par ondelettes. La mCthode d’analyse en ondelettes continues, dCveloppCe par J. Bertrand, P. Bertrand, J. P, Ovarlez [14] permet de reprisenter la rCflectivitC d’une cible au moyen d’une fonction sur un espace B quatre parambtres, que nous convenons d’appeler hyperimage radar. Elle pr&ente une formulation thCorique de l’imagerie radar qui constitue une amklioration et une extension de la technique tomographique mise & ceuvre par C. Pouit [2] dans les annCes 1970. L’hyperimage devient un nouvel outil de perception des hyperfrCquences, devenant une interface entre le radar et l’ceil foumissant ainsi une reprCsentation pertinente de l’information contenue dans l’ensemble des mesures radar, encore appelk hologramme. La construction de l’hyperimage est aisCe gdce & un algorithme efficace et original qui utilise des outils de calcul rapides & base de transform&es de Fourier rapides (FFT). La charge des calculs est nCanmoins importante et l’exploitation de la mkthode requiert un temps d’execution rkdhibitoire pour une utilisation efficace sur machine sCquentielle. Cependant, 1’Ctude de la structure de l’algorithme conduit naturellement 2 envisager son implCmentation sur machine multi-processeur. Ce travail montre que sa programmation sur calculateur parallble permet d’approcher le temps rCe1 et par consequent de rendre le logiciel interactif. En effet, les performances du programme que nous avons dCveloppC sur l’iPSC860 de I’ONERA et que nous appellons, HYPIM, prouvent que dCsormais l’utilisation conjointe des nombreuses ressources de l’analyse en ondelettes et du calcul parallkle amCliorent les capacitCs de l’imagerie radar. Mots-cl&
Aerospace
Science and Technology,
August
Cedex, France.
The continuous wavelet analysis method developed by J. Bertrand, P. Bertrand and J. P. Ovarlez [14] represents target reflectivity by a function on a space with four dimensions, commonly called radar hyperimage. It represents a theoretical formulation of radar imagin g that is an improvement and an extension of the tomography technique developed by C. Pouit [2] in the 1970s. The hyperimage is becoming a new tool for perceiving microwaves, an interface between radar and the eye, providing the representation of the information contained in all the radar measurements, also called hologram. The hyperimage is easily constructed by an efficient, original algorithm that uses fast computation tools based on fast Fourier transforms (FFT). The computation load is however large and the execution time required by the method on a sequential computer is prohibitive. However, the structure of the algorithm naturally gives the idea of implementing it on a multiprocessor machine. Our work shows that programming the algorithm on a parallel computer makes it possible to approach real time performance and make the software interactive. In effect, the performance of our HYPIM programme developed on ONERA’s iPSC860 has demonstrated that the combined use of the many facilities offered by wavelet analysis and by parallel computation improves the capabilities of radar imaging. Keywords:
R&urn6
received
BP 72, 92322 Chktillon
Science and Techmlogy,
: Imagerie
1270.9638,
9710710
radar - Analyse
Elsevier,
Paris
ondelettes - Calcul parallble.
490
I - INTRODUCTION Radar imaging in the laboratory is based on a twostage process designed to provide the electromagnetic characterisation of an object. The first stage consists of illuminating the object at various frequencies and from various angles of view. This experiment is for the purpose of capturing the amplitude and phase of both of the incident wave and the wave reflected by the target to be able to analyse the target backscattering coefficient. To obviate problems related to the vectorial nature of the electromagnetic field, the polarisation is set for both transmission and reception. The radar is assumed to be completely coherent. The second stage concerns analysis and processing of all the measurements, also called radar hologram. This allows construction of a representation of the bright spots of the target. The radar imaging process developed by C. Pouit [2] in the 1970s is based on a model of isotropic, white reflecting points which is satisfactory only for small analysis bands. That is why, in the framework of broadband exploration of the backscattering coefficient, J. Bertrand, P. Bertrand and J. P. Ovarlez developed a theory which gives a systematic formulation of the radar imaging problem and is based on a model of reflection by coloured, directional bright spots [ 10, 13, 141. The target reflectivity is then described by a function on a space of four parameters commonly called hyperimage, which is a representation of the frequency and angle variation of the electromagnetic response of the target backscattering points. The hyperimage makes broadband high-frequency electromagnetic investigation of a target feasible, accessible and usable by organising perception hierarchically. In practice, construction of the hyperimage is based on wavelet analysis of the hologram. Its computation is based on development of a fast algorithm using efficient calculation techniques but which are however demanding in execution time. The purpose of our work is to show that this algorithm is naturally adapted to new parallel computer architectures and gives excellent results on ONERA’s iPSC860. Section II recalls the principles of the radar imaging wavelet analysis method [lo, 12, 141 leading to formulation of the hyperimage calculation. Section III details the corresponding algorithm. Its detailed review shows how suitable it is for the new parallel architectures. The many possibilities offered by wavelet analysis lead to a reflection on the features that should be provided by a software implementation. The association of the two approaches shows the importance of parallelism in radar imaging by wavelets. Section IV is dedicated to parallelisation and its performance. It includes a description of ONERA’s iPSC860. It gives the strategy used to implement various versions of the algorithm in the HYPIM
M. Roisin, A. Cosnuau
programme. It details the performance achieved on a variable number of processors of the parallel computer. This last part also examines management of the hyperimage for efficient use of the method. II - PRINCIPE OF HYPERIMAGE CONSTRUCTION BY WAVELET ANALYSIS II.1 - Radar imaging : hyperimage and conventional image The received signal comprises a number of reflections from backscattering points distributed on the target. The purpose of radar imaging is to locate and identify the contribution of each point to the backscattered signal. The aim is therefore to find a process that transcodes the phase and amplitude information contained in the hologram to make it perceptible to our vision. This is done by constructing representations of the projections of the target 3D image points on a horizontal plane such that the effective points of the target are discriminated in the radial and transverse range. In the conventional method developed at ONERA by C. Pouit, the scattering process is modelled by a white, isotropic point and the reflective parts of the target are obtained by processing similar to that used by scattering computations for optical holography, since a double Fourier transform is performed on the backscattering coefficient. In effect, the spatial location of the points in the radial range (longitudinal axis) is supplied by a frequency transformation of the backscattering coefficient measured for each of the angles. The location of the points on the transverse range, i.e. on the axis perpendicular to the radar line of sight is obtained by a Fourier transformation on the angles of the backscattering coefficient obtained for a given frequency. The result is then an image of the spatial location of the reflective points for an average frequency and an average analysis angle. To make a more detailed analysis and obtain the location of the reflectors for each value of the frequency and angle from the values of the hologram, J. Bertrand, P. Bertrand and J. P. Ovarlez developed a method which allowed them to provide a description of the target reflectivity by a function with four parameters. Their radar imaging method uses similarity group representations and can be related to wavelet analysis. For instance, when the target is modelled as a set of coloured, directional reflectors, the response of this model to all the deformations of the similarity group is analysed to be able to calculate a wavelet transform of the target backscattering coefficient. This yields a set of maps, called hyperimage, obtained by a unique formulation. This hyperimage is unique as equivalent representation of the target backscattering coefficient. The radar hyperimage has an important role to play in sectors of activity where it is essential to be able Aerospace
Science and Technology
Role of Parallelism in Wavelet Analysis of Radar Imaging/ R81e du paralle’lisme dans l’exploitation de 1‘imagerie radar par ondelettes
to characterise the backscattering properties of objects illuminated by electromagnetic waves. Construction of the hyperimage is motivated by the need for a tool which accounts for the behaviour of the reflective points of a target according to the frequency and angle of illmnination for instance for design of stealthy aircraft [ 151. II.2 - Wavelet analysis and radar hyperimage Wavelet analysis as used in various areas of physics and mathematics [8] is based on the wavelet transform which breaks a signal down into a basis of signals deduced from one another by expansion-translation of an analysing wavelet 4. The choice of function differs according as it is desired to obtain a breakdown basis which is orthogonal or which has location or regularity properties, for instance. The type of breakdown depends on the application. Wavelet analysis can be used on signals that are continuous or discrete in time [6]. In our case, which is an analysis of continuous signals, the wavelet transform can allow an approach that is different from the one provided by conventional transformations, such a short-term Fourier transform or a Gabor transform. It produces the time variation of the frequency by a frequency analysis at a constant relative bandwidth, i.e. unlike the fixed size analysis window used in conventional transforms, the window size is varied according to the frequency - it is narrow at high frequencies and large at low frequencies [7]. II.3 - Expression
of the hyperimage
The expression of the hyperimage computation is based on analysis of the transforms of the backslcattering coefficient and reflectivity by a change of observer. Analysis of the similarity group repres,entations on the backscattering coefficient and on its representation to be constructed, which underlies the hyperimage computation, plays a constructive role in the development described below and allows the radar imaging hyperimage analysis method to be compared with the wavelet analysis method. In effect, this analysis shows that target reflectivity can be expressed by the wavelet transform of the backscattering coefficient. Based on a target modelled as a set of coloured, directional reflectors, we shall go through the different steps followed by J. Bertrand, P. Bertrand and J. P. Ovarlez to analyse the response of this model to rotations, expansions and translations. At the same time as an observer going from one measurement system to+ another by a rotation by angle 4, Rd, a translation b and a change of scale a sees his own coordinates and those of the wave vector i modified according to the following transform [ 10, 131. ~‘-,~‘=a-lR&+g k&‘U-lR~; 1997.
no 7
(1)
491
the measurement of the backscattering coefficient also accompanied by a transform such that H(i)
-+ H’ (,o xx ue-2i7ikb --H(uR,&.
is (2)
The correspondence to be established between hologram H and the reflectivity expression I should be invariant for a change of observer. The following covariance diagram must be verified so that the hyperimages can be deduced from one another by local transformations:
A one-to-one correspondence exists between a point of the image space and an element of the similarity group. A map showing the location of the target bright spots versus the frequency and angle of illumination can be drawn. At the outset of the hyperimage construction process, from a physical standpoint, we consider a virtual or so-called reference target which is assumed to be at the origin of the reference system in Z = 0’. & is its backscattering coefficient, obtained for a reflection in direction 0 = 0 for a frequency verifying k = af = 1. From a mathematical standpoint, $0 is a” basic wavelet, considered attached to the image point whose coordinates are defined by 2 = 0’ and c z (Ire, 6’) = (1, 0). We then systematically examine the modifications occurring on the target backscattering coefficient (or on the basic wavelet which is equivalent) when its position is changed by application of a transformation of the similarity group. For instanc5 according to scheme (1) and (2), by a change (a, bj $), the function (POis transformed into a wavelet 4,-, c which is attached to a new point with coordinates i = b’ and i z (k: 8) = (, defined by:
(j) and is
The family of functions generated in this way is used as wavelet basis for breakdown of the hologram. Using the scalar product: (HI, Hz) = /”
k H,*
(k)
H2
(k)
dk,
0
for a given position of the reflective point, the coefficient of wavelet C (Z, z) is obtained by calculating the scalar product of the wavelet centred
492
M. Roisin, A. Cosnuau
on each frequency of the hologram support and the hologram values:
increasing X. A pertinent analysis of the target requires searching for the optimum parameters of the wavelet to achieve the best match between the wavelet and the points of the target to be analysed. Making the method operational therefore amounts to making the wavelet parameter management software interactive.
III - HYPERIMAGE
x (x cos 0’ + y sin 19’)df’
J
The wavelet coefficient specifies the electromagnetic response of the targets by four parameters. In effect, not only does it specify the location of the reflective point but also its electromagnetic frequency and angular behaviour. The hyperimage is constructed as the square of the wavelet coefficient modulus:.
I(Z, C) = $7 (5, :)I”. The wavelet coefficient can be calculated precisely by the Mellin transform [ 11, 121. It can then be evaluated directly in polar coordinates without interpolation or oversampling. The use of this techique requires geometric frequency sampling of the hologram. A sampling theorem similar to Shannon’s can be used [ll, 121. II.4 - Choice of the parent wavelet
COMPUTATION
SCHEME
It is recalled that the measurements are obtained by geometric frequency sampling and regular angle sampling. To clarify, let us set: be, = f (z cos 0’ + y sin 0’)
and ii@’ (f’) = H (f’, /3’) PQf’. Since the wavelet used is the product of a Klauder wavelet for the frequency part and a Gaussian for the angular part, we shall denote the frequency part as & ( f) and the angular part as $a (0). By applying the Mellin transform [ 11, 121, defined as w H (f, 0) fzizp df
M (P, 0) =
s0 and using its uniqueness and scale invariance properties, we obtain a new expression for the wavelet coefficient:
The wavelet used for this analysis is the product of the Klauder wavelet and a Gaussian [lo, 141. It is expressed:
The frequency part of the function corresponds to a spectral window with a relatively constant width, determined by parameter X related to the spectral width by the equation:
of = -.
These parameters is controlled by par ame&@. The angular “read therefore condition the analysis. They define the radial and transverse resolutions. For instance, if we take X low, we consider that the point is white on the observation band. Conversely, a high X allows us to analyse the frequency behaviour of the reflective points, since the band on which it is allowed to be white is narrow. This duality is reflected in the fact that improving the frequency resolution decreases the radial resolution and that improving the transverse resolution degrades the angular resolution. X and 00 characterise the uncertainty relations. Proceeding by trial and error, we can therefore increase the frequency resolution by
We then set:
Tf(0M@oL (f’)] [PI M* [d’v (f’)] [/3] f-2i*p d/3.
By introducing the Fou$ty coefficients of Tf (Q’) such that Tf” (0’) = &
Tf (8’) ecinR’ dd’ and those
J’
of the Fourier transfo’m for the angular part of the wavelet, the wavelet coefficient can be expressed:
n Aerospace
Science
and Technology
Role of Parallelism in Wavelet Analysis of Radar Imaging/ R81e du paralle’lisme dans l’exploitation de 1‘imagerie radar par ondelettes
III.1 - Complexity of the computation due to the number of operations It can be seen that the Mellin transform of the wavelet can be carried out once, whereas that of the hologram must be carried out for each angle. During the first computation step, it is established that the number of Mellin transforms required is No + 1. The compression term included in (3) becomes an attractive term for the calculation, since the expression The is reduced to an inverse Fourier transform. frequency information for all the angles is obtained after [calculation of: (No + 1) x FMT (N,, points) + NB x FFT (Nf points) where FMT (Ns) is the discrete Mellin transform on Nf points and FFT (Nf ) is the discrete Fourier transform on Nf points. Knowing [ 10, 1 I] that a discrete Mellin transform is the same as a fast Fourier transform, calculation of the integral in ,8 requires: (2 No + 1) x FFT on (Nf points). As for the part relative to the angles, first a fast Fourier transform is made on each frequency for the iV@ angles,. The wavelet requires a single FFT on No points, i.e. Nf x FFT (No points) + 1 x FFT (No points). The final information is obtained by calculating the inverse Fourier transform of the product of Tf” (0) and the wavelet Fourier transform. The final angular and frequency information is therefore obtained by calculating (2 NQ + 1) x FFT (Nf points) + (2 Nf + 1) x FFT (No points). Knowing that the wavelet is independent of the point, the complexity for N, NY points amounts to: IVZ x N, x [2 No x FFT (Nf points) + 2 Nf x FFT (No points)] + 1 x FFT (Nf points)
+ 1 x FFT (No points).
In the language of radar specialists, N, is relative to the radial range and NY to the transverse range. In addition, since the number of operations required for a fast Fourier transform on N points is around 5 N log N, it is established that calculation of the hyperimage requires: 10 x N, x Nzl x [No x Nf x [log (Nf)
+ log (No)]]
+5Nflog(Nf)+5Nglog(Ne). For instance, to construct a hyperimage consisting of sections of 1500 points from a hologram consisting of 12X frequencies and 64 angles, the number of FFTs to be caculated amounts to: 192,001 FFT
on 128 points
+ 364,001 FFTs 1997. il3 7
on 64 points,
493
i.e. a minimum performed.
of 480 million
operations
to be
III.2 - Discretisation of the wavelet coefficient computation 111.2.1 - Objects manipulated We have the following each point. l
iVf is the number
hologram fiv l
parameters and variables for of frequency
points of the
on (fmin, fma,);
No is the number of angular points;
e 4 is the geometric sampling rate; o H [ ] is the vector with dimension geometric sample to H;
Nf
of the
6 &, [ ] is the vector with dimension Nf of the geometric samples of the frequency part of the wavelet; l M [ ] is the vector with dimension Nf of the arithmetic samples of the Mellin transform of the frequency part of the wavelet;
o n/r,, [ ] is the vector with dimension Nf of the arithmetic samples of the Mellin transform of the backscattering coefficient of the target observed under the mth angular sample; l 4a [ ] is the vector with dimension No of the arithmetic samples of the angular part of the wavelet;
o F [ ] is the vector with dimension No of the arithmetic samples of the Fourier transform of the angular part of the wavelet; is the vector with dimension Ne of the l Flf arithmetic samples of the Fourier transform of Tl i ; l C [ ] is the result vector with dimension Nf x No of the wavelet coefficients.
Remark: The number of memory words to be allocated can increase very rapidly with the size of the problem. For instance, for a double precision calculation, not counting the intermediate tables, the minimum number of bytes to be allocated equals: No x Nf x size x (4 + npts/proc
x 3)
+ No x 2 x size x (1.5 + 2) + Nf x 2 x size x (1.5 + 2), where size is the world length in bytes. The memory size of the processors may therefore rapidly constitute a limit in the case of large domains. A solution to this problem is given in the section of this paper on parallel implementation.
494
M. Roisin, A. Cosnuau
111.2.2 - Sequencing of the operations Let us break down the wavelet coefficient calculation into subtasks to have a better idea of the number of operations to be performed and arrive at a hyperimage construction algorithm. The first task is to calculate the integral in ,8 for each angle supplying the frequency information, then the integral in 6’ for each frequency to have the angular information. Frequency Part: The expression yielding Tf (Q’) first requires calculating the product of two Mellin transforms. The term f-2ir@ of the integral in /3 allows calculation of the integral to be reduced to that of an inverse Fourier transform by geometric frequency sampling. The result is a frequency characterisation of each point of the target for its different presentations to the radar, according to the geometric frequency samples. Calculation
-2ilrippp
x
4~ [mfl = Pf
4~
P’
(fain
if=Nf-1 q&
[ip]
e
Nf-l
,
i.e. a total of 2 No + 1 FFT
on Nf points. Angular Part: We first calculate the Fourier transform of the angular part of the wavelet then, for each frequency, the Fourier transform of TmB and finally, the inverse Fourier transform of the product of the two. l Calculation of the angular Fourier transform - of the wavelet
i,j=NB [p,]
=
C
[if]
e
Nf-l
0
5
pp
-
r$a [ie]
e%fY
of TmO for each frequency I,, 0 < 1, < Nf - 1.
After rearranging table Tmg, we calculate the Nf discrete Fourier transforms on variable 8. i, =NB
Flf [13,] =
c zf [ie] e*. &J=o
l Calculation of the wavelet coefficient evaluation of the inverse Fourier transform product of the above Fourier transforms: im=Nn
% be] = ” cj,=o
O
) 2i7rifpp
C
Ad*
is=0
- Wavelet calculations
=
iii=Nf-1
F
Geometric sampling of &, with rate 4 = f max q around frequency f = 1 then calculation (4fmin of the Mellin frequency transforms of $,, and H,@. Below, the subsubscripts identify the variable with which they are associated. For instance, rnf , j,, pf are the frequency subscripts, pp the subscripts of the Mellin variable, in and rns the angular subscripts and j, the subscript of the dual variable of the angles.
bp]
i.e.
Steps
l
M
the geometric frequency samples for each angle me,
5
Nf-1.
C by
on the
Fzf [jn] x F [&I
The only external procedure required for software implementation is an FFT algorithm.
if=0
III.3 - Software - Calculation on the hologram for each angle mg, 0 5 me 2 No - 1: H,,
[jf] = $f H [jf] e2SnB fmin QJf,
b,,
= z (x cos(me)
+y sin(me)),
if=Nf-1
Mm, [pp] =
c HmB [if] e”Y;‘Y if=0 0 5 pp 5 Nf - 1.
l
,
Calculation of Tf (0’)
The aim is to calculate the discrete Fourier transform on ,/Jof the product of the Mellin transforms to recover
functionalities
111.3.1 - Analysis of needs Implementation of the method on a computer must satisfy two requirements. First of all, the hyperimage must be constructed in quasi-real time for the user. Secondly, an interactive interface should allow the user to benefit from the different aspects of wavelet analysis. This interface allows management of the choice of resolutions. For instance, the power of wavelet analysis is that it resolves the ambiguity between the location of the bright spots and determination of their directivity and colour. The software should therefore allow the user to choose the parameters of the parent wavelet which determine the angular and frequency spread and therefore the angular and frequency resolutions. An analysis by successive refinements leading to a detailed target representation is then easy if the hyperimages can be calculated in a sufficiently short time, thereby making interactivity Aerospace
Science and Technology
Role of Parallelism RBle
a!u
in Wavelet Analysis of Radar Imaging/
pavalle’lisme dans 1‘exploitation
de 1‘imagerie radar par ondelettes
possible. The computation power must be sufficient to satisfy these two requirements. One solution can be to use parallel machines. If the problem can be parallelised, the requirements can be met by having several processors perform the calculation. 111.3.2 - Hyperimage
sections arz hypevimage
The many possibilities offered by wavelet analysis can be used partially. Reduced versions of the algorithm have been implemented, depending on the desired analysis scenarios or available memory size. Three versions are proposed. The first offers a target representation for only a given angle and frequency (hyperimage section). The second includes frequency variations for a given angle of observation (hypersection). The third corresponds to calculation of the complete hyperimage. The most reduced version of the algorithm avoids any memory problems and corresponds to the intuitive concept of image. For all three versions relative to construction of the hyperhmage or one of its subsets, complete or partial calculation of the wavelet coefficients must be carried out in each of the target points. Below, the matrix of N, x NY points representing the discretisation of the geometric target shape is called “image”. III.33
- Analysis of parallelism
The calculations of the coefficients attached to each point of the image are independent of one another. The algorighm is therefore naturally parallel. This parallelism of the algorithm means that it should function efficiently on a multiprocessor machine, thereby reducing the calculation times to allow the analysis to be refined by trial and error and arrive at an optimal analysis of the reflective spots of the target. Another advantage of parallel computers is that they can handle large problems, since the machine memory size increases with the number of processors. The solution consists of evenly distributing the points to be calculated among the processors. In addition, since the points are independent of one another, no data transfer are required to calculate the hyperimage so that the processors can be networked in any way. The only communication occurs when the final results are sent outside, for instance to a graphic display terminal. We will come back later to this important practical consideration. The machine we used for this application was the iPSC860. Any other machine with a distributed memory can be used.
IV - PARALLEL IV.1 - Presentation
IMPLEMENTATION of the hypercube
The ONERA iPSC860 is a massively parallel machine with distributed memory. This computer has 128 processors (Intel i860), each with a local memory (8 megabytes) and a computation power between 3 and 1997, no 7
495
5 Mflops (million floating point operations per second) on 64 bits in FORTRAN or C. This computation speed depends on the nature of the computations, the quality of programming and whether or not the assembler routines supplied by Intel are used. For instance, Intel’s ID complex FFT on 256 points is executed at 35 Mflops. The processors are interconnected by a hypercube type network [17]. The transfer rate between two adjacent processors is 2.2 megabytes per second. The 9-gigabyte Concurrent File System (CFS) on 14 disks accessible in parallel available on this machine allows rapid storage access. A PC is used as host machine to allocate subcubes of 1, 2, 4, 32, .. . or 128 processors, compile the programmes, load the processes onto the processors and enter and output parameters. This host is not normally used for calculation. After examination of an algorithm and the possibilities of parallelisation, the application is divided into pieces that can be executed in parallel. The corresponding calculation tasks (identical or not) are then distributed to the processors. Generally, the tasks have to communicate with one another during the calculations to exchange data. This is done using dedicated message input and output routines included in the FORTRAN or C programme on each processor. This type of programming is called programming by message-passing. In the case of the iPSC, the communications are said to be asynchronous, i.e. when a message is sent from one processor to another, the sender does not have to know the status of the receiving processor. The message is stored in a reserved area of the receiving processor memory. When this processor reaches the reception routine for this message (identified by a number) included in the programme by the programmer, the reception variable is loaded by this message. This variable can then be used by the receiving processor for performing the calculations. However, if the receiving processor reaches the reception routine before the message is sent, it waits in this location until it receives the message. This is how the processes are synchronised with one another.
IV.2 - Parallelisation The parallelisation chosen consists of evenly distributing the image points among all the processors. In effect, whatever the type of analysis selected, the image is divided into subdomains, and each subdomain is associated with adifferent processor. Where P is the number of processors used and N the number of points to be constructed, the distribution of these N points is such that N = P* i + R. The first R processors handle calculation of (?;+ 1) points and the remaining (P - R) processors process 1;points. Each processor executes either (1: + 1) Y, or i Y operations where Y = 2xN0+1FFTon,Nfpointsand2xNf+1FFT on NQ points. The memory size required per processor therefore depends on the number of points and the size
496
M. Roisin, A. Cosnuau
of the hologram. For instance, when each processor processes and stores its block of i or i + 1 points in parallel for output by packets, the memory size available to it must be equal to Ne*N+ze*(4+3*i) where size is the word length in bytes. However, since the local memory of each processor is limited to 8 megabytes on ONERA’s iPSC, a large hyperimage must be calculated point by point and the result of each point instead of a block of points must be output to a disk, graphic display, etc. By freeing memory space on each processor in this way, 8 megabytes are made available for the calculation of each point. The above formula can then be used to establish the maximum size of a hologram. Since the memory size required to calculate a point on 64 bits is evaluated at approximately No * Nf * 7 * 8 bytes, 0.5 megabytes per processor are used for a hologram consisting of 128 frequencies and 64 angles. We can deduce that the hologram size must not exceed 128*1000 i.e., for instance, 1000 frequencies and 128 angles. We can also calculate the maximum number of points that a processor can handle and therefore the maximum size of all the sections of the hyperimage according to the number of processors available. Considering that each point requires a half megabyte of memory and that 7 megabytes are available for use on each processor, each processor can handle a maximum of 14 points. For 128 processors, the maximum number of points that can be calculated is 128 x 14, i.e. 1992. The wavelet coefficients are either stored in parallel in CSF files connected to the processors or sent by PVM (Parallel Virtual Machine) to a graphic display terminal. The points can be displayed by order of arrival. Display of the results according to the different dimensions of the hyperimage requires storing this image on the graphic display terminal. As will be seen below, the volume of result data to be managed raises interesting problems. IV.3 - Evaluation
criteria
Three criteria are generally used to evaluate the
advantage of parallelisation: the power in Mflops, the acceleration and the execution time. These three criteria are essential for evaluating the results of parallelisation, regardless of the machine used. It is recalled that the gain or acceleration represents what is gained by increasing the number of processors. When the complete application fits in the memory of a single processor, the gain can be calculated directly (other formulas exist when this condition is not satisfied). It is equal to the calculation time on a single processor divided by the calculation time on P processors. The theoretical accessible gain is therefore P if the entire application can be parallelised. IV.4 - Programming To calculate the wavelet coefficient, the complete hologram is loaded on eachh node by a programme installed on the host. The language used is C and the numbers are encoded on 64 bits. All the fast Fourier transforms required for the calculation are performed using a procedure of the BLAS (Basic Linear Algebraic Subroutines) library. The FFT is written in Assembler and can reach 35 Mflops on 64 bits when the number of points is below 512. The performance drops to 15 Mflops when the number of points is greater than or equal to 1024. This is due to the small size of the cache memory (8 Kbytes) of each processor. IV.5 - Results and analysis of the different experiments IV.5.1 - Hyperimage section: image for a one frequency and one angle We measured the execution times and accelerations (gains) for three hyperimage section dimenions (500, 1000 and 1500 points) and for from lto 128 processors. The accelerations are excellent, reflecting the parallel nature of the algorithm, the good choice of parallelisation strategy and the suitability of the machine for the problem.
500 points Number Time
1 image of processors
in seconds
Gains
1
2
4
8
16
32
64
128
74.21
37.64
18.55
9.48
4.76
2.39
1.21
0.63
1
1.97
4
7.83
15.6
31
61.33
117.79
1 image
1000 points Number Time Gains
of processors in seconds
1
2
4
8
16
32
64
128
149.48
75.38
37.4
18.81
9.43
4.83
2.41
1.22
1
1.98
3.99
7.94
15.85
30.94
62
122.52
Aerospace
Science
and Technology
Role of Parallelism in Wavelet Analysis of Radar Imaging/ R81e du paralle’lisme dans 1‘exploitation de l’imagerie radar par ondelettes
1500 points Number I
/ Time
1 image
of processors in seconds
1 Gains
1 1
1 223.69
I
1
1
2
1
4
1
8
1
16
1
32
1
64
1
128
1
(
112.10
/
55.77
1
27.93
1
13.96
/
7.09
/
3.61
1
1.82
j
I
1.99
I
4
I
8
/
16.02
j
31.55
/
61.96
/
122.9
I
In this version, the hyperimage is calculated by blocks of points. Each processor sends its results to the host, which then sends the hyperimage to a graphic display terminal. This version supplies a single image, similar to that of conventional radar imaging. It can be seen that the gains are very good, close to the maximum theoretical gains. The slight degrading observed is mainly due to final transmission of the results of all the processors to the host. It should be stressed that the number of processors can be increased even more for this application without appreciably degrading the gains. In effect, multiplying the number of processors by 3, i.e. 384 processors working on the domain of 1500 points should give a gain of approximately 35 1 by comparison with a single processor, since the gain is 117.79 for 128 processors processing 500 points, so it should be approximately 3 x 117.79 for 3 x 128 processors processing 1500 points.. From a real-time standpoint, it is interesting to note that for 1500 points and 128 processors, the results for 128 points are obtained simultaneously in 0.15 seconds!
IV.5.2 - Hypevsection : images for all frequencies and one angle The same types of measurements as above were made for the same hologram and section dimensions. The gains are not good in this case, since they deviate from the optimal gain as is shown by the curves. The number of frequencies Nf here is equal to 100. These results correspond to the partial version of the algorithm which consists of constructing a hypersection used to follow the frequency variation (128 frequencies) of the bright spots on the target for a given angle of observation. The results calculated on the processors handling calculation of the hyperimage by blocks of points are transferred to and collected by a single processor for transmission to a graphic display terminal. When the number of processors used exceeds 16, this approach is penalising. The larger the number of processors, the longer the communication times dues to conflicts on the network and access to the single host.
500 points Number Time
Nf of processors
1
in seconds
Transfer
87.56
time
0 I
Gains
1000 points
I
I
Number
1
of processors
/
time
Gains
8
16
32
64
43.34
21.69
10.96
6.01
3.64
2.55
0.23
0.28 I
2
Time
of processors in seconds
Transfert Gains
time
0.65 I
4
7.98
0.94 I
14.56
1.19 I
24.05
34.33
I
images
2
1
4
1
8
1
16
1
32
/
64
1
176.47
1
87.61
/
43.56
1
22.01
1
11.35
1
6.66
/
4.58
1
1
0
/
0.38
/
0.45
I
0.60
1
0.8
I
1.29
/
1.88
1
1
1
1
2
1
4
1
8
1
15.56
1
26.49
j
38.53
1
1500 points Number
0.42 I
1
in seconds
Transfer
4
Nf 1
images
2
I
1
I
I Time
497
Nf 1
images
2
4
8
16
32
64
267.19
132.09
65.92
33.19
17.04
9.3
6.08
0
0.5 1
0.62
0.79
0.99
1.41
2.05
I
2
4
8
15.68
28.73
43.94
M. Robin,
49s
Gain + ----X--- l 42 - -
Time Section Section - Section Optimal
versus
number
of processors
of 500 points of 1000 points of 1500 points gain
1000 points Number 2
10
18
A. Cosnuau
26 Numbers
34 42 of processors
50
62
Time
of processors in seconds
time is subtracted from the execution time, the curve again becomes linear and close to the theoretical gain. When the size of the image to be calculated increases, the gains increase. This is due to the fact that the programme execution time is an affine function of the number of points. The value of the constant, which is 0.5 seconds, is exactly equal to the transfer initialisation time. The performance could therefore be improved by modifying the result transfer management and speed. Furthermore, although around 2 seconds are currently required on the iPSC to transfer a hyperimage section with 128 frequencies, this time falls to 2/15 seconds and to 212.5 seconds in the near future on a PARAGON type machine. IV.5.3 - Hyperimage: images for all the frequencies and angles
1500 points Number Time Gains
Nf
x No images
I
32
64
128
16.16
8.07
4.03
I 1 I 2 I 4
I Gains
Contrary to the above case, when only the final result for a single frequency was transferred to the host, this time the results are sent for 100 frequencies. For instance, for 64 processors and a lOOO-point image, the final transfer time is 1.88 seconds instead of 0.019. For Nf frequencies, the transfer time is roughly equal to the product of the number of frequencies times the transfer time measured for a single frequency (& x Ttra+ ). It can be noted that when the transfer
I
of processors in seconds
I I
Nf 64
I
x No images
I
12.11
128 6.05
1
2
It can be seen that the results are very good and the gains are optimal. The processing time varies linearly according to the number of processors or the hyperimage section size. When the number of processors is doubled, the execution time is cut in half. If real-time display is desired, with calculation and output point by point, it is observed that for the case with 1500 points, the results can be calculated simultaneously for 128 points (i.e. 128 planes 128 x 64) in 0.5 seconds. Real-time display is possible by sending the 128 points calculated on the hypercube simultaneously to a graphic display terminal (SGI type, for instance) by the PVM software every 0.5 seconds. The image are then stored with the possibility of being able to display the hyperimage sections when all the points are present in the SGI.
Complete algorithm The complete algorithm consists of constructing all the images corresponding to the different angles and frequencies of observation, i.e. constructing the hyperimage. In this version, a CFS file is associated with each processor, which sends its results packet to this file. File write is therefore accelerated by the use of the iPSC file system, accessible in parallel. The tables below concern this calculation alone without transfer to the host of the results. Nf equals 128 and No equals 64. Nf denotes the number of frequencies and No the number of angles, the closest power of 2 of the number of observations. Nf is evaluated at 128 and No at 64 for our experiments.
IV.5.4 - Comparison between the sequential and parallel versions The same algorithm was executed on sequential and parallel machines, The tables below show the results. where Nf is equal to 128 and NQ to 64. Far better times were achieved on the iPSC than on any of the sequential machines (Sun4, SGI, RISC6000). The method presented is therefore operational and very competitive. Aerospace
Science and Technology
I
Role of Parallelism R81e du paralMisme
in Wavelet Analysis of Radar Imaging/ dans 1‘exploitation de l’imagerie radar par ondelettes
499
from strictly hardware solutions, such as efficient file management for storage and future retrieval of the data corresponding to the different dimensions of the hyperimage, “parallel computation” solutions can be considered. The PFS file system cannot currently be connected physically to a graphic display terminal. It is therefore necessary to send out the data via LAN. For instance, assuming a transfer rate of 2 Mbytes per second as is the case in the iPSC860, more than 11 hours would be required to perform this operation for the 80 Gbytes mentioned above. We suggest several empirical ways of solving these problems. IV.55
- Remarks on computation power
Calculation of all the FFTs represents 81 percent of the operations of the application but only 9 percent of the total time. The remaining operations, representing 91 percent of the time, mainly consist of trigonometric functions. These percentages are given for the current version of the code which contains: N, x N, x No x (8 x Nf + 2) trigonometric functions, where N, x Nzl are the number of points of the hyperimage sections, No is the number of angles of the hologram and Nf the number of frequencies. This is due to the imbalance between the power of the FTT, embedded in the code, which is 23 Mflops, compared with the other types of functions whose speeds are 0.34 million logarithms or exponentials per second 0.27 million sine/second, 0.48 million sqrtskecond and 14.1 million divisions/seconds. Under these conditions, the performance of the HYPIM code on 32 processors for a hologram size of 128*64 is 80 megaflops. The performance is multiplied by four for 128 processors, i.e. 320 megaflops. If all the codes were executed at the speed of the FFTs, the performance would be 3 Gflops (billion floating point operations per second) on 128 processors. The obvious conclusion is that improving the power requires optimising the trigonometric functions.
IV.6 - Management
of the results
Once the calculation is completed, it is desired to view the results, i.e. the hyperimage sections. Problems of time then arise: l Transfer time from the processors to the host and from the host to the graphic display terminal. l The time required to process the data received by the graphic display terminal. l The memory space and data accesses required. In effect, in the present case: 1500 points, 128 frequencies, 64 angles, this represents approximately 98 Mbytes to be transferred from the parallel machine to the graphic display terminal and to be stored and managed on the graphic display terminal. In the case of a hyperimage with 100’ (z, y) points and 1O242 frequency and angle points, there are more than 80 gigabytes to be transferred and stored. Aside 1997, Ilo 7
IV.6.1
- Possible solutions
The basic desire of the user is to be able to run through the four-dimensional space of the hyperimage (2, y, frequency, angle) graphically. The standard approach adopted is therefore as follows: the parallel machine calculates the complete hyperimage which must then be transferred to the graphic display terminal, stored and processed to display the results. Based on a number of essentially practical considerations, a few factors could increase the hyperimage computation speed, the transfer rate, the graphic display terminal processing speed and especially facilitate storage. We suggest the following solutions; l Have the computation processors encode of the results on 8 bits instead of 64 to gain in transfer time and storage space. a Use image compression routines. o Have the computation processors send only frequency and angle intervals (or isolated values). Several types of viewing are possible. For instance, the user can view the images one by one. In this case, if he analyses each (z, 1~) image in around 30 seconds and views the images by groups of 10 frequencies and 10 angles, i.e. 100 images, he will need 3000 seconds, i.e. nearly an hour, to view this interval! Another viewing mode consists of running rapidly through several images (5, 10 or 12 images a second) and stopping on a specific one. In .both cases, it is sufficient to display a single frequency and angle interval, and it is therefore unnecessary to transfer the entire hyperimage. This decreases the transfers and storage space required. The size of the intervals can even be varied according to the memory space available on the graphic display terminal and the acceptable transfer rate. In addition, during display of an interval, other intervals can be stored. When the next interval is selected, either a stored interval is displayed or recalculation of the entire hyperimage is initiated.
Remark: This method removes the constraint on the number of frequencies and angles, since only limited intervals are stored on the graphic display terminal. l Once the user has viewed a number of (2; y) images, he may wish to select privileged areas on
500
the screen (body of the object, nose, tail, wings, etc.) by a set of windows. In this case, instead of transferring and calculating the complete hyperimage, the computer calculates and sends only the points in these windows (for a frequency and angle interval), which decreases the computation times and provides additional gains for transfer time and storage space. IV.6.2 - Consequences We can attempt to calculate the potential gains. Let us consider the case of 1002 (z, y) points and 1O242 frequency and angle points, i.e. 80 gigabytes to be transferred and stored in the case of the current programme. If the values to be displayed are in bytes (instead of on 64 bits), only 10 gigabytes are required for the complete hype&age. Now let us apply the reductions mentioned above. Let us set the frequency/angle intervals to 10 x 10 squares. This amounts to 100’ (5, y) points times 100, i.e. 1 megabyte (and therefore a transfer time of 0.5 seconds for a data rate of 2 Mbytes/s). It is recalled that a user materially requires nearly an hour to observe this quantity of data image by image. For fast animation, several consecutive intervals can be stored on current graphic display terminals during display of an interval. Now let us assume that our user selects a collection of small windows covering the totality of the illuminated object. It can be estimated that this will divide the number of points to be calculated, transferred and stored at least by 2, i.e. 0.5 megabytes. By applying these choices, it can be seen that regardless of the size of the radar hologram (number of observation frequencies and angles), for an image with 1002 (z, y) points, it is possible to transfer and store only small packets less than or equal to 0.5 megabytes on the graphic display terminal without losing any of the richness of the information calculated or the sophisticated processing. It can also be concluded that it is possible to increase the (2, y) size from both the standpoint of parallel calculation (points calculated and sent one by one) and from the standpoint of the display. A major consequence is that very large cases can already be processed, at least from the standpoint of display. However, for the computation time alone, it is recalled that 6 seconds are required by 128 processors to calculate a hypetimage with 1500 points x 128 frequencies x 64 angles. The computation time increases almost linearly with the number of points, angles and frequencies. Therefore, for the case of 10 gigabytes, the calculation time is multiplied by a factor of more than 700, amounting to more than one hour of calculation! If the complete image is explored in animation mode (10 images per second), more than 27 hours will be required to read it! It is clear that other
M. Roisin, A. Cosnuau
result reading simplifications or techniques need to be found, such as display distributed on different sites. V - CONCLUSION Because the code is independent of the target architecture, it could be ported to other types of parallel computers. In addition, since the increase in the number of processors does not cause any loss of efficiency, execution could be accelerated even more by constructing a parallel machine dedicated to this type of application. With this in mind, logarithmic, exponential, sine, cosine, square root and division functions that are much faster than those currently available should be associated with the performing FFTs. Care should also be taken not to jeopardise performance by slow external transfers and inefficient graphic display processing. A graphic processor integrated in the computation processor or a parallel graphic machine could be considered for post-processing. The performance achieved by the HYPIM code on the iPSC shows that radar imaging by wavelet analysis is now possible, operational and performing and therefore very competitive. In particular, it can be noted that for a hologram of 128 frequencies and 64 angles, 128 points are calculated simultaneously by 128 processors in 0.5 seconds. Forthcoming versions will include optimised trigonometric functions and will be ported to the Intel Paragon computer. The rapid construction of a hyperimage in a very short time by parallelising the algorithm, naturally adapted to the new architectures, on a multiprocessor machine, makes it possible to develop an interactive interface. This means that the radar imaging method by wavelet analysis developed by J. Bertrand, P. Bertrand and J. P. Ovarlez can be used very efficiently by coupling it with parallel computation. It becomes a tool allowing target design assisted by radar imaging. REFERENCES [l] Mensa D. L. - High Resolution Radar Imaging, Artech House, 1981. [2] Pouit C. - Caracterisation des Cchos radar sur les aeronefs, La Recherche Ae’rospatiale, mars-avril 1980, 131-139. [3] Altes R. A. - Proportional Bandwidth, Wideband Wigner-Ville Analysis, SPIE, Advanced Algorithms and Architectures for Signal Processing IV, 1989, 1152, 265-276. [4] Altes R. A. - The Fourier-Mellin Transform and Mammalian Hearing, J. Acoust. Sot. Am., 63 (l), Jan. 1978, 174-181. [5] Tuteur F. B. - Wavelet Transformations in Signal Detection, Proc. IEEE, 1988, 14351438. Aerospace
Science and Technology
Role of Parallelism
in Wavelet Analysis of Radar Imaging/ de l’imagevie radar par ondelettes
R81e du paralle’lisme dans l’exploitation
[6] Heil C. E., Walnut D. F. - Continuous and Discrete Wavelet Transforms, SIAM J. Math., Dec. 1989, 31, 628-666.
[7] Rioul O., Veterli M. - Wavelets and Signal Processing, IEEE SP Magazine, October 1991, 14-37. [8] Meyer Y., Jaffard S., Rioul 0. - L’analyse par ondelettes, Pour la Science, Sept. 1987, 119, 28-37. [9] Antoine J.-P., Carette P., Murenzi R., Piette B. - Image Analysis with Two-Dimensional Continuous Wavelet Transform Signal Processing, 1983, 31, 241-272. [lo] Bertrand J., Bertrand P, Ovarlez J. P. - Dimensionalized Wavelet Transform with Application to Radar Imaging, Pvoc. IEEE-ICASSP, 1990. [ 1 l] Bertrand J., Bertrand P., Ovarlez J. P. - Discrete Mellin Transform for Signal Analysis, Proc. IEEE-ICASSP, 1990, 1603-1606. 1121Ovarlez J.-P. - La Transformation de Mellin ; Un outil pour l’analyse de signaux B large bande, These de Doctouat, Universite de Paris-VI, avril 1992. [13] Bertrand J., Bertrand P., Ovarlez J. P. - The Wavelet Approach in Radar Imaging and its Physical Interpretation, Proceedings of the International Conference
1997,
no 7
“Wavelets and Applications”,
501
Toulouse, juin
1992,
653-658. u41 Bertrand J., Bertrand P., Ovarlez J. P. - Frequency
Directivity Scanning in Laboratory Radar Imaging, International Journal of Imaging Systemsand Technology, 1994, 5, 39-51. u51 Adam J. A. - How to Design an Invisible Aircraft, IEEE Spectrum, April 1988. U61 Fowler M. L., Sibul L. H. - Signal Detection Using Group Transforms, CH2977-7/9 l/0000- 1693, IEEE, 1991. u71 Saad Y., Schultz M. H. - Topological Properties
of Hypercubes, Technical Report YALEU/DCS/RR-389, Yale University, 1985.
Acknowledgements. - The authors wish to thank Mr G. Bobillot of the Systems Department and Mr P. Leca of the Information Systems Department for their interest in this work and the cooperation they made possible between their departments.