Mathematics and Computers in Simulation 42 (1996) 97-105
Parallel architecture of adaptive MTD processors
Christo Kabakchiev, Vera Behar*
CICT Bulgarian Academy of Sciences, Acad. G. Bontchev Str. 25-A, Sofia, Bulgaria
Abstract
A parallel systolic structure of the adaptive moving target detector (AMTD) is described. The adaptive signal processing is realized in the whole area of observation by a set of systolic processors operating in parallel. The cost of systolic AMTD implementation (necessary number of processing elements and computational steps) is also evaluated.
1. Introduction
In 1978 Kung and Leiserson [1] introduced the term "systolic array". They proposed systolic arrays for applications with two important sets of characteristics. First, these applications require high throughput and large processing bandwidth. Second, they are supported by algorithms that can be implemented on arrays consisting of a few types of simple processing elements. Such algorithms are characterized by repeated computation of a few types of relatively simple operations that are common to many input data items. For this reason digital signal processing algorithms are very convenient for implementation on systolic arrays. In the present paper we present and evaluate one optimal variant of a systolic architecture for digital signal processors intended for moving target detection (AMTD processors).

The Moving Target Detector (MTD) is a pulsed Doppler radar that uses the Doppler effect to separate signals from targets with different radial velocities. We consider the latest adaptive version of the MTD processor (the AMTD processor), developed at the Lincoln Laboratory in 1985. This version is described in [3], where close attention is paid to the adaptive filter design and to the choice of a sufficient number of bits to represent the filter coefficients. In the present paper we address another very difficult problem of AMTD implementation: the choice of an optimal computer architecture.

The AMTD processor involves a bank of Doppler filters with adaptive weights. The general block diagram of this processor is shown in Fig. 1. The filters of the bank are uniformly spaced over a frequency region equal to the pulse repetition frequency, and each is tuned to a different Doppler frequency. To reduce the processing time, all filter weights can be computed for several types of clutter environment and stored in computer memory in advance. The selection of filter weights is controlled by a "range-azimuth" clutter map. The target-present or target-absent decision is made with an adaptive signal threshold.

Fig. 1. Block diagram of AMTD processor.

Radar signal parameter estimation can be formulated as a problem in statistical decision theory. The class of possible radar signals (echoes) can be represented as points $S$ in signal space. Each point in this space represents a waveform with a particular combination of radar signal parameters (Doppler shift, delay time, amplitude) corresponding to a particular combination of target parameters (distance, velocity, azimuth). In a similar way a noise space can be defined whose points $n$ describe all possible waveform realizations of the noise process within the observation interval. Next, an observation space is defined whose points $X$ represent the joint combination of signal $S$ and noise $n$. Subsets of points from the observation space are mapped by a decision rule into signal points in decision space. The structure of the signal processing applied to the waveform $X_{df}$ is shown in Fig. 2. The decision $D_{dfn}$ indicates the presence or absence of the desired signal $S_{dfn}$ in the observed waveform $X_{df}$ after it has passed through the filter tuned to frequency $f_n$. All points in the decision space, i.e. $D_{dfn}$, have only two possible values (1: signal present, 0: signal absent), in agreement with the decision rule of Section 2.2.

* Supported in part by Grant TN No. 245/92 from the Bulgarian National Foundation for Scientific Investigations and developed at the Laboratory "Signal", Bulgarian Academy of Sciences.
* Corresponding author. Address: Inst. of Information Technologies, Acad. G. Bonchev Str., bl.2, Sofia, Bulgaria.
0378-4754/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved
PII S0378-4754(96)00058-4
2. Systolic structure of AMTD processors

The most effective approach for speeding up moving target detection in a large radar observation space is to design a signal processor with maximum parallelism. According to this approach the whole radar observation space is divided into $D$ resolution cells in range, $F$ resolution cells in azimuth and $N_f$ resolution cells in radial velocity. As a result the whole radar observation space can be represented as a set of $(D \times F \times N_f)$ "range-azimuth-Doppler velocity" resolution cells, where the signal processing in a single resolution cell $A_{dfn}$ ($d = 1, \dots, D$; $f = 1, \dots, F$; $n = 1, \dots, N_f$) is realized by a single processor $SP_{dfn}$. Therefore the whole radar observation space can be processed by a set of identical processors operating in parallel. The number of such processors $N_{sp}$ is given by the following expression:

$$N_{sp} = D \times F \times N_f. \tag{1}$$

Fig. 2. Signal processing in $A_{df}$ resolution cell.
The processor $SP_{dfn}$ consists of two subprocessors connected in series, $SP^{D}_{dfn}$ and $SP^{CFAR}_{dfn}$: the first performs signal filtering; the second performs signal detection and parameter estimation (see Fig. 3). In this case the processing time in the whole radar observation space $T^{obs}$ is computed by

$$T^{obs} = T^{D}_{dfn} + T^{CFAR}_{dfn}, \tag{2}$$

where $T^{D}_{dfn}$ is the number of computational steps in the subprocessor $SP^{D}_{dfn}$ and $T^{CFAR}_{dfn}$ is the number of computational steps in the subprocessor $SP^{CFAR}_{dfn}$. The running time of the processor $SP_{dfn}$ can therefore be optimized by designing an optimal parallel architecture for both subprocessors.

Fig. 3. Structure of the processor $SP_{dfn}$.

2.1. Systolic structure of the processor $SP^{D}_{dfn}$

According to [2,3] a sampled complex amplitude of the IF (intermediate frequency) signal observed at the input of the filter bank can be described by the vector $X_{df}^{T} = (x_{df1}, \dots, x_{dfi}, \dots, x_{dfN_p})$, where the component $x_{dfi}$ can be represented as $x_{dfi} = x_{df}(iT) = a_{i1} + j a_{i2}$ ($T$: interpulse period; $N_p$: number of coherent pulses; $a_{i1}$, $a_{i2}$: real and imaginary components of the complex $x_{dfi}$).
The signal power at the output of the processor $SP^{D}_{dfn}$ is given by $Z_{dfn} = Y_{dfn} Y^{*}_{dfn}$, where $Y_{dfn}$ is the output signal of the $n$th Doppler filter. It is given by $Y_{dfn} = X_{df} W_n^{T}$, where the filter coefficients (weights) $W_n$ of the $n$th Doppler filter can be written as $W_n = (w_{n1}, \dots, w_{ni}, \dots, w_{nN_p})$, $w_{ni} = w_n(iT) = w_{ni1} + j w_{ni2}$.

According to [3] all filter coefficients can be computed in advance for several (for example three) types of clutter environment: no clutter, weak clutter, or strong clutter. It is assumed that three variants of the vector $W_n$ are computed in advance and stored in computer memory. As a result, the equation for $Y_{dfn}$ gives
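$$Y_{dfn} = \left( \sum_{i=1}^{N_p} a_{i1} w_{i1} - \sum_{i=1}^{N_p} a_{i2} w_{i2} \right) + j \left( \sum_{i=1}^{N_p} a_{i1} w_{i2} + \sum_{i=1}^{N_p} a_{i2} w_{i1} \right),$$

where, for the fixed filter index $n$, $w_{i1}$ and $w_{i2}$ denote the real and imaginary parts of the weight $w_{ni}$. Denoting

$$R_{11} = \sum_{i=1}^{N_p} a_{i1} w_{i1}, \quad R_{22} = \sum_{i=1}^{N_p} a_{i2} w_{i2}, \quad R_{12} = \sum_{i=1}^{N_p} a_{i1} w_{i2}, \quad R_{21} = \sum_{i=1}^{N_p} a_{i2} w_{i1},$$

it can be rewritten as $Y_{dfn} = (R_{11} - R_{22}) + j(R_{12} + R_{21})$. Substituting this expression into the equation for $Z_{dfn}$ we obtain

$$Z_{dfn} = (R_{11} - R_{22})^2 + (R_{12} + R_{21})^2.$$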
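The decomposition above can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function name `filter_power` and the sample values are hypothetical, and the four sums are evaluated sequentially rather than on a systolic array.

```python
def filter_power(a1, a2, w1, w2):
    """Return Z = (R11 - R22)^2 + (R12 + R21)^2 using real arithmetic only.

    a1, a2: real and imaginary parts of the input samples.
    w1, w2: real and imaginary parts of the filter weights.
    """
    r11 = sum(a * w for a, w in zip(a1, w1))
    r22 = sum(a * w for a, w in zip(a2, w2))
    r12 = sum(a * w for a, w in zip(a1, w2))
    r21 = sum(a * w for a, w in zip(a2, w1))
    return (r11 - r22) ** 2 + (r12 + r21) ** 2


# Hypothetical two-pulse example (N_p = 2).
a1, a2 = [1, 2], [3, 4]
w1, w2 = [5, 6], [7, 8]

z_real = filter_power(a1, a2, w1, w2)

# Direct complex evaluation Y = sum(x_i * w_i), Z = |Y|^2, for comparison.
y = sum(complex(p, q) * complex(r, s) for p, q, r, s in zip(a1, a2, w1, w2))
z_direct = y.real ** 2 + y.imag ** 2

print(z_real, z_direct)
```

Both values agree, which illustrates that the four real accumulations $R_{11}$, $R_{22}$, $R_{12}$ and $R_{21}$ are sufficient to obtain the output power without complex arithmetic.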
The systolic linear array of the processor $SP^{D}_{dfn}$ intended for signal filtering in a single "range-azimuth-Doppler velocity" resolution cell is shown in Fig. 4. It involves four types of processing elements with very simple logic, also presented in Fig. 4. One of them is the well-known processing element with internal accumulation. Note, however, that this structure requires every input data item to be supplied twice. The cost of this systolic implementation is given by the following expressions. The number of processing elements is

$$N^{D}_{dfn} = 7. \tag{3}$$

The number of computational steps is

$$T^{D}_{dfn} = N_p + 3. \tag{4}$$
Another version of the systolic array of the processor $SP^{D}_{dfn}$ is shown in Fig. 5. As shown in Fig. 5, this structure is a combination of rectangular and linear arrays and involves five types of processing elements with simple logic. The cost of this systolic implementation is given by the following expressions. The number of processing elements is

$$N^{D}_{dfn} = 4 \times N_p + 5. \tag{5}$$

The number of computational steps is

$$T^{D}_{dfn} = N_p + 4. \tag{6}$$
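To compare the two filter structures for a given number of coherent pulses, Eqs. (3)-(6) can be tabulated directly. The sketch below only evaluates those formulas; the function names and the value $N_p = 8$ are illustrative assumptions, and the need to duplicate the input data in the linear array is not reflected in the counts.

```python
def linear_array_cost(n_p):
    """Cost of the linear array of Fig. 4, from Eqs. (3) and (4)."""
    return {"processing_elements": 7, "steps": n_p + 3}


def rectangular_array_cost(n_p):
    """Cost of the combined rectangular/linear array of Fig. 5, from Eqs. (5) and (6)."""
    return {"processing_elements": 4 * n_p + 5, "steps": n_p + 4}


n_p = 8  # hypothetical number of coherent pulses
print("linear:     ", linear_array_cost(n_p))
print("rectangular:", rectangular_array_cost(n_p))
```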
Fig. 4. Systolic array of the processor $SP^{D}_{dfn}$.
2.2. Systolic structure of the processor $SP^{CFAR}_{dfn}$
It is well known that Constant False Alarm Rate (CFAR) processors are used for detecting target echoes against a clutter background of unknown intensity. According to the basic concept, automatic target detection is commonly implemented by comparing the voltage in each resolution cell to an adaptive threshold determined from the noise power estimate over adjacent range and/or Doppler resolution cells [2,4]. The decision rule for testing the two hypotheses $H_1$ ($D_{dfn} = 1$: target present) and $H_0$ ($D_{dfn} = 0$: target absent) in the $A_{dfn}$th resolution cell is given by

$$D_{dfn} = \begin{cases} 1 & \text{if } Z_{dfn} \ge T \times P, \\ 0 & \text{if } Z_{dfn} < T \times P, \end{cases}$$

where $Z_{dfn}$ is the voltage in the cell under test, $P$ is the noise power estimate, and $T$ is the detection scale factor obtained from the following equation:

$$P_{fa} = \int_{0}^{\infty} \int_{PT}^{\infty} f(Z_{dfn} : H_0)\, f(P)\, dZ_{dfn}\, dP,$$

where $f(Z_{dfn} : H_0)$ is the noise pdf in the cell under test, $f(P)$ is the pdf of the noise power estimate, and $P_{fa}$ is the false alarm probability to be maintained.

Fig. 5. Systolic array of the processor $SP^{D}_{dfn}$.

The OS CFAR processor proposed by Rohling [4] estimates the noise power simply by selecting the $k$th ranked cell (in order of increasing magnitude) in the reference window of size $N$. The OS CFAR processor suffers only minor degradation in detection probability (in an exponential homogeneous noise background) and resolves closely spaced targets effectively for $k$ tending to the maximum. The general block diagram of this processor is shown in Fig. 6. The structure of the OS CFAR processor involves a sort procedure. As a result of the sort procedure all voltages in the reference window are ranked in order of increasing magnitude:

$$Y_1 \le Y_2 \le \dots \le Y_k \le \dots \le Y_N,$$

where $N$ is the size of the reference window. In this case the estimate $P$ is formed as $P = Y_k$.

Fig. 6. Block diagram of OS CFAR processor.

The main difficulty in implementing OS CFAR is the time-consuming sorting procedure. There is a large set of relatively fast sorting algorithms, namely HeapSort, QuickSort, Odd-Even Transposition Sort, Bucket Sort, Counting Sort, Odd-Even Merging, etc. [5]. Most of these algorithms can be realized in parallel on a systolic architecture. However, we think a more effective solution is to develop special adaptations of sorting algorithms that use all prior information about the input sequence to be sorted and about the OS CFAR parameters. Such information can include the size of the reference window ($N$), the order of the element to be selected ($k$), and the false alarm probability to be maintained ($P_{fa}$). Rohling showed that practical values of $k$ and $N$ are usually chosen from the following condition:

$$k \ge \tfrac{3}{4} N, \quad \text{where } N \ge 24. \tag{7}$$
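The OS CFAR decision described in this section can be summarized in a few lines. This is a minimal sequential sketch under stated assumptions: the function names are hypothetical, the scale factor `t` is taken as a given parameter rather than solved from the $P_{fa}$ integral, and the library `sorted` call stands in for the sorting network discussed in the text.

```python
def os_cfar_decision(z, reference, k, t):
    """Return D = 1 (target present) if z >= t * P, else 0.

    z:         voltage in the cell under test
    reference: voltages of the N reference cells
    k:         rank (1-based, increasing order) of the selected cell
    t:         detection scale factor, assumed to be known in advance
    """
    n = len(reference)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= N")
    ranked = sorted(reference)   # Y_1 <= Y_2 <= ... <= Y_N
    p = ranked[k - 1]            # noise power estimate P = Y_k
    return 1 if z >= t * p else 0


def satisfies_rohling_condition(n, k):
    """Check the condition k >= 3N/4 with N >= 24 quoted in the text."""
    return n >= 24 and 4 * k >= 3 * n


reference = list(range(1, 25))   # hypothetical window, N = 24
print(os_cfar_decision(40, reference, k=18, t=2.0))
print(os_cfar_decision(30, reference, k=18, t=2.0))
print(satisfies_rohling_condition(24, 18))
```

In this example the estimate is $P = 18$, so the threshold is $T \times P = 36$: the first call declares a target and the second does not.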
Using this condition we proposed and described in [6] the following new practical sorting algorithm. It has the following computational steps:
(1) Split the input vector $Y$ into $M$ subvectors of size $L$ meeting the requirements

$$L = N/M \quad \text{and} \quad L \ge N - k + 1. \tag{8}$$

(2) Sort all subvectors in parallel in order of decreasing magnitude.
(3) Form new subvectors $Z$ from the largest $(N - k + 1)$ elements of each sorted subvector of $Y$.
(4) Merge the subvectors $Z$ into a vector $P$.
(5) Sort the vector $P$ in order of decreasing magnitude.
(6) Evaluate the estimate $P$.

We realized this algorithm as a sorting network based on the Odd-Even Transposition Sort method, which is convenient for systolic implementation. The general block diagram of the OS CFAR processor based on this M-splitting procedure is shown in Fig. 7. At the first stage of sorting all voltages in the reference window are separated into $M$ vectors $Y_1, \dots, Y_M$. The number of elements in each vector must be at least $(N - k + 1)$ (Eq. (8)). After that the vectors $Y_1, \dots, Y_M$ are sorted in decreasing order by the processors $SP^{CFAR}_1, \dots, SP^{CFAR}_M$ operating in parallel. The first $(N - k + 1)$ elements of the sorted vectors form the vectors $Z_1, Z_2, \dots, Z_M$. At the second stage of sorting the vectors $Z_1, Z_2, \dots, Z_M$ are merged into a new vector, which is sorted in decreasing order by the processor $SP^{CFAR}_{M+1}$. The result of sorting is the vector $P$, i.e.

$$P_1 \ge P_2 \ge \dots \ge P_{N-k+1} \ge \dots \ge P_{M(N-k+1)}.$$

The power estimate $P$ is taken to be the value $P_{N-k+1}$.

Fig. 7. Structure of the processor $SP^{CFAR}_{dfn}$.

The structure of the processors $SP^{CFAR}_1, \dots, SP^{CFAR}_{M+1}$ is a systolic network based on the Odd-Even Transposition Sort method. This systolic array is shown in Fig. 8. The logic of processing elements PE1 and PE2 is shown in Fig. 9 and described by the following expressions. For PE1 (compare-exchange):

$$x'_{out} = \begin{cases} x'_{in} & \text{if } x'_{in} \ge x''_{in}, \\ x''_{in} & \text{otherwise,} \end{cases} \qquad x''_{out} = \begin{cases} x''_{in} & \text{if } x'_{in} \ge x''_{in}, \\ x'_{in} & \text{otherwise.} \end{cases}$$

For PE2 (threshold comparison):

$$z_{out} = \begin{cases} 1 & \text{if } z'_{in} \ge z''_{in}\, z'''_{in}, \\ 0 & \text{otherwise.} \end{cases}$$
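The six steps of the M-splitting procedure can be sketched in software as follows. This is an illustrative sequential model, not the systolic network itself: the function names are hypothetical, the subvectors are sorted one after another instead of in parallel, and each compare-exchange in the inner loop plays the role of one PE1 operation.

```python
def odd_even_transposition_sort_desc(values):
    """Sort into decreasing order using alternating even/odd compare-exchange phases."""
    x = list(values)
    n = len(x)
    for phase in range(n):                 # n phases are enough for n elements
        for i in range(phase % 2, n - 1, 2):
            if x[i] < x[i + 1]:            # compare-exchange of neighbours
                x[i], x[i + 1] = x[i + 1], x[i]
    return x


def os_estimate_by_splitting(y, k, m):
    """Return the k-th smallest element of y via the M-splitting procedure."""
    n = len(y)
    keep = n - k + 1
    if n % m != 0:
        raise ValueError("M must divide N so that L = N / M is an integer")
    size = n // m
    if size < keep:
        raise ValueError("Eq. (8) requires L >= N - k + 1")
    # Step 1: split Y into M subvectors of size L.
    subvectors = [y[i * size:(i + 1) * size] for i in range(m)]
    # Step 2: sort every subvector in decreasing order (parallel in hardware).
    sorted_subs = [odd_even_transposition_sort_desc(s) for s in subvectors]
    # Step 3: keep the largest N - k + 1 elements of each subvector.
    z = [s[:keep] for s in sorted_subs]
    # Step 4: merge the truncated subvectors.
    merged = [v for sub in z for v in sub]
    # Step 5: sort the merged vector in decreasing order.
    p = odd_even_transposition_sort_desc(merged)
    # Step 6: the estimate is the (N - k + 1)-th largest element.
    return p[keep - 1]


# Hypothetical reference window: a permutation of 1..24 (N = 24, k = 22, M = 4).
y = [(7 * i) % 24 + 1 for i in range(24)]
print(os_estimate_by_splitting(y, k=22, m=4), sorted(y)[22 - 1])
```

The truncation in step 3 is safe because any element among the $(N - k + 1)$ largest of the whole window is also among the $(N - k + 1)$ largest of its own subvector, so the two printed values coincide.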
The number of processing elements PE1 necessary only for sorting can be evaluated by

$$N^{(M)}_{PE1} = \frac{N(N/M - 1)}{2} + \frac{M(N - k + 1)[M(N - k + 1) - 1]}{2}. \tag{9}$$

The number of computational steps necessary only for sorting is evaluated by

$$N^{(M)}_{T} = N/M + M(N - k + 1). \tag{10}$$
The cost of systolic implementation of the sort network, evaluated for several values of $M$ and two reference window sizes, is shown in Table 1. Analysis of the results shows that an optimum splitting of the reference window into subwindows can significantly reduce (by up to 60%) the number of processing elements and minimize the running time of the systolic array of the OS CFAR processor (for example, $M = 4$ for the two given samples). By varying the basic OS CFAR parameters ($N$, $M$ and $k$) we can design the optimum systolic architecture of the OS CFAR processor $SP^{CFAR}_{dfn}$.
Fig. 8. Systolic structure of the processors $SP^{CFAR}_1, \dots, SP^{CFAR}_{M+1}$.

Fig. 9. Structure of processing elements PE1 and PE2.
Table 1. Cost of systolic implementation of a sort network

                                   Sample 1: N=24, k=22     Sample 2: N=32, k=30
                                   M=1    M=2    M=4        M=1    M=2    M=4    M=8
Number of PE1, $N^{(M)}_{PE1}$     276    147    126        496    255    178    324
Number of steps, $N^{(M)}_{T}$     24     18     18         32     22     20     28
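The entries of Table 1 can be reproduced from Eqs. (9) and (10). The sketch below uses a hypothetical function name and makes one assumption that is inferred from the table rather than stated in the text: for $M = 1$ there is no second sorting stage, so only the first term of each formula is counted.

```python
def sort_network_cost(n, k, m):
    """Return (number of PE1, number of steps) for the M-splitting sort network."""
    if n % m != 0:
        raise ValueError("M must divide N")
    keep = n - k + 1
    if m == 1:
        # Assumption inferred from Table 1: a single full sort, no second stage.
        return n * (n - 1) // 2, n
    first_stage = n * (n // m - 1) // 2
    second_stage = m * keep * (m * keep - 1) // 2
    steps = n // m + m * keep
    return first_stage + second_stage, steps


for n, k, splits in [(24, 22, (1, 2, 4)), (32, 30, (1, 2, 4, 8))]:
    for m in splits:
        pe1, steps = sort_network_cost(n, k, m)
        print(f"N={n} k={k} M={m}: PE1={pe1} steps={steps}")
```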
3. Conclusions

In conclusion, the cost effectiveness of the systolic array of the AMTD processor can be evaluated both for a single "range-azimuth-velocity" resolution cell and for the whole radar observation space. The number of processing elements in the systolic array of the AMTD processor necessary for signal processing:
- in the $A_{dfn}$ "range-azimuth-Doppler velocity" resolution cell is computed by

$$N_{dfn} = \begin{cases} 4N_p + 6 + N^{(M)}_{PE1}, \\ 8 + N^{(M)}_{PE1}; \end{cases} \tag{11}$$

- in the whole radar observation space is computed by

$$N_{obs} = N_{sp} N_{dfn}. \tag{12}$$

The number of processors $N_{sp}$ is found from Eq. (1). Finally, the number of computational steps in the AMTD processor necessary for signal processing in the whole radar observation space $T^{obs}$ can be computed as

$$T^{obs} = T_{dfn} = \begin{cases} N_p + 5 + N^{(M)}_{T}, \\ N_p + 4 + N^{(M)}_{T}. \end{cases} \tag{13}$$
The two variants of $N_{dfn}$ and $T^{obs}$ correspond to the two systolic arrays of the processor $SP^{D}_{dfn}$ according to Eqs. (3)-(6). We think that by varying all basic AMTD parameters (number of coherent pulses $N_p$, size of the reference window $N$, number of subwindows $M$, order of the selected magnitude in the reference window $k$, etc.) we can find a compromise between the cost effectiveness of the systolic implementation of the AMTD processor and the necessary quality of moving target detection. Finally, it must be noted that the systolic architectures of the AMTD processor obtained in this way are very convenient for VLSI technology.
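As a worked example of the overall cost, the sketch below combines Eq. (1) with Eqs. (11)-(13). It reads Eq. (13) as the filter steps of Eq. (4) or (6), plus the sorting steps $N^{(M)}_T$, plus one comparison step; that reading, the function name, and all numerical values ($N_p = 8$, $D = 100$, $F = 64$, $N_f = 8$, and the Table 1 entries for $N = 24$, $k = 22$, $M = 4$) are assumptions made for illustration.

```python
def amtd_cost(n_p, n_pe1, n_t, d, f, n_f, variant):
    """Return per-cell and total processing elements and the number of steps.

    n_p:     number of coherent pulses
    n_pe1:   number of PE1 in the sort network, Eq. (9)
    n_t:     number of sorting steps, Eq. (10)
    d, f, n_f: numbers of range, azimuth and velocity resolution cells
    variant: "linear" (Fig. 4 filter) or "rectangular" (Fig. 5 filter)
    """
    if variant == "rectangular":
        pe_per_cell = 4 * n_p + 6 + n_pe1   # first case of Eq. (11)
        steps = n_p + 5 + n_t               # first case of Eq. (13), as read here
    elif variant == "linear":
        pe_per_cell = 8 + n_pe1             # second case of Eq. (11)
        steps = n_p + 4 + n_t               # second case of Eq. (13), as read here
    else:
        raise ValueError("variant must be 'linear' or 'rectangular'")
    n_sp = d * f * n_f                      # Eq. (1)
    return {"pe_per_cell": pe_per_cell,
            "pe_total": n_sp * pe_per_cell,  # Eq. (12)
            "steps": steps}


for variant in ("linear", "rectangular"):
    print(variant, amtd_cost(n_p=8, n_pe1=126, n_t=18,
                             d=100, f=64, n_f=8, variant=variant))
```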
References
[1] H. Kung and C. Leiserson, Sparse Matrix Proc. (Academic Press, Orlando, FL, 1978) 256-282.
[2] D. Barton, Modern Radar System Analysis (Artech House, 1988) 255-260.
[3] E. D'Addio and G. Galati, IEE Proc. 132 (1) (1985) 58-65.
[4] H. Rohling, IEEE Trans. AES-19 (4) (1983) 608-621.
[5] G. Selim, Parallel Sorting Algorithms (Academic Press, Canada, 1985) 41-47.
[6] V. Behar and Chr. Kabakchiev, Proc. ECCTD'93 (Davos, Switzerland, 1993) 981-984.