VLSI architectures for CFAR detectors based on order statistics


Signal Processing 62 (1997) 73-86
Elsevier

Dong-Seog Han*
School of Electronic and Electrical Engineering, Kyungpook National University, 1370 Sankyuk-Dong, Puk-Gu, Taegu 702-701, South Korea

Received 13 May 1996; revised 27 January 1997 and 12 May 1997

Abstract

In this paper, we propose new VLSI architectures for the order statistics (OS) constant false alarm rate (CFAR) detector and for modified OS-CFAR detectors such as the order statistics greatest of (OSGO) and the order statistics smallest of (OSSO) CFAR detectors. By transforming the OS-CFAR detection algorithm into a recursive algorithm, we derive efficient systolic array architectures that are well suited to VLSI implementation. All the proposed architectures have several simple processing elements (PEs), a few communication links between adjacent PEs, and a high throughput rate suitable for real-time processing. © 1997 Elsevier Science B.V.

Keywords: Radar; Constant false alarm rate; Order statistics; Systolic array

* Tel.: +82 53 950 6609; fax: +82 53 950 5505; e-mail: [email protected].

0165-1684/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved.
PII S0165-1684(97)00116-3


1. Introduction

Constant false alarm rate (CFAR) detection of a target is performed by a digital signal processing algorithm that provides detection thresholds in automatic detection systems. CFAR detectors are most commonly employed in modern automatic detection, tracking and pulse Doppler radar systems [10]. In radar systems, a CFAR detector is used to decide whether a target is present in a radar resolution cell. The purpose of CFAR design is to maximize the detection probability while maintaining a desired false alarm rate. A CFAR detector should provide detection thresholds that are relatively immune to variations of the background noise and clutter, so that the false alarm rate stays constant [9,10].

The most conventional CFAR schemes are mean-level CFAR detectors such as the cell averaging (CA), the greatest of (GO), and the smallest of (SO) CFAR detectors [3,13,14]. The CA-CFAR detector has maximum detectability if the reference cells are independent and identically distributed (i.i.d.) with an exponential distribution. Its detection performance, however, degrades considerably in regions of clutter edges and in environments with closely spaced multiple targets [2,12]. Rohling [12] proposed the order statistics (OS) CFAR detector, which uses an appropriately ranked reference cell to estimate the background clutter power level. The OS-CFAR detector has only a small additional detection loss relative to the CA-CFAR detector in uniform noise backgrounds, and it can resolve closely spaced targets. However, it requires a longer processing time than the CA-CFAR detector. Elias-Fuste et al. [1] proposed two modified OS-CFAR detectors that require less processing time than the OS-CFAR detector: the order statistics greatest of (OSGO) CFAR detector and the order statistics smallest of (OSSO) CFAR detector. The OSGO-CFAR detector retains the advantages of the OS-CFAR detector and shows some superiority over it in clutter power transition regions. The OSSO-CFAR detector, however, performs slightly worse than the OS-CFAR detector in every radar environment.

Whether an algorithm can be processed in real time is the most important issue in radar systems. An algorithm that cannot be processed in real time is useless, regardless of its performance, in systems such as radars where real-time processing is essential. Since the data rate of most search radars is at least 20 MHz [4], CFAR detectors based on order statistics, such as the OS- and the OSGO-CFAR detectors, can hardly be implemented with a conventional digital signal processor (DSP). CFAR processing is a regular, compute-bound computation: the same operations are repeated on a large set of data. Real-time implementation of CFAR detectors based on order statistics is therefore possible only with highly parallel processing that exploits VLSI technology. Multiple-DSP architectures are also possible, but from a cost point of view they are less suitable for CFAR detectors than application-specific VLSIs. Systolic arrays are a VLSI architecture based on these considerations. A systolic system is a network of processors that rhythmically compute and pass data through the system. Every processor regularly pumps data in and out, each time performing some short computation, so that a regular flow of data is kept up in the network. The criteria for the design of systolic arrays are as follows: (1) the design makes multiple use of each input data item; (2) the design uses extensive concurrency; (3) there are only a few types of simple cells; (4) data and control flows are simple and regular. Systolic designs based on these criteria are simple (a consequence of properties 3 and 4), modular and expandable (property 4), and yield high performance (properties 1, 2 and 4). They therefore meet the architectural challenges of special-purpose systems [6]. Ritcey and Hwang [11] proposed systolic architectures for the OS-CFAR and the OSGO-CFAR detectors [4,5]. However, the systolic arrays proposed by Ritcey and Hwang have a pipelining period of 2.
The maximum possible throughput rate is therefore reduced to half the system clock rate. Since a reduced throughput rate may impair real-time processing, systolic architectures for the various CFAR detectors with a higher throughput rate than Ritcey and Hwang's architecture are needed. When a large number of processors work together, communication becomes significant. In VLSI technology, routing costs dominate the power, time and area required to implement a computation. Local communication in systolic arrays is therefore advantageous [7].

In this paper, we propose new VLSI architectures for the OS-, the OSGO- and the OSSO-CFAR detectors that have a higher throughput rate, fewer PEs and fewer interconnections between adjacent PEs than Ritcey and Hwang's architectures. The paper is organized as follows. In Section 2, we describe the operation of the OS-CFAR detector and briefly review Ritcey and Hwang's systolic architecture as a preliminary study. In Section 3.1, we propose an efficient VLSI architecture for the OS-CFAR detector according to these design strategies. Performance and hardware complexity comparisons between the proposed architecture and Ritcey and Hwang's architecture are given in Section 3.2. Section 4 proposes systolic architectures for the OSGO- and the OSSO-CFAR detectors, obtained by slightly modifying the proposed systolic architecture for the OS-CFAR detector. Finally, Section 5 contains the concluding remarks.

2. OS-CFAR detector and Ritcey and Hwang's architecture

The square-law detected video samples are sent serially into a shift register of length N + 1, where N = 2n, as shown in Fig. 1. In the OS-CFAR detector, the background noise power level Z_i is estimated by the kth smallest of the N reference cells, that is, Z_i = X_{(k)}. If X_i ≥ T Z_i, where X_i and T are the power of the test sample and a constant scale factor, respectively, the detector decides that a target is present. In other words, the detector decides that a target is present if X_i' = X_i/T ≥ X_{(k)}. Hence, by simply counting the number μ_i of reference cells less than or equal to X_i', one can implement the OS-CFAR detector without sorting the reference window {X_{i-n}, ..., X_{i-1}, X_{i+1}, ..., X_{i+n}}: the detector decides that a target is present if μ_i ≥ k. The block diagram of the OS-CFAR detector can therefore be slightly modified as in Fig. 2, and the processing of the OS-CFAR detector can be defined as follows.

Given a constant threshold coefficient T, a decision value k and the input sequence {..., X_{i-n}, ..., X_{i-1}, X_i, X_{i+1}, ..., X_{i+n}, ...}, compute the result sequence {..., μ_i, μ_{i+1}, ...} defined by

  μ_i = the number of cells less than or equal to X_i/T in the reference window {X_{i-n}, ..., X_{i-1}, X_{i+1}, ..., X_{i+n}},

and decide either that a target is present if μ_i ≥ k, or that no target is present.

Fig. 1. Block diagram of the OS-CFAR detector.

Fig. 2. Implementation concept for the OS-CFAR detector.

Consider a sequence of data samples and a reference window of 2n cells. As the window slides to the next position, all the data samples except the departing one appear again in the next window. Denote the present window by W_i = {X_{i-n}, ..., X_{i-1}, X_i, X_{i+1}, ..., X_{i+n}}; the window W_{i+j}, j < n, is then {X_{i-n+j}, ..., X_{i-1+j}, X_{i+j}, X_{i+1+j}, ..., X_{i+n+j}}. Note that when the comparison between X_i' = X_i/T and X_{i+j} is performed in the window W_i, the comparison between X_{i+j}' = X_{i+j}/T and X_i in the window W_{i+j} can be performed at the same time. This observation therefore exposes parallelism. Ritcey and Hwang incorporated it into their systolic design for the OS-CFAR detector, shown in Fig. 3.

Fig. 3. (a) Ritcey and Hwang's systolic architecture for the OS-CFAR detector. (b) Functional descriptions of PEs (D denotes a delay element).

Ritcey and Hwang's architecture consists of n compare-and-accumulate (CAA) PEs, one delay PE, and one multiplication-and-decision (MAD) PE, and implements the OS-CFAR detector with window size 2n + 1. A constant threshold coefficient T and a decision value k are preloaded into the MAD-PE and stay there throughout the computation. The test sample X_i enters the rightmost MAD-PE and is multiplied by 1/T; the product is assigned to X_i'. Both X_i and X_i' then enter the rightmost CAA-PE to initiate the processing of the window W_i. An accumulator μ_i (with zero initial value) is created along with X_i, and these three data propagate leftward. During the leftward propagation, X_i' is compared in turn, in the n CAA-PEs, with the n routed-back data samples that arrived immediately before X_i, i.e., {X_{i-n}, X_{i-n+1}, ..., X_{i-1}}, and μ_i accumulates the binary comparison results. Specifically:

  Initialize μ_i = 0.
  For j = n, ..., 1, step -1:
    μ_i = μ_i + 1 if X_i' ≥ X_{i-j}; μ_i otherwise.
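The equivalence used above, that X_i' ≥ X_{(k)} holds exactly when at least k reference cells are less than or equal to X_i', can be checked numerically. The Python sketch below is illustrative only: the function names are hypothetical, the window is the 2n cells around index i, and integer samples with a power-of-two T are assumed so that X_i/T and T·X_{(k)} are exact in floating point.

```python
import random
from typing import List, Sequence


def reference_window(x: Sequence[int], i: int, n: int) -> List[int]:
    """The 2n reference cells {X_{i-n},...,X_{i-1},X_{i+1},...,X_{i+n}}."""
    return list(x[i - n:i]) + list(x[i + 1:i + n + 1])


def os_cfar_sorting(x: Sequence[int], i: int, n: int, k: int, T: float) -> bool:
    """Sorting form: target if X_i >= T * Z_i, with Z_i the k-th smallest cell."""
    z = sorted(reference_window(x, i, n))[k - 1]
    return x[i] >= T * z


def os_cfar_counting(x: Sequence[int], i: int, n: int, k: int, T: float) -> bool:
    """Sorting-free form: mu_i = #{cells <= X_i / T}; target if mu_i >= k."""
    xp = x[i] / T
    mu = sum(1 for v in reference_window(x, i, n) if v <= xp)
    return mu >= k


if __name__ == "__main__":
    rng = random.Random(1)
    n = 4
    x = [rng.randint(1, 20) for _ in range(200)]
    for T in (1, 2, 4):                # powers of two keep X_i / T exact
        for k in range(1, 2 * n + 1):
            for i in range(n, len(x) - n):
                assert os_cfar_sorting(x, i, n, k, T) == os_cfar_counting(x, i, n, k, T)
    print("sorting and counting forms agree")
```

The counting form needs only comparisons and increments, which is what makes the systolic mappings in this paper possible.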

After X_i, X_i' and μ_i reach the leftmost delay PE, they are routed back, and X_i' is then compared in turn with the n data samples arriving immediately after itself, i.e., {X_{i+1}, X_{i+2}, ..., X_{i+n}}. According to these comparison results, μ_i is updated as follows:

  For j = 1, ..., n, step 1:
    μ_i = μ_i + 1 if X_i' ≥ X_{i+j}; μ_i otherwise.

When μ_i comes back to the rightmost MAD-PE, it is ready to be compared with the constant k to produce the decision.

3. A new VLSI architecture for the OS-CFAR detector

3.1. Approach

Ritcey and Hwang's architecture requires the processing clock to be twice the sampling clock of the input data so that the test sample X_i can be compared with all the other samples in the window W_i. Double clocking generates the sequence X_i, X_i, X_{i+1}, X_{i+1}, X_{i+2}, ..., which does not affect the result. Therefore, the throughput rate of Ritcey and Hwang's architecture is at most half the processing clock rate. To improve the throughput rate and to reduce the hardware complexity, we propose a new systolic architecture that does not require double clocking and uses fewer PEs.

To achieve a good VLSI array architecture, we first derive a recursive equation for the OS-CFAR detector; this is an important step in developing parallel architectures. For even n (the indices below involve n/2), the recursive equation for the OS-CFAR detector with the window W_i = {X_{i-n}, ..., X_{i-1}, X_i, ..., X_{i+n}} is

$$\mu_i^{(j+1)} = \mu_i^{(j)} + c_i^{(j)}, \qquad \mu_i^{(1)} = 0, \tag{1}$$

where j is the recursion index, j = 1, 2, ..., n + 1, and

$$c_i^{(j)} = \begin{cases} 2, & \text{if } X_i' \ge X_{i-2(n/2-j+1)-1} \text{ and } X_i' \ge X_{i-2(n/2-j+1)},\\ 1, & \text{if } X_i' \ge X_{i-2(n/2-j+1)-1} \text{ or } X_i' \ge X_{i-2(n/2-j+1)},\\ 0, & \text{otherwise}, \end{cases}$$

where we assume that X_i = ∞ to avoid the comparison between X_i' and X_i, and also that X_k = ∞ if X_k is not in the window W_i. The recursive equation with space-time indices uses one index for time and the other indices for space; in this way, the activities of a parallel algorithm can be adequately expressed.
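Recursion (1) can be evaluated directly to confirm that its n + 1 steps give the same count as comparing X_i' with the 2n reference cells one at a time, as in Ritcey and Hwang's two passes; note that c_i^{(j)} is simply the number of the two comparisons in step j that hold. The sketch below is an illustrative Python model, not the hardware; the function names are hypothetical, and the two ∞ conventions stated above are implemented by a helper that returns math.inf.

```python
import math
import random
from typing import Sequence


def mu_recursion_even(x: Sequence[int], i: int, n: int, T: float) -> int:
    """Evaluate Eq. (1) for even n: n + 1 steps, each adding c_i^(j) in {0, 1, 2}."""
    if n % 2:
        raise ValueError("Eq. (1) as written assumes n is even")
    xp = x[i] / T

    def sample(m: int) -> float:
        # Conventions of Eq. (1): X_i = inf, and cells outside W_i are inf.
        if m == i or m < i - n or m > i + n:
            return math.inf
        return x[m]

    mu = 0                                    # mu_i^(1) = 0
    for j in range(1, n + 2):                 # j = 1, ..., n + 1
        b = i - 2 * (n // 2 - j + 1)          # index of X_{i-2((n/2)-j+1)}
        a = b - 1                             # index of X_{i-2((n/2)-j+1)-1}
        mu += int(xp >= sample(a)) + int(xp >= sample(b))  # c_i^(j)
    return mu


def mu_two_pass(x: Sequence[int], i: int, n: int, T: float) -> int:
    """Ritcey-Hwang style: 2n single comparisons, lagging pass then leading pass."""
    xp = x[i] / T
    mu = 0
    for j in range(n, 0, -1):                 # j = n, ..., 1: compare with X_{i-j}
        mu += int(xp >= x[i - j])
    for j in range(1, n + 1):                 # j = 1, ..., n: compare with X_{i+j}
        mu += int(xp >= x[i + j])
    return mu


if __name__ == "__main__":
    rng = random.Random(7)
    for n in (2, 4, 6, 8):
        x = [rng.randint(0, 30) for _ in range(60)]
        for i in range(n, len(x) - n):
            assert mu_recursion_even(x, i, n, 1) == mu_two_pass(x, i, n, 1)
    print("Eq. (1) matches the two-pass count")
```

Pairing two comparisons per step is what halves the number of sequential steps from 2n to n + 1.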

Fig. 4. DG and SFG for the OS-CFAR detector with the window size of 9 (some dashed outputs in detailed PEs may not be connected to other neighbor PEs as shown in the DG).

From the above recursive equation, we obtain the dependence graph (DG) shown in Fig. 4, which shows the dependences among the computations of a single-assignment algorithm. To determine a valid array structure for a locally recursive algorithm, one straightforward design method is to assign one processing element to each node of the DG. However, this results in very high hardware complexity and inefficient utilization of the PEs. To improve PE utilization, it is often desirable to map the nodes of the DG onto fewer PEs. This can be achieved by deriving a signal flow graph (SFG) from the DG. The SFG can be viewed as a simplified graph that is closer to a hardware-level design. Mapping a DG to an SFG array takes two steps: processor assignment and scheduling. For processor assignment, a projection method is usually applied, in which the nodes of the DG along a straight line are assigned to a common PE. In the OS-CFAR detector, the 2-D index space may be decomposed into a 1-D processor space and a 1-D delay space by projection of the DG. The delay space is related to the scheduling, which specifies the sequence of operations in all the PEs. A linear schedule is based on a set of parallel and uniformly spaced hyperplanes in the DG. These hyperplanes are called equitemporal hyperplanes, and all the nodes on the same hyperplane must be processed at the same time [7]. By a projection of the DG, we obtain the SFG for the OS-CFAR detector shown in Fig. 4. The last stage is to convert the SFG into a systolic array design. The systolic array can be obtained directly by transforming the SFG into an equivalent, temporally localized form in which every edge between modular sections has at least one delay element. This procedure is called pipeline retiming. Fig. 4 shows that no further retiming is required, owing to the inherent delay elements between PEs. The detailed systolic architectures are shown in Fig. 5.

Fig. 5. (a) Proposed systolic architecture for the OS-CFAR detector when n is even (r = n/2). (b) Proposed systolic architecture for the OS-CFAR detector when n is odd (r = (n + 1)/2). (c) Functional descriptions of PEs.

The proposed architecture has at most ten local communication links between adjacent PEs, as shown in Fig. 5, whereas Ritcey and Hwang's architecture has twelve: six to the previous PE and six to the following PE. The proposed architecture for the OS-CFAR detector with reference window size 2n + 1 consists of one multiplication (MUL) PE, two compare-and-accumulate type 1 (CAA1) PEs, n - 1 compare-and-accumulate type 2 (CAA2) PEs, and one decision PE. A constant threshold coefficient T and a decision value k are preloaded into the MUL-PE and the decision PE, respectively, and stay in these cells throughout the computation.
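The projection-and-scheduling step described above can be illustrated with a small model of the DG of Eq. (1), whose nodes are (i, j) with i the window index and j the recursion step. As an assumption for illustration (Fig. 4 itself is not reproduced here), the sketch projects along the window axis, d = (1, 0), so node (i, j) is assigned to PE j, and uses a linear schedule t = s · (i, j). It checks that no PE executes two nodes in one cycle and that the dependence μ_i^{(j)} → μ_i^{(j+1)} takes at least one cycle, and it reports a PE's pipelining period. Only the μ dependence is modeled, not the flow of the X samples; the function names are hypothetical.

```python
from typing import Dict, Tuple

Node = Tuple[int, int]      # (window index i, recursion step j)
Slot = Tuple[int, int]      # (PE index, clock cycle)


def linear_mapping(num_windows: int, n: int, s: Tuple[int, int]) -> Dict[Node, Slot]:
    """Map DG nodes (i, j), j = 1..n+1, to (PE, time) with projection d = (1, 0)."""
    d = (1, 0)
    # Nodes on one projection line must get distinct times: s . d > 0.
    if s[0] * d[0] + s[1] * d[1] <= 0:
        raise ValueError("schedule vector must satisfy s . d > 0")
    return {
        (i, j): (j, s[0] * i + s[1] * j)    # PE index j, time s . (i, j)
        for i in range(num_windows)
        for j in range(1, n + 2)
    }


def is_valid(mapping: Dict[Node, Slot], n: int) -> bool:
    """Conflict-free (one node per PE per cycle) and causal (j -> j+1 takes >= 1 cycle)."""
    conflict_free = len(set(mapping.values())) == len(mapping)
    causal = all(
        mapping[(i, j + 1)][1] - mapping[(i, j)][1] >= 1
        for (i, j) in mapping if j <= n
    )
    return conflict_free and causal


def pipelining_period(mapping: Dict[Node, Slot], pe: int) -> int:
    """Smallest gap between successive nodes executed by one PE."""
    times = sorted(t for (p, t) in mapping.values() if p == pe)
    return min(b - a for a, b in zip(times, times[1:]))


if __name__ == "__main__":
    for s in ((1, 1), (2, 1)):
        m = linear_mapping(10, 4, s)
        print(s, is_valid(m, 4), pipelining_period(m, 3))
```

With s = (1, 1) every PE starts a new window each cycle (period 1); a schedule such as s = (2, 1) is also valid but leaves each PE idle every other cycle (period 2).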

The test sample X_i enters the upper rightmost MUL-PE and is multiplied by 1/T; the product is assigned to X_i'. Both data then enter PE_1 to initiate the processing of the window W_i. An accumulator μ_i (with zero initial value) is created along with X_i. As the three data propagate, X_i' meets in turn all the other data samples {X_{i-n}, ..., X_{i-1}, X_{i+1}, ..., X_{i+n}} in W_i. When n is even, X_i' meets the other data samples in W_i in the following sequence of data tuples:

  (X_{i-n}), (X_{i-n+1}, X_{i-n+2}), ..., (X_{i-3}, X_{i-2}), (X_{i-1}), (X_{i+1}, X_{i+2}), ..., (X_{i+n-1}, X_{i+n}).

That is,

  Initialize μ_i = 0.
  μ_i = μ_i + 1 if X_i' ≥ X_{i-n}; μ_i otherwise.
  For j = n - 1, n - 3, ..., 3, step -2:
    μ_i = μ_i + 2 if X_i' ≥ X_{i-j} and X_i' ≥ X_{i-j+1};
          μ_i + 1 if X_i' ≥ X_{i-j} or X_i' ≥ X_{i-j+1};
          μ_i otherwise.
  μ_i = μ_i + 1 if X_i' ≥ X_{i-1}; μ_i otherwise.
  For j = 1, 3, ..., n - 1, step 2:
    μ_i = μ_i + 2 if X_i' ≥ X_{i+j} and X_i' ≥ X_{i+j+1};
          μ_i + 1 if X_i' ≥ X_{i+j} or X_i' ≥ X_{i+j+1};
          μ_i otherwise.
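The n-even tuple sequence above, and the n-odd sequence given in the next paragraph, can be generated programmatically to check that every reference cell is visited exactly once in n + 1 steps, two of which involve a single comparison (the two CAA1 PEs) and n - 1 a pair of comparisons (the CAA2 PEs). The following Python sketch is illustrative: tuple entries are offsets relative to i, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def tuple_schedule(n: int) -> List[Tuple[int, ...]]:
    """Offsets (relative to i) of the data tuples met by X_i', in order."""
    if n < 1:
        raise ValueError("n must be positive")
    if n % 2 == 0:
        lag = [(-n,)] + [(-j, -j + 1) for j in range(n - 1, 2, -2)] + [(-1,)]
        lead = [(j, j + 1) for j in range(1, n, 2)]
    else:
        lag = [(-n,)] + [(-j, -j + 1) for j in range(n - 1, 1, -2)]
        lead = [(1,)] + [(j, j + 1) for j in range(2, n, 2)]
    return lag + lead


def mu_by_schedule(x: Sequence[int], i: int, n: int, T: float) -> int:
    """Accumulate mu_i tuple by tuple: +2/+1/+0 per pair, +1/+0 per single."""
    xp = x[i] / T
    mu = 0
    for offsets in tuple_schedule(n):
        mu += sum(1 for o in offsets if xp >= x[i + o])
    return mu


if __name__ == "__main__":
    print(tuple_schedule(4))   # n even
    print(tuple_schedule(5))   # n odd
```

Each tuple corresponds to one CAA PE, so the single-element tuples map to CAA1 PEs and the pairs to CAA2 PEs.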

On the other hand, when n is odd, X_i' meets the other data samples in W_i in the following sequence of data tuples:

  (X_{i-n}), (X_{i-n+1}, X_{i-n+2}), ..., (X_{i-2}, X_{i-1}), (X_{i+1}), (X_{i+2}, X_{i+3}), ..., (X_{i+n-1}, X_{i+n}).

That is,

  Initialize μ_i = 0.
  μ_i = μ_i + 1 if X_i' ≥ X_{i-n}; μ_i otherwise.
  For j = n - 1, n - 3, ..., 2, step -2:
    μ_i = μ_i + 2 if X_i' ≥ X_{i-j} and X_i' ≥ X_{i-j+1};
          μ_i + 1 if X_i' ≥ X_{i-j} or X_i' ≥ X_{i-j+1};
          μ_i otherwise.
  μ_i = μ_i + 1 if X_i' ≥ X_{i+1}; μ_i otherwise.
  For j = 2, 4, ..., n - 1, step 2:
    μ_i = μ_i + 2 if X_i' ≥ X_{i+j} and X_i' ≥ X_{i+j+1};
          μ_i + 1 if X_i' ≥ X_{i+j} or X_i' ≥ X_{i+j+1};
          μ_i otherwise.

Each CAA-PE compares X_i' with the other data sample(s) propagated from neighboring PE(s) and modifies μ_i according to the result(s) of the comparison(s). When μ_i reaches the decision PE, it contains the rank of X_i' in W_i and is ready to be compared with the constant k to produce the decision. A snapshot is a description of the activities of an array system at a particular time instant. Snapshots are perhaps the most natural tool an algorithm designer can adopt to check or verify a new array algorithm. The snapshots in Fig. 6 depict the systolic processing of the OS-CFAR detector for an input sequence S = {X_1, X_2, X_3, ...}.

The snapshots are obtained under the assumptions of window size 2n + 1 = 9, T = 1 and k = 7. The proposed OS-CFAR processor has a pipelining period of 1, so the utilization of each PE is 1. The processor utilization of the proposed architecture is therefore improved by a factor of 2 in comparison with that of Ritcey and Hwang's architecture. In addition, the proposed architecture requires fewer interconnections to pass data from one PE to other PEs than Ritcey and Hwang's architecture.

3.2. Hardware complexity and throughput rate

In this section, we show the superiority of the proposed architecture over Ritcey and Hwang's architecture from a VLSI implementation point of view. Many factors determine the optimality criteria for the design of systolic arrays, and the final choice of criteria has to be application-dependent. We evaluate the architectures with some typical factors: latency, throughput rate, VLSI complexity in terms of gate count, and interconnections between PEs. These are explained below, and the performance comparison between the proposed architecture and Ritcey and Hwang's architecture is summarized in Table 1.

Fig. 6. Snapshots for the proposed OS-CFAR processor with the window size of 9: (a) initial stage; (b) after cycle 1; (c) after cycle 2; (d) after cycle 3; (e) after cycle 4; (f) after cycle 5; (g) after cycle 6; (h) after cycle 7.

Table 1
Performance comparison between the proposed and Ritcey and Hwang's architectures

                                       Proposed                      Ritcey and Hwang
Latency (cycles)                       n + 3                         2n + 3
Throughput rate                        (T_mul + T_D)^{-1} Hz         (2(T_mul + T_D))^{-1} Hz
Hardware (number of components)
  2-input comparator                   2n + 1                        2n + 1
  Delay element for X_i and X_i'       2n + 4                        4n + 2
  Delay element for μ_i                n + 1                         2n + 1
  Delay element for DECISION           1                             1
  3-input adder                        n - 1                         -
  2-input adder                        2                             2n
  Multiplier                           1                             1
Total interconnections between PEs
  for μ                                (n + 1)⌈log_2(2n)⌉ bits       (2n + 1)⌈log_2(2n)⌉ bits
  for X'                               (n + 1)L bits                 (2n + 1)L bits
  for X                                (3n + 1)L bits                (2n + 2)L bits

Fig. 7. Block diagram of the modified OS-CFAR detectors.

Latency is the time interval between loading the first input of a problem instance into the processing array and unloading its last output from the array. The throughput rate, which in practice is the inverse of the clock period, is determined by the worst-case delay between two D flip-flops. The most complex PEs, which therefore determine the throughput rate, are the CAA-PE or the MAD-PE for Ritcey and Hwang's architecture, and the CAA2-PE or the MUL-PE for the proposed architecture. In the CAA-PE, there are two identical blocks for the comparison and addition operations. The two blocks operate in parallel, but within each block the comparison and the addition are performed serially. Let T_c and T_d denote the times required for a two-input comparison and a two-input addition, respectively. The throughput rate of the CAA-PE can then be as high as (T_c + T_d)^{-1} Hz. In the proposed architecture, the clock rate, and hence the throughput rate, is determined by the multiplier in the MUL-PE (Table 1). In Ritcey and Hwang's architecture, the clock rate is also determined by the multiplier. However, its throughput is half that of the proposed architecture, because its effective sampling rate is half the clock rate. The throughput rate of the proposed architecture is therefore twice that of Ritcey and Hwang's architecture.

From an implementation point of view, an L-bit delay element can be implemented with 8L gates in the LSI Logic LCA300K gate array technology [8]. For simplicity of analysis, we assume that X_i and X_i' are both represented with L bits. The number of bits for μ_i is ⌈log_2(2n)⌉, where ⌈x⌉ is the smallest integer greater than or equal to x. Compared with the proposed architecture, Ritcey and Hwang's architecture needs 2n - 2 additional delay elements for X_i and X_i' and n additional delay elements for μ_i, i.e., about 16(n - 1)L + 8n⌈log_2(2n)⌉ additional gates. The two-input adders considered in this paper only add one or zero to a ⌈log_2(2n)⌉-bit number, so each can be implemented with half-adders totalling 5⌈log_2(2n)⌉ gates [8]. The three-input adders considered in this paper only add zero, one or two to a ⌈log_2(2n)⌉-bit number, so each can be implemented with a full-adder of 8 gates for the least significant bit and half-adders for the remaining bits, i.e., with about 8 + 5(⌈log_2(2n)⌉ - 1) gates [8]. The adders in the proposed architecture can thus be implemented with 5(n + 1)⌈log_2(2n)⌉ + 3(n - 1) gates, whereas the adders of Ritcey and Hwang's architecture require 10n⌈log_2(2n)⌉ gates. The adders therefore add about 5(n - 1)⌈log_2(2n)⌉ - 3(n - 1) gates to Ritcey and Hwang's architecture. In total, Ritcey and Hwang's architecture requires about (13n - 5)⌈log_2(2n)⌉ + (n - 1)(16L - 3) more gates than the proposed architecture. The local communication links between adjacent PEs are also compared in Table 1.
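The gate-count bookkeeping above can be reproduced with a few lines of integer arithmetic. The Python sketch below, with hypothetical function names, recomputes the extra delay-element and adder gates of Ritcey and Hwang's architecture from the per-component costs quoted in the text (8 gates per delay-element bit, 5 gates per half-adder bit, 8 gates for the full-adder bit), taking the three-input adder count of the proposed array as n - 1 as implied by the text's adder formula, and checks the total against the closed form (13n - 5)⌈log_2(2n)⌉ + (n - 1)(16L - 3). It also returns the latencies listed in Table 1.

```python
from typing import Dict


def ceil_log2(m: int) -> int:
    """Smallest integer b with 2**b >= m (for m >= 1), computed without floats."""
    return (m - 1).bit_length()


def extra_gates_ritcey_hwang(n: int, L: int) -> int:
    """Additional gates of Ritcey and Hwang's array over the proposed one."""
    b = ceil_log2(2 * n)                              # bits for mu_i
    extra_delay = 8 * L * (2 * n - 2) + 8 * b * n     # 2n-2 L-bit delays, n b-bit delays
    adders_rh = 2 * n * (5 * b)                       # 2n two-input (increment) adders
    adders_prop = 2 * (5 * b) + (n - 1) * (8 + 5 * (b - 1))  # 2 two-input + (n-1) three-input
    return extra_delay + adders_rh - adders_prop


def closed_form(n: int, L: int) -> int:
    """(13n - 5) * ceil(log2(2n)) + (n - 1)(16L - 3), as stated in the text."""
    b = ceil_log2(2 * n)
    return (13 * n - 5) * b + (n - 1) * (16 * L - 3)


def latency_cycles(n: int) -> Dict[str, int]:
    """Latency entries of Table 1."""
    return {"proposed": n + 3, "ritcey_hwang": 2 * n + 3}


if __name__ == "__main__":
    print(extra_gates_ritcey_hwang(8, 16), closed_form(8, 16), latency_cycles(8))
```

For example, with n = 8 and L = 16 both expressions give 2167 extra gates, so the difference grows roughly linearly in n for a fixed word length L.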

4. VLSIs for the modified OS-CFAR detectors

4.1. Approach

In this section, we propose systolic array architectures for the modified OS-CFAR detectors, i.e., the OSGO-CFAR and the OSSO-CFAR detectors. To design these architectures, we slightly modify the scheme of the CFAR detector shown in Fig. 7 as in Fig. 8. In the modified OS-CFAR detectors, the reference window is divided into leading and lagging reference windows according to the location of the reference cells. After taking the kth smallest reference data from each reference window, the modified OS-CFAR detectors estimate the background noise power level Z_i by combining the two selected reference data. That is, Z_{1i} = X_{(k)}, the kth smallest reference data in the leading reference window {X_{i+1}, X_{i+2}, ..., X_{i+n}}, and Z_{2i} = X_{(k)}, the kth smallest reference data in the lagging reference window {X_{i-n}, X_{i-n+1}, ..., X_{i-1}}. The background noise power level Z_i is then the greater of Z_{1i} and Z_{2i}, max(Z_{1i}, Z_{2i}), for the OSGO-CFAR detector, and the smaller, min(Z_{1i}, Z_{2i}), for the OSSO-CFAR detector. The CFAR detectors decide that a target is present if X_i ≥ T Z_i, where T is a threshold constant obtained from the desired false alarm rate, the number of reference cells, and the CFAR scheme. Equivalently, as in the previous section, the CFAR detectors decide that a target is present if X_i' = X_i/T ≥ Z_i. Let μ_{1i} and μ_{2i} denote the numbers of reference cells less than or equal to X_i' in the leading and in the lagging reference windows, respectively. One can implement the modified OS-CFAR detectors by just counting μ_{1i} and μ_{2i}, without sorting the reference windows {X_{i+1}, X_{i+2}, ..., X_{i+n}} and {X_{i-n}, X_{i-n+1}, ..., X_{i-1}}. The modified OS-CFAR detectors decide that a target is present when min(μ_{1i}, μ_{2i}) ≥ k for the OSGO-CFAR detector and when max(μ_{1i}, μ_{2i}) ≥ k for the OSSO-CFAR detector. This holds because X_i' ≥ max(Z_{1i}, Z_{2i}) exactly when both windows contain at least k cells less than or equal to X_i', whereas X_i' ≥ min(Z_{1i}, Z_{2i}) exactly when at least one window does. Hence, the processing of the modified OS-CFAR detectors can be defined as follows.

Given a constant threshold coefficient T, a decision value k and the input sequence {..., X_{i-n}, ..., X_{i-1}, X_i, X_{i+1}, ..., X_{i+n}, ...}, compute the result sequence {..., (μ_{1i}, μ_{2i}), (μ_{1,i+1}, μ_{2,i+1}), ...} defined by

  μ_{1i} = the number of cells less than or equal to X_i/T in the leading reference window {X_{i+1}, X_{i+2}, ..., X_{i+n}},
  μ_{2i} = the number of cells less than or equal to X_i/T in the lagging reference window {X_{i-n}, X_{i-n+1}, ..., X_{i-1}},

and decide either that a target is present if min(μ_{1i}, μ_{2i}) ≥ k (OSGO-CFAR) or max(μ_{1i}, μ_{2i}) ≥ k (OSSO-CFAR), or that no target is present.

Fig. 8. Implementation concept for the modified OS-CFAR detectors.
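The equivalence between the sorting form (Z_i = max or min of the two kth smallest cells) and the counting form (min or max of μ_{1i} and μ_{2i} compared with k) can be checked in the same way as for the OS-CFAR detector. The Python sketch below is illustrative; the function names are hypothetical, k is assumed to satisfy 1 ≤ k ≤ n, and integer data with power-of-two T keep the arithmetic exact.

```python
import random
from typing import List, Sequence, Tuple


def split_windows(x: Sequence[int], i: int, n: int) -> Tuple[List[int], List[int]]:
    """Leading {X_{i+1},...,X_{i+n}} and lagging {X_{i-n},...,X_{i-1}} windows."""
    return list(x[i + 1:i + n + 1]), list(x[i - n:i])


def _check(scheme: str) -> None:
    if scheme not in ("OSGO", "OSSO"):
        raise ValueError("scheme must be 'OSGO' or 'OSSO'")


def modified_sorting(x: Sequence[int], i: int, n: int, k: int, T: float, scheme: str) -> bool:
    """Z_1i, Z_2i = k-th smallest of each window; Z_i = max (OSGO) or min (OSSO)."""
    _check(scheme)
    lead, lag = split_windows(x, i, n)
    z1, z2 = sorted(lead)[k - 1], sorted(lag)[k - 1]
    z = max(z1, z2) if scheme == "OSGO" else min(z1, z2)
    return x[i] >= T * z


def modified_counting(x: Sequence[int], i: int, n: int, k: int, T: float, scheme: str) -> bool:
    """mu_1i, mu_2i = counts <= X_i/T; target if min (OSGO) or max (OSSO) >= k."""
    _check(scheme)
    xp = x[i] / T
    lead, lag = split_windows(x, i, n)
    mu1 = sum(1 for v in lead if v <= xp)
    mu2 = sum(1 for v in lag if v <= xp)
    mu = min(mu1, mu2) if scheme == "OSGO" else max(mu1, mu2)
    return mu >= k


if __name__ == "__main__":
    rng = random.Random(3)
    n = 6
    x = [rng.randint(1, 25) for _ in range(150)]
    for scheme in ("OSGO", "OSSO"):
        for T in (1, 2, 4):
            for k in range(1, n + 1):
                for i in range(n, len(x) - n):
                    assert modified_sorting(x, i, n, k, T, scheme) == \
                        modified_counting(x, i, n, k, T, scheme)
    print("sorting and counting forms agree for OSGO and OSSO")
```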

4.2. Systolic design

The systolic architectures of the modified OS-CFAR detectors can be obtained by slightly modifying the proposed systolic architecture for the OS-CFAR detector. In the proposed systolic design for the OS-CFAR detector, X_i' is compared in turn with the reference data residing in the lagging reference window, and the comparisons between X_i' and the lagging reference data are complete when the test data X_i reaches the upper leftmost PE. At this time, the obtained result is μ_{2i}, which can be used for the decision only after μ_{1i} has been obtained. Therefore, when μ_{2i} departs from the upper leftmost PE, it is stored in a first-in first-out (FIFO) buffer of size n/2 for n even and (n + 1)/2 for n odd. When X_i' arrives at the lower leftmost PE, it meets the sequence of leading reference data, and the comparisons between X_i' and these reference data are performed to obtain μ_{1i}. When X_i departs from the lower rightmost PE, PE_{r+1}, the comparison result μ_{1i} also departs from PE_{r+1} and reaches the combination PE. At the same time, the previously computed value μ_{2i} departs from the FIFO buffer and reaches the combination PE. On the next clock cycle, the values μ_{1i} and μ_{2i} that have reached the combination PE are combined according to the CFAR scheme. The output of the combination PE, μ_i, is min(μ_{1i}, μ_{2i}) for the OSGO-CFAR detector or max(μ_{1i}, μ_{2i}) for the OSSO-CFAR detector. Then μ_i is compared with the decision value k in the decision PE. The proposed systolic architecture for the modified OS-CFAR detectors with window size 2n + 1 is shown in Fig. 9. The only PEs added to the OS-CFAR architecture are a FIFO and a max or min operator. The performance is the same as that of the OS-CFAR architecture shown in Table 1.
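The role of the FIFO buffer and of the combination PE can be illustrated with a cycle-by-cycle behavioral model. The Python sketch below assumes, for illustration only, that μ_{2i} leaves the upper leftmost PE at cycle i and that μ_{1i} reaches the combination PE exactly D cycles later, where D is the FIFO size stated above (n/2 for n even, (n + 1)/2 for n odd); the one-cycle combination delay and the decision PE are not modeled, and the function names are hypothetical.

```python
from collections import deque
from typing import List, Sequence


def fifo_depth(n: int) -> int:
    """FIFO size stated in the text: n/2 for even n, (n + 1)/2 for odd n."""
    return (n + 1) // 2


def combine_stream(mu2: Sequence[int], mu1: Sequence[int], n: int, scheme: str) -> List[int]:
    """Pair buffered mu2_i with mu1_i (arriving D cycles later) and combine them."""
    if scheme not in ("OSGO", "OSSO"):
        raise ValueError("scheme must be 'OSGO' or 'OSSO'")
    combine = min if scheme == "OSGO" else max
    depth = fifo_depth(n)
    fifo = deque()
    out = []
    for t in range(len(mu2) + depth):
        if t >= depth:                      # mu1 of window t - depth arrives now
            out.append(combine(mu1[t - depth], fifo.popleft()))
        if t < len(mu2):                    # mu2 of window t leaves the upper row
            fifo.append(mu2[t])
        assert len(fifo) <= depth           # occupancy never exceeds the stated size
    return out


if __name__ == "__main__":
    print(combine_stream([3, 1, 4, 1, 5], [2, 7, 1, 8, 2], 4, "OSGO"))
    print(combine_stream([3, 1, 4, 1, 5], [2, 7, 1, 8, 2], 4, "OSSO"))
```

Popping before pushing in each cycle is what keeps the occupancy at D entries, matching a buffer that holds exactly the μ_{2i} values still waiting for their partners.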

Fig. 9. (a) Proposed systolic architecture for the modified OS-CFAR detectors when n is even. (b) Proposed systolic architecture for the modified OS-CFAR detectors when n is odd. (c) Functional descriptions of PEs.

5. Conclusions

In this paper, we have presented systolic architectures for the OS-CFAR and the modified OS-CFAR detectors, with the focus placed on real-time processing. The proposed systolic architectures support a sufficiently high bandwidth that data can be fed continuously for high-speed processing. For comparison, the systolic architecture for the OS-CFAR detector previously designed by Ritcey and Hwang has been reviewed. The proposed architecture for the OS-CFAR detector has fewer PEs, fewer interconnections and fewer gates than Ritcey and Hwang's architecture. Furthermore, the throughput rate of the proposed architecture is twice that of Ritcey and Hwang's architecture. We have also proposed systolic architectures for the OSGO-CFAR and the OSSO-CFAR detectors by slightly modifying the proposed systolic architecture for the OS-CFAR detector. The throughput rate of the proposed systolic architectures for the modified OS-CFAR detectors is the same as that of the proposed OS-CFAR architecture.

References

[1] A.R. Elias-Fuste, M.G.G. de Mercado, E. de los Reyes Davo, Analysis of some modified ordered statistics CFAR: OSGO and OSSO CFAR, IEEE Trans. Aerosp. Electron. Syst. AES-26 (1) (January 1990) 197-202.
[2] P.P. Gandhi, S.A. Kassam, Analysis of CFAR processors in nonhomogeneous background, IEEE Trans. Aerosp. Electron. Syst. AES-24 (4) (July 1988) 427-445.
[3] V.G. Hansen, J.H. Sawyers, Detectability loss due to greatest of selection in a cell-averaging CFAR, IEEE Trans. Aerosp. Electron. Syst. AES-16 (1) (January 1980) 115-118.
[4] J.N. Hwang, J.A. Ritcey, Systolic architectures for radar CFAR detectors, IEEE Trans. Signal Processing SP-22 (2) (March 1986) 84-196.

[5] J.N. Hwang, J.A. Ritcey, Systolic architectures for radar CFAR detectors, Proceedings Internat. Conf. Acoust. Speech Signal Process. '90, 1990, pp. 1025-1028.
[6] H.T. Kung, Why systolic architectures?, Computer 15 (January 1982) 37-46.
[7] S.Y. Kung, VLSI Array Processors, Prentice-Hall, Englewood Cliffs, NJ, 1988.
[8] LCA300K Gate Array 5 Volt Series Products Databook, LSI Logic Corporation, Milpitas, CA, 1993.
[9] N. Levanon, Radar Principles, Wiley, New York, 1988.
[10] G. Minkler, J. Minkler, CFAR, Magellan, Baltimore, MD, 1990.
[11] J.A. Ritcey, J.N. Hwang, Detection performance and systolic architectures for OS-CFAR detectors, IEEE Internat. Radar Conf., 1990, pp. 112-116.
[12] H. Rohling, Radar CFAR thresholding in clutter and multiple target situations, IEEE Trans. Aerosp. Electron. Syst. AES-19 (4) (July 1983) 608-621.
[13] G.V. Trunk, Range resolution of targets using automatic detectors, IEEE Trans. Aerosp. Electron. Syst. AES-14 (5) (September 1978) 750-755.
[14] H. Weiss, Analysis of some modified cell-averaging CFAR processors in multiple-target situations, IEEE Trans. Aerosp. Electron. Syst. AES-18 (1) (January 1982) 102-114.