INFORMATION SCIENCES 4, 233-249 (1972)
Digital Matched Filters for Detecting Gaussian Signals in Gaussian Noise†

T. L. HENDERSON AND D. G. LAINIOTIS
Department of Electrical Engineering and Electronics Research Center, The University of Texas at Austin 78712

Communicated by J. M. Richardson
ABSTRACT

The use of a set of digital matched filters is presented as an alternative to direct computation of the likelihood ratio for the problem of detecting a random signal in random noise. It is assumed that a random process composed of Gaussian background noise and (with probability P) a zero-mean Gaussian signal is sampled at N instants, the samples being corrupted by additive Gaussian measurement noise. The samples are processed by $K \ll N$ digital correlation filters which are structured so that the signal can be detected with minimum Bayes risk. The optimum filters are shown to be matched to the most relevant components of the simultaneously orthogonal expansion of the set of sampled data. State-variable techniques are used to find a very practical method for determining the optimum filter structures.
1. INTRODUCTION

Suppose that we are required to detect a random signal in the presence of noise; more specifically, suppose
$$ r(t) = \begin{cases} n(t) + s(t) & \text{when the signal is present} \\ n(t) & \text{otherwise} \end{cases} \tag{1} $$
where the signal $s(t)$ and the background noise $n(t)$ are random processes, and we must decide whether the signal is present by observing a sequence of noisy measurements
$$ z_i = r(t_i) + m_i, \qquad i = 1, \ldots, N, \tag{2} $$

† This work was supported by the Information Sciences Division of the Air Force Office of Scientific Research under Grant AFOSR-69-1764 and by the Joint Services Electronics Program under Grant AFOSR-67-0766E.
Copyright © 1972 by American Elsevier Publishing Company, Inc.
taken at $N$ sampling instants $t_1 < t_2 < \cdots < t_N$. (The random variables $m_i$ represent measurement noise.) The following additional assumptions will be made:
A. $s(t)$ and $n(t)$ are Gaussian random processes and $\{m_i\}_{i=1}^{N}$ is a sequence of Gaussian random variables. Furthermore, $s(t)$, $n(t)$, $m_1, m_2, \ldots, m_N$ are statistically independent.

B. $s(t)$, $n(t)$, and $\{m_i\}_{i=1}^{N}$ are zero mean.

C. The covariances
$$ R_s(i,j) \triangleq E[s(t_i)\,s(t_j)]; \qquad R_n(i,j) \triangleq E[n(t_i)\,n(t_j)]; \qquad M(i) \triangleq E[m_i^2] $$
are known. (Note that $E[m_i m_j] = 0$ when $i \neq j$ because of A and B.)

D. $M(i) > 0$ for $i = 1, \ldots, N$.

The assumption that $n(t)$ and $\{m_i\}_{i=1}^{N}$ are zero mean is made with no loss of generality, since any nonzero means would always be present and could be subtracted from the observations with no loss in our ability to detect the signal. We shall denote by $H_1$ ($H_0$) the hypothesis that the signal is (is not) present, and define a data vector
$$ \mathbf{z} = (z_1, z_2, \ldots, z_N)^T. \tag{3} $$
Then under either hypothesis $\mathbf{z}$ is a zero-mean Gaussian random vector, and its hypothesis-conditional covariance matrices are
$$ E[\mathbf{z}\mathbf{z}^T/H_0] \triangleq R_0 = R_n + M, \tag{4} $$
$$ E[\mathbf{z}\mathbf{z}^T/H_1] \triangleq R_1 = R_s + R_n + M = R_0 + R_s, \tag{5} $$
where the elements of the matrices $R_s$, $R_n$, and $M$ are defined by
$$ (R_s)_{ij} \triangleq R_s(i,j); \qquad (R_n)_{ij} \triangleq R_n(i,j); \qquad (M)_{ij} \triangleq M(i)\,\delta_{ij}. \tag{6} $$
The detector can be represented as a decision function $\mathscr{D}(\mathbf{z})$ which, given a data sample $\mathbf{z}$, chooses either $H_0$ or $H_1$; i.e. $\mathscr{D} : \mathbb{R}^N \to \{H_0, H_1\}$, where $\mathbb{R}^N$ is the $N$-dimensional observation space. The performance criterion most often used to appraise a given detector $\mathscr{D}(\mathbf{z})$ is the risk,
$$ J \triangleq P\,\mathscr{C}_m\,\mathrm{Prob}\{\mathscr{D}(\mathbf{z}) = H_0 \text{ when } H_1 \text{ is true}\} + (1-P)\,\mathscr{C}_f\,\mathrm{Prob}\{\mathscr{D}(\mathbf{z}) = H_1 \text{ when } H_0 \text{ is true}\}, \tag{7} $$
where $P$ is the a priori probability that the signal is present (i.e. that $H_1$ is true), and $\mathscr{C}_m$ and $\mathscr{C}_f$ are the assigned costs (both $> 0$) of a "miss" and a "false alarm" respectively. (When $\mathscr{C}_m = \mathscr{C}_f = 1$, $J$ is simply the probability of an error.)
The "optimum" detector, i.e. the one which minimizes $J$, is the one based on the well-known Bayes test,
$$ \mathscr{D}^*(\mathbf{z}) = \begin{cases} H_1 & \text{whenever } l(\mathbf{z}) > \eta \\ H_0 & \text{otherwise} \end{cases} \tag{8} $$
where $l(\mathbf{z})$ is the likelihood ratio and $\eta$ is defined as
$$ \eta \triangleq \frac{(1-P)\,\mathscr{C}_f}{P\,\mathscr{C}_m} \tag{9} $$
(see [1], p. 26). The resulting risk is called the "Bayes risk", and we shall denote it by $J^*[\mathbf{z}]$. It should be noted that $J^*[\mathbf{z}]$ is a function of the conditional statistics of $\mathbf{z}$ and not of $\mathbf{z}$ itself. For our problem, the optimum detector can be expressed in a more explicit form (see [1], p. 107):
$$ \mathscr{D}^*(\mathbf{z}) = \begin{cases} H_1 & \text{whenever } \mathbf{z}^T\bigl(R_0^{-1} - R_1^{-1}\bigr)\mathbf{z} > 2\log\eta + \log\dfrac{\det R_1}{\det R_0} \\ H_0 & \text{otherwise} \end{cases} \tag{10} $$
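As a concrete illustration of Eq. (10), the following minimal Python sketch evaluates the quadratic-form test statistic directly. The covariances, prior, and costs below are arbitrary placeholders chosen for illustration, not values taken from the paper.

```python
import numpy as np

def bayes_detector(z, R0, R1, P=0.5, C_m=1.0, C_f=1.0):
    """Direct implementation of the test in Eq. (10):
    decide H1 iff z^T (R0^{-1} - R1^{-1}) z > 2*log(eta) + log(det R1 / det R0),
    where eta = (1 - P)*C_f / (P*C_m) as in Eq. (9)."""
    eta = (1.0 - P) * C_f / (P * C_m)
    stat = z @ (np.linalg.inv(R0) - np.linalg.inv(R1)) @ z
    # log(det R1 / det R0) computed via slogdet for numerical stability
    threshold = 2.0 * np.log(eta) + (np.linalg.slogdet(R1)[1] - np.linalg.slogdet(R0)[1])
    return "H1" if stat > threshold else "H0"

# Toy example with N = 4 samples (illustrative covariances only)
rng = np.random.default_rng(0)
N = 4
Rn = 0.5 * np.eye(N)                                                      # background noise
Rs = np.fromfunction(lambda i, j: np.exp(-0.5 * np.abs(i - j)), (N, N))   # signal covariance
M = 0.1 * np.eye(N)                                                       # measurement noise
R0, R1 = Rn + M, Rs + Rn + M                                              # Eqs. (4) and (5)
z = rng.multivariate_normal(np.zeros(N), R1)                              # a sample drawn under H1
print(bayes_detector(z, R0, R1))
```

The cost of this direct test grows with $N$ (matrix inverses and an $N \times N$ quadratic form), which motivates the reduced scheme developed next.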
Unfortunately, this decision scheme becomes prohibitively complex when $N$, the number of sampling instants, is large. When the signal and noise processes admit state-variable models, Schweppe [2] has shown that the likelihood ratio can be obtained by recursive computation. However, for many applications even this simplified scheme proves impractical. A technique which can be used to avoid the difficulty is to use a decision function which depends only upon the values of $K$ linear combinations of the measurements [3],
$$ y_k = \sum_{i=1}^{N} b_{ki}\,z_i \qquad \text{for } k = 1, \ldots, K, \tag{11} $$
where $K$ is considerably smaller than $N$. Each $y_k$ may be regarded as the output of a digital filter which correlates the measurement sequence $\{z_i\}_{i=1}^{N}$ with a predetermined weighting sequence $\{b_{ki}\}_{i=1}^{N}$, and if we define a vector
$$ \mathbf{b}_k \triangleq (b_{k1}, b_{k2}, \ldots, b_{kN})^T \qquad \text{for } k = 1, \ldots, K, \tag{12} $$
then
$$ y_k = \mathbf{b}_k^T \mathbf{z} \qquad \text{for } k = 1, \ldots, K. \tag{13} $$
Another interpretation is provided by defining a vector
$$ \mathbf{y} \triangleq (y_1, y_2, \ldots, y_K)^T \tag{14} $$
and a $K \times N$ matrix
$$ B \triangleq (\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_K)^T \tag{15} $$
so that
$$ \mathbf{y} = B\mathbf{z}. \tag{16} $$
Clearly the data vector $\mathbf{z}$ has simply been "reduced" by a linear transformation. Thus $\mathbf{y}$ is, under either hypothesis, a zero-mean Gaussian random vector, and it may be regarded as a new data vector. We can associate with $\mathbf{y}$ its Bayes risk, $J^*[\mathbf{y}] = J^*[B\mathbf{z}]$, which is the least risk that can be attained when $\mathbf{y}$ is used for signal detection. Our problem will be to find the optimum $B$ for a given value of $K$, i.e. to find a $B$ which minimizes $J^*[B\mathbf{z}]$. We will show that our optimum choice, designated $B^*$, has the property that its row vectors are "matched" to the most relevant components in the simultaneously orthogonal expansion of the data vector $\mathbf{z}$. Furthermore, $B^*$ depends neither upon the a priori probability $P$ nor upon the costs $\mathscr{C}_m$, $\mathscr{C}_f$, and is invariant with respect to scalings of $s(t)$ by a constant factor. We will present a very practical method for finding $B^*$ when $N$ is large, by employing state-variable models for the signal and noise processes; this is perhaps our most important result. Before proceeding, we note that the independent variable "$t$" need not represent time; it could just as well correspond to position along a line in space.
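To make the filter-bank interpretation of Eqs. (11)-(16) concrete, the brief Python sketch below forms the reduced data vector $\mathbf{y} = B\mathbf{z}$ from $K$ weighting sequences. The particular weights here are arbitrary placeholders; the optimum rows of $B$ are derived in the sections that follow.

```python
import numpy as np

N, K = 100, 3
rng = np.random.default_rng(1)
z = rng.standard_normal(N)          # measurement sequence z_1, ..., z_N

# Rows of B are the weighting sequences b_k of Eq. (12); these are
# arbitrary examples -- the optimum rows d_1^T, ..., d_K^T come later.
B = rng.standard_normal((K, N))

y = B @ z                           # Eq. (16): each y_k = b_k^T z is one
print(y.shape)                      # digital correlation-filter output, shape (K,)
```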
2. SIMULTANEOUSLY ORTHOGONAL EXPANSION OF Z

It is a well-known fact that there exists a linear, invertible transformation which sends $\mathbf{z}$ into a random vector whose elements are statistically independent under either hypothesis. We shall develop this transformation in a form slightly different from that found in most texts, and interpret it in terms of a simultaneously orthogonal expansion of the data vector $\mathbf{z}$. (Our procedure will be similar to that of Kadota and Shepp [4].)

From assumption D, the matrix $M$ must be positive definite, so $R_0$ is positive definite and $R_0^{1/2}$ exists, is positive definite, and symmetric. Thus $R_0^{-1/2} R_1 R_0^{-1/2}$ is positive definite, symmetric, and has $N$ positive eigenvalues
$$ D_1 \geq D_2 \geq \cdots \geq D_N \tag{17} $$
and associated eigenvectors
$$ R_0^{-1/2} R_1 R_0^{-1/2}\,\boldsymbol{\phi}_i = D_i\,\boldsymbol{\phi}_i \qquad \text{for } i = 1, \ldots, N, \tag{18} $$
which may be taken orthonormal,
$$ \boldsymbol{\phi}_i^T \boldsymbol{\phi}_j = \delta_{ij} \qquad \text{for } i, j = 1, \ldots, N. \tag{19} $$
By defining
$$ \mathbf{d}_i \triangleq R_0^{-1/2}\,\boldsymbol{\phi}_i \qquad \text{for } i = 1, \ldots, N, $$
Equations (18) and (19) can be expressed in the equivalent forms
$$ R_0^{-1} R_1\,\mathbf{d}_i = D_i\,\mathbf{d}_i \qquad \text{for } i = 1, \ldots, N \tag{20} $$
$$ \mathbf{d}_i^T R_0\,\mathbf{d}_j = \delta_{ij} \qquad \text{for } i, j = 1, \ldots, N. \tag{21} $$
Alternatively, given $D_i$'s and $\mathbf{d}_i$'s which satisfy Eqs. (20) and (21), we can define $\boldsymbol{\phi}_i \triangleq R_0^{1/2}\mathbf{d}_i$, which will satisfy Eqs. (18) and (19).

For a given set $(D_1, \mathbf{d}_1), (D_2, \mathbf{d}_2), \ldots, (D_N, \mathbf{d}_N)$ which satisfy Eqs. (17), (20), and (21), define
$$ \alpha_i \triangleq \mathbf{d}_i^T \mathbf{z} \qquad \text{for } i = 1, \ldots, N. \tag{22} $$
Then the $\alpha_i$'s are, under either hypothesis, zero-mean Gaussian random variables, and have hypothesis-conditional covariances, for $i, j = 1, \ldots, N$,
$$ E[\alpha_i \alpha_j / H_0] = \mathbf{d}_i^T R_0\,\mathbf{d}_j = \delta_{ij} \tag{23} $$
$$ E[\alpha_i \alpha_j / H_1] = \mathbf{d}_i^T R_1\,\mathbf{d}_j = \mathbf{d}_i^T R_0 R_0^{-1} R_1\,\mathbf{d}_j = D_j\,\mathbf{d}_i^T R_0\,\mathbf{d}_j = D_j\,\delta_{ij}. \tag{24} $$
Define an $N \times 1$ vector
$$ \boldsymbol{\alpha} \triangleq (\alpha_1, \alpha_2, \ldots, \alpha_N)^T \tag{25} $$
and an $N \times N$ matrix
$$ T \triangleq (\mathbf{d}_1, \mathbf{d}_2, \ldots, \mathbf{d}_N)^T \tag{26} $$
so that
$$ \boldsymbol{\alpha} = T\mathbf{z}. \tag{27} $$
Equation (21) implies that the $\mathbf{d}_i$'s are linearly independent, so $T$ is invertible. Thus
$$ \mathbf{z} = T^{-1}\boldsymbol{\alpha} = \sum_{i=1}^{N} \alpha_i\,\boldsymbol{\psi}_i, \tag{28} $$
where $\boldsymbol{\psi}_1, \boldsymbol{\psi}_2, \ldots, \boldsymbol{\psi}_N$ are the column vectors of $T^{-1}$. Equation (28) is the simultaneously orthogonal expansion of the data vector $\mathbf{z}$. The basis vectors $\boldsymbol{\psi}_1, \boldsymbol{\psi}_2, \ldots, \boldsymbol{\psi}_N$ are, in general, not orthogonal. The words "simultaneously orthogonal" refer to the fact that the random coefficients $\alpha_i$ are statistically orthogonal, i.e. uncorrelated, under both hypotheses. Note that since $T T^{-1} = I$,
$$ \mathbf{d}_i^T \boldsymbol{\psi}_j = \delta_{ij} \qquad \text{for } i, j = 1, \ldots, N. \tag{29} $$
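In numerical terms, the pairs $(D_i, \mathbf{d}_i)$ of Eqs. (20) and (21) solve the generalized symmetric eigenproblem $R_1\mathbf{d} = D R_0\mathbf{d}$ with $R_0$-orthonormal eigenvectors. A minimal Python sketch using SciPy's generalized eigensolver is given below; the covariances are illustrative placeholders only.

```python
import numpy as np
from scipy.linalg import eigh

def simultaneous_expansion(R0, R1):
    """Return (D, T) with eigenvalues D_1 >= ... >= D_N and rows of T equal to
    d_1^T, ..., d_N^T, satisfying R0^{-1} R1 d_i = D_i d_i   (Eq. 20)
    and d_i^T R0 d_j = delta_ij                              (Eq. 21)."""
    # eigh(R1, R0) solves R1 v = lambda R0 v and returns eigenvectors that are
    # R0-orthonormal (v_i^T R0 v_j = delta_ij), i.e. exactly the d_i's.
    D, V = eigh(R1, R0)
    order = np.argsort(D)[::-1]          # sort so that D_1 >= D_2 >= ...
    return D[order], V[:, order].T       # T has d_i^T as its i-th row (Eq. 26)

# Illustrative covariances (same toy model as in the earlier sketch)
N = 6
Rn = 0.5 * np.eye(N)
Rs = np.fromfunction(lambda i, j: np.exp(-0.5 * np.abs(i - j)), (N, N))
M = 0.1 * np.eye(N)
R0, R1 = Rn + M, Rs + Rn + M
D, T = simultaneous_expansion(R0, R1)
print(np.allclose(T @ R0 @ T.T, np.eye(N)))   # numerical check of Eq. (21)
```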
3. OPTIMUM CHOICE OF B
Given a zero-mean Gaussian random vector with two alternative positive-definite covariance matrices $R_0$ and $R_1$, the problem of finding the best $\mathbf{y} = B\mathbf{z}$ is closely related to the simultaneously orthogonal expansion. For the case
$K = 1$ (so that $B$ is simply a row vector), it is shown in Kullback (see [5], p. 198) that the $B$ for which the divergence† is maximum is given by $B = \mathbf{d}_{j^*}^T$, where $j^*$ is the $j$ for which $f_j \triangleq D_j + 1/D_j$ has its largest value. Kadota and Shepp [4] generalized this result to the case $K > 1$, showing that the $B$ which maximizes the divergence is
$$ B = \bigl(\mathbf{d}_{j_1^*}, \mathbf{d}_{j_2^*}, \ldots, \mathbf{d}_{j_K^*}\bigr)^T, $$
where $j_1^*$ is the $j$ for which $f_j$ has its largest value, $j_2^*$ is the $j$ for which $f_j$ has its second-largest value, etc. They also showed that the same choice maximizes the Bhattacharyya distance. (Actually, they showed that it minimizes the Hellinger integral, which is equivalent.)

Since the statistically independent $\alpha_i$'s have unit variances under hypothesis $H_0$, and variances $D_i$ under hypothesis $H_1$, it follows that the expansion components most relevant for hypothesis determination are those for which the $D_i$'s are farthest from the value one. (This is in agreement with the use of $f_j = D_j + 1/D_j$ in the cases just described.) However, for our problem $R_1 = R_0 + R_s$. Using this fact and premultiplying Eq. (20) by $\mathbf{d}_i^T R_0$, we obtain the following result:
$$ D_i = 1 + \frac{\mathbf{d}_i^T R_s\,\mathbf{d}_i}{\mathbf{d}_i^T R_0\,\mathbf{d}_i} \qquad \text{for } i = 1, \ldots, N. $$
Thus $D_i \geq 1$ for $i = 1, \ldots, N$. Hence the most relevant components of the expansion are those with the largest values of $D_i$. The following theorem is, therefore, not surprising.
Theorem: Given a set $(D_1, \mathbf{d}_1), (D_2, \mathbf{d}_2), \ldots, (D_N, \mathbf{d}_N)$ which satisfy Eqs. (17), (20), and (21), define
$$ B^* \triangleq (\mathbf{d}_1, \mathbf{d}_2, \ldots, \mathbf{d}_K)^T. \tag{30} $$
Then for any $K \times N$ matrix $B$,
$$ J^*[B^*\mathbf{z}] \leq J^*[B\mathbf{z}], \tag{31} $$
i.e. $B^*$ minimizes the Bayes risk.

The proof of this theorem is given in Appendix A. If we define
$$ \mathbf{y}^* \triangleq B^*\mathbf{z}, \tag{32} $$
then the optimum decision rule is simply
$$ \mathscr{D}^*(\mathbf{y}^*) = \begin{cases} H_1 & \text{whenever } \displaystyle\sum_{i=1}^{K} (1 - 1/D_i)\,(y^*)_i^2 > 2\log\eta + \sum_{i=1}^{K} \log D_i \\ H_0 & \text{otherwise.} \end{cases} \tag{33} $$

† See Kullback [5], p. 6 for the definition of divergence.
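A minimal sketch of how $B^*$ and the reduced decision rule (33) might be applied in code, continuing the hypothetical `simultaneous_expansion` helper from the previous sketch (the prior and costs are again placeholders):

```python
import numpy as np

def reduced_detector(z, D, T, K, P=0.5, C_m=1.0, C_f=1.0):
    """Form y* = B* z with B* the first K rows of T (Eq. 30) and apply the
    reduced Bayes test of Eq. (33)."""
    eta = (1.0 - P) * C_f / (P * C_m)                 # Eq. (9)
    B_star = T[:K, :]                                 # rows d_1^T, ..., d_K^T
    y_star = B_star @ z                               # Eq. (32)
    stat = np.sum((1.0 - 1.0 / D[:K]) * y_star**2)    # left side of Eq. (33)
    threshold = 2.0 * np.log(eta) + np.sum(np.log(D[:K]))
    return "H1" if stat > threshold else "H0"
```

Note that only the $K$ filter outputs $\mathbf{y}^*$ and the $K$ largest eigenvalues $D_1, \ldots, D_K$ are needed at decision time, which is the practical appeal of the reduced scheme.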
This result is obtained by replacing $\mathbf{z}$ by $\mathbf{y}^*$ and substituting the appropriate covariance matrices into Eq. (10). Clearly, $B^*$ depends neither upon $P$ nor upon $\mathscr{C}_m$ or $\mathscr{C}_f$. Furthermore, Eq. (20) is equivalent to
$$ R_0^{-1} R_s\,\mathbf{d}_i = (D_i - 1)\,\mathbf{d}_i \qquad \text{for } i = 1, \ldots, N. \tag{34} $$
Thus if $s(t)$ becomes $\beta s(t)$, where $\beta$ is a constant scaling factor, then $R_s$ becomes $\beta^2 R_s$, the eigenvectors remain the same, and the ordering of the eigenvalues is unchanged; i.e. $B^*$ is unaffected when the signal is scaled by a constant factor.

Clearly, in order to find $B^*$ we must find the eigenvectors of $R_0^{-1}R_1$ corresponding to the $K$ largest eigenvalues; so the fundamental problem is that of finding solutions $(D, \mathbf{d})$ of the equation
$$ R_0^{-1} R_1\,\mathbf{d} = D\,\mathbf{d}. \tag{35} $$
However, it should be noted that any solution $(D, \mathbf{d})$ with $D = 1$ would yield a component of $\mathbf{y}^*$ which would be useless for signal detection (since it would have the same variance under either hypothesis). From Eq. (34) it is clear that a useless solution (i.e. $D = 1$) exists if (and only if) $R_s$ is singular. The "orthogonalization" of the solution vectors required by Eq. (21) is not difficult to achieve. If $(D_a, \mathbf{d}_a)$ and $(D_b, \mathbf{d}_b)$ are two solutions of Eq. (35) and $D_a \neq D_b$, then $\mathbf{d}_a^T R_0\,\mathbf{d}_b = 0$; thus we need only ensure that Eq. (21) is satisfied when $D_i = D_j$.

Equation (35) may be expressed in another form,
$$ \mathscr{A}(D)\,\mathbf{d} = 0, \tag{36} $$
where
$$ \mathscr{A}(D) \triangleq (1 - D)(R_n + M) + R_s. \tag{37} $$
Solutions therefore occur at values of $D$ for which $\mathscr{A}(D)$ is singular, i.e. at the zeros of $\det\mathscr{A}(D)$. Unfortunately the dimension of the matrix $\mathscr{A}(D)$ is equal to the number of sampling instants, and if this number is large (say in excess of 50) it becomes practically impossible to find the zeros of $\det\mathscr{A}(D)$. In the next two sections of this paper we will assume that the signal and noise processes admit state-variable models, and will develop a technique by which the problem is reduced to finding solutions $(D, \mathbf{g})$ of
$$ \mathscr{L}(D)\,\mathbf{g} = 0, \tag{38} $$
where $\mathscr{L}(D)$ is a square matrix which depends upon $D$ and whose dimension is not more than twice the sum of the dimensions of the signal and noise state vectors. The values of $D$ for which Eq. (38) has a solution are the same as those for Eq. (36), and the solution vectors $\mathbf{d}$ of Eq. (36) are related to the solution vectors $\mathbf{g}$ of Eq. (38) by a simple transformation.
Before proceeding, we note that the rows $\mathbf{d}_i^T$ of $B^*$ are matched to the components of the simultaneously orthogonal expansion of $\mathbf{z}$ in the sense that $\mathbf{d}_i^T \boldsymbol{\psi}_j = \delta_{ij}$ (from Eq. (29)). Also, $(y^*)_i$ is just $\alpha_i$, the $i$th coefficient in the expansion.

4. STATE-VARIABLE MODELS FOR THE SIGNAL AND NOISE PROCESSES
We will assume that the noise and signal processes can be represented in terms of an $L_n \times 1$ noise state vector $\mathbf{x}_n(t)$ and an $L_s \times 1$ signal state vector $\mathbf{x}_s(t)$ as follows,
$$ n(t) = \mathbf{c}_n^T(t)\,\mathbf{x}_n(t); \qquad s(t) = \mathbf{c}_s^T(t)\,\mathbf{x}_s(t), \tag{39} $$
where the state vectors obey the following dynamic equations,
$$ \frac{d\mathbf{x}_n(t)}{dt} = F_n(t)\,\mathbf{x}_n(t) + \mathbf{u}_n(t); \qquad \frac{d\mathbf{x}_s(t)}{dt} = F_s(t)\,\mathbf{x}_s(t) + \mathbf{u}_s(t). \tag{40} $$
$\mathbf{u}_n(t)$ and $\mathbf{u}_s(t)$ are zero-mean, white Gaussian vector processes, independent of each other, with covariances
$$ E[\mathbf{u}_n(t)\,\mathbf{u}_n^T(\tau)] = U_n(t)\,\delta(t - \tau); \qquad E[\mathbf{u}_s(t)\,\mathbf{u}_s^T(\tau)] = U_s(t)\,\delta(t - \tau). \tag{41} $$
Furthermore we will assume that the initial statistics for the state vectors are $E[\mathbf{x}_n(t_1)] = E[\mathbf{x}_s(t_1)] = 0$, and
$$ E[\mathbf{x}_n(t_1)\,\mathbf{x}_n^T(t_1)] = K_n(1,1); \qquad E[\mathbf{x}_s(t_1)\,\mathbf{x}_s^T(t_1)] = K_s(1,1). \tag{42} $$
We assume that the vectors $\mathbf{c}_n(t)$, $\mathbf{c}_s(t)$ and the matrices $F_n(t)$, $F_s(t)$, $U_n(t)$, $U_s(t)$, $K_n(1,1)$, and $K_s(1,1)$ are known.

We now convert from continuous state-variable models to discrete models. Since the models for $n(t)$ and $s(t)$ have the same form, we will refrain from writing equations for both models, and use only a single subscript $\Delta$ which may be taken as either $n$ or $s$. Define the (square) transition matrix $\Phi_\Delta(t, \tau)$ through the relations
$$ \frac{\partial \Phi_\Delta(t, \tau)}{\partial t} = F_\Delta(t)\,\Phi_\Delta(t, \tau); \qquad \Phi_\Delta(\tau, \tau) = I, \tag{43} $$
where $I$ is the identity matrix. Using the transition matrix,
$$ \mathbf{x}_\Delta(t_{i+1}) = \Phi_\Delta(t_{i+1}, t_i)\,\mathbf{x}_\Delta(t_i) + \int_{t_i}^{t_{i+1}} \Phi_\Delta(t_{i+1}, \tau)\,\mathbf{u}_\Delta(\tau)\,d\tau \qquad \text{for } i = 1, \ldots, N-1. $$
If we define
$$ A_\Delta(i) \triangleq \Phi_\Delta(t_{i+1}, t_i); \qquad \mathbf{q}_\Delta(i) \triangleq \int_{t_i}^{t_{i+1}} \Phi_\Delta(t_{i+1}, \tau)\,\mathbf{u}_\Delta(\tau)\,d\tau \qquad \text{for } i = 1, \ldots, N-1, \tag{44} $$
then the equation can be written
$$ \mathbf{x}_\Delta(t_{i+1}) = A_\Delta(i)\,\mathbf{x}_\Delta(t_i) + \mathbf{q}_\Delta(i), \tag{45} $$
which is the discrete version of (40). Note that the $\mathbf{q}_\Delta(i)$'s are zero-mean Gaussian random vectors. From (41) and (44) it can easily be shown that
$$ E[\mathbf{q}_\Delta(i)\,\mathbf{q}_\Delta^T(j)] = Q_\Delta(i)\,\delta_{ij}, \tag{46} $$
where $Q_\Delta(i)$ is defined
$$ Q_\Delta(i) \triangleq \int_{t_i}^{t_{i+1}} \Phi_\Delta(t_{i+1}, \tau)\,U_\Delta(\tau)\,\Phi_\Delta^T(t_{i+1}, \tau)\,d\tau. \tag{47} $$
Since $\mathbf{u}_n(t)$ and $\mathbf{u}_s(t)$ are mutually independent, $\mathbf{q}_n(i)$ and $\mathbf{q}_s(i)$ will be mutually independent random sequences. As a final comment before proceeding to the next section, we note that $A_\Delta(i)$ always has an inverse, since it is a transition matrix.
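For a time-invariant model (constant $F_\Delta$, $U_\Delta$) and uniform sampling, the transition matrix of Eq. (44) is a matrix exponential and the covariance integral of Eq. (47) can be approximated by simple numerical quadrature. The Python sketch below is one way this discretization step might be coded; the model matrices are illustrative placeholders, not taken from the paper.

```python
import numpy as np
from scipy.linalg import expm

def discretize(F, U, dt, n_quad=200):
    """Time-invariant discretization of Eqs. (44) and (47):
    A = expm(F*dt) = Phi(t_{i+1}, t_i), and
    Q = integral_0^dt expm(F*(dt - tau)) U expm(F*(dt - tau))^T dtau,
    approximated here with the trapezoidal rule."""
    A = expm(F * dt)
    taus = np.linspace(0.0, dt, n_quad)
    integrand = np.array([expm(F * (dt - t)) @ U @ expm(F * (dt - t)).T for t in taus])
    Q = np.trapz(integrand, taus, axis=0)
    return A, Q

# Illustrative first-order (scalar-state) model: dx/dt = -a*x + u, a = 1, U = 2
F_s = np.array([[-1.0]])
U_s = np.array([[2.0]])
A_s, Q_s = discretize(F_s, U_s, dt=0.1)
print(A_s, Q_s)   # for this scalar case Q = (U/(2a)) * (1 - exp(-2a*dt))
```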
5. METHOD OF SOLUTION

Since we are only interested in solutions of Eq. (35) such that $D \neq 1$, we may define $D' \triangleq 1/(D - 1)$ and write Eq. (35) in the equivalent form $\mathbf{d} = M^{-1}[D' R_s \mathbf{d} - R_n \mathbf{d}]$. By defining $d(i) \triangleq (\mathbf{d})_i$, this equation can be written as
$$ d(i) = M^{-1}(i)\left[ D' \sum_{j=1}^{N} R_s(i,j)\,d(j) - \sum_{j=1}^{N} R_n(i,j)\,d(j) \right] \qquad \text{for } i = 1, \ldots, N. \tag{48} $$
But for $i, j = 1, \ldots, N$,
$$ R_\Delta(i,j) = E[\Delta(t_i)\,\Delta(t_j)] = \mathbf{c}_\Delta^T(t_i)\,K_\Delta(i,j)\,\mathbf{c}_\Delta(t_j), \tag{49} $$
where
$$ K_\Delta(i,j) \triangleq E[\mathbf{x}_\Delta(t_i)\,\mathbf{x}_\Delta^T(t_j)]. \tag{50} $$
[Note that this definition of $K_\Delta(i,j)$ is compatible with the previous definition of $K_\Delta(1,1)$.] By defining, for $i = 1, \ldots, N$,
$$ \mathbf{v}_\Delta(i) \triangleq \sum_{j=1}^{N} K_\Delta(i,j)\,\mathbf{c}_\Delta(t_j)\,d(j), \tag{51} $$
Equation (48) becomes
$$ d(i) = M^{-1}(i)\left[ D'\,\mathbf{c}_s^T(t_i)\,\mathbf{v}_s(i) - \mathbf{c}_n^T(t_i)\,\mathbf{v}_n(i) \right] \qquad \text{for } i = 1, \ldots, N. \tag{52} $$
In Appendix B it is shown that $\mathbf{v}_\Delta(i)$ may be obtained as the solution of a system of linear difference equations,†
$$ \mathbf{v}_\Delta(i+1) = A_\Delta(i)\,\mathbf{v}_\Delta(i) + Q_\Delta(i)\,A_\Delta^{-T}(i)\,\mathbf{w}_\Delta(i) \qquad \text{for } i = 1, \ldots, N-1, \tag{53} $$
$$ \mathbf{w}_\Delta(i+1) = A_\Delta^{-T}(i)\,\mathbf{w}_\Delta(i) - \mathbf{c}_\Delta(t_{i+1})\,d(i+1) \qquad \text{for } i = 1, \ldots, N-1, \tag{54} $$
with boundary conditions
$$ \mathbf{w}_\Delta(N) = 0 \tag{55} $$
and
$$ \mathbf{v}_\Delta(1) = K_\Delta(1,1)\left[ \mathbf{c}_\Delta(t_1)\,d(1) + \mathbf{w}_\Delta(1) \right]. \tag{56} $$
It is also shown in Appendix B that, given $d(i)$ for $i = 1, \ldots, N$, Eqs. (53)-(56) have a unique solution. The procedure by which the summations of Eq. (48) have been expressed in terms of "state" vectors $\mathbf{v}_\Delta$ is similar to that employed by Baggeroer [6] to transform integral equations into state-differential equations.

The models for $\Delta = s$ or $n$ can be combined by defining, for $i = 1, \ldots, N$,
$$ \mathbf{v}(i) \triangleq \begin{bmatrix} \mathbf{v}_s(i) \\ \mathbf{v}_n(i) \end{bmatrix}; \qquad \mathbf{w}(i) \triangleq \begin{bmatrix} \mathbf{w}_s(i) \\ \mathbf{w}_n(i) \end{bmatrix}; \qquad \mathbf{c}(i) \triangleq \begin{bmatrix} \mathbf{c}_s(t_i) \\ \mathbf{c}_n(t_i) \end{bmatrix}; \qquad C(i) \triangleq \mathbf{c}(i)\,\mathbf{c}^T(i); $$
$$ D(i) \triangleq \begin{bmatrix} D' M^{-1}(i)\,I_s & 0 \\ 0 & -M^{-1}(i)\,I_n \end{bmatrix}; \qquad K(1,1) \triangleq \begin{bmatrix} K_s(1,1) & 0 \\ 0 & K_n(1,1) \end{bmatrix}, $$
where $I_\Delta$ is an identity matrix of dimension $L_\Delta$, and by defining, for $i = 1, \ldots, N-1$,
$$ A^{\#}(i) \triangleq \begin{bmatrix} A_s(i) & 0 \\ 0 & A_n(i) \end{bmatrix}; \qquad Q(i) \triangleq \begin{bmatrix} Q_s(i) & 0 \\ 0 & Q_n(i) \end{bmatrix}. $$
Equation (52) may then be written simply as
$$ d(i) = \mathbf{c}^T(i)\,D(i)\,\mathbf{v}(i) \qquad \text{for } i = 1, \ldots, N, \tag{57} $$
and (53) and (54) become
$$ \mathbf{v}(i+1) = A^{\#}(i)\,\mathbf{v}(i) + Q(i)\,A^{\#-T}(i)\,\mathbf{w}(i) \tag{58} $$
$$ \mathbf{w}(i+1) = A^{\#-T}(i)\,\mathbf{w}(i) - \mathbf{c}(i+1)\,d(i+1) \qquad \text{for } i = 1, \ldots, N-1. \tag{59} $$

† The superscript "$-T$" denotes inverse transpose.
But $d(i+1)$ can be expressed in terms of $\mathbf{v}(i+1)$ by using (57), and $\mathbf{v}(i+1)$ can be obtained from (58), so that Eq. (59) is equivalent to
$$ \mathbf{w}(i+1) = -C(i+1)\,D(i+1)\,A^{\#}(i)\,\mathbf{v}(i) + \left( I - C(i+1)\,D(i+1)\,Q(i) \right) A^{\#-T}(i)\,\mathbf{w}(i) \qquad \text{for } i = 1, \ldots, N-1. \tag{60} $$
Equations (58) and (59) can be combined,
$$ \begin{bmatrix} \mathbf{v}(i+1) \\ \mathbf{w}(i+1) \end{bmatrix} = \Lambda(i) \begin{bmatrix} \mathbf{v}(i) \\ \mathbf{w}(i) \end{bmatrix} \qquad \text{for } i = 1, \ldots, N-1, \tag{61} $$
where, for $i = 1, \ldots, N-1$,
$$ \Lambda(i) \triangleq \begin{bmatrix} A^{\#}(i) & Q(i)\,A^{\#-T}(i) \\ -C(i+1)\,D(i+1)\,A^{\#}(i) & \bigl( I - C(i+1)\,D(i+1)\,Q(i) \bigr) A^{\#-T}(i) \end{bmatrix}. \tag{62} $$
The boundary conditions (55) and (56) can be written as
$$ \mathbf{w}(N) = 0 \tag{63} $$
and
$$ \bigl( I - K(1,1)\,C(1)\,D(1) \bigr)\,\mathbf{v}(1) - K(1,1)\,\mathbf{w}(1) = 0, \tag{64} $$
where we have substituted for $d(1)$ from Eq. (57). From the linear recursion relation of (61) it follows that
$$ \begin{bmatrix} \mathbf{v}(N) \\ \mathbf{w}(N) \end{bmatrix} = \Lambda \begin{bmatrix} \mathbf{v}(1) \\ \mathbf{w}(1) \end{bmatrix}, \tag{65} $$
where
$$ \Lambda \triangleq \Lambda(N-1)\,\Lambda(N-2) \cdots \Lambda(1). \tag{66} $$
If we partition $\Lambda$ into four square matrices,
$$ \Lambda = \begin{bmatrix} \Lambda_{vv} & \Lambda_{vw} \\ \Lambda_{wv} & \Lambda_{ww} \end{bmatrix}, \tag{67} $$
then the boundary conditions (63) and (64) can be combined,
$$ \begin{bmatrix} \Lambda_{wv} & \Lambda_{ww} \\ I - K(1,1)\,C(1)\,D(1) & -K(1,1) \end{bmatrix} \begin{bmatrix} \mathbf{v}(1) \\ \mathbf{w}(1) \end{bmatrix} = 0. \tag{68} $$
If we define
$$ \mathbf{g} \triangleq \begin{bmatrix} \mathbf{v}(1) \\ \mathbf{w}(1) \end{bmatrix}; \qquad \mathscr{L}(D) \triangleq \begin{bmatrix} \Lambda_{wv} & \Lambda_{ww} \\ I - K(1,1)\,C(1)\,D(1) & -K(1,1) \end{bmatrix}, \tag{69, 70} $$
then (68) reduces to
$$ \mathscr{L}(D)\,\mathbf{g} = 0. \tag{71} $$
Each step that we have made toward arriving at Eq. (71) has involved only substitution, and is therefore reversible; so that Eq. (71) is, whenever $D \neq 1$,
completely equivalent to Eqs. (48) and (36). If we can find a value of $D \neq 1$ for which $\det\mathscr{L}(D) = 0$, then we can find a solution $\mathbf{g}$ of (71); and if we define $d(i)$ from (57), using (61) and (69) to compute $\mathbf{v}(i)$, then $d(i)$ will solve (48) (for the same value of $D$). Given a particular value of $D \neq 1$ for which Eqs. (71) and (36) have solutions, Eqs. (53)-(57), (61), and (69) describe a linear, invertible, one-to-one transformation between solutions $\mathbf{g}$ of (71) and solutions $\mathbf{d}$ of (36); hence if (71) has $k$ linearly independent solutions $\mathbf{g}$, then (36) has $k$ linearly independent solutions $\mathbf{d}$.

If $K(1,1)$ is nonsingular, then (64) implies
$$ \mathbf{w}(1) = \left[ K^{-1}(1,1) - C(1)\,D(1) \right] \mathbf{v}(1), \tag{72} $$
so that (71) is equivalent to
$$ \mathscr{L}^{\#}(D)\,\mathbf{g}^{\#} = 0, \tag{73} $$
where
$$ \mathbf{g}^{\#} \triangleq \mathbf{v}(1); \qquad \mathscr{L}^{\#}(D) \triangleq \Lambda_{wv} + \Lambda_{ww}\left[ K^{-1}(1,1) - C(1)\,D(1) \right]. \tag{74, 75} $$
The problem has been reduced to one of finding the largest zeros of $\det\mathscr{L}(D)$ [or $\det\mathscr{L}^{\#}(D)$]. For given values of $D$, $\Lambda$ can be computed and used to determine $\det\mathscr{L}(D)$; thus the zeros can be found by standard computer techniques. When a zero is found, initial conditions solving Eq. (68) can be obtained, $\mathbf{v}(i)$ and $\mathbf{w}(i)$ can be computed recursively by using Eq. (61), and the eigenvector $\mathbf{d}$ can be obtained through use of Eq. (52). If the signal, background noise, and measurement noise are stationary, then $\Lambda(i)$ is the same for all $i$, and
$$ \Lambda = \bar{\Lambda}^{\,N-1}, $$
where $\bar{\Lambda} \triangleq \Lambda(i)$. In particular, if $N = 2^n + 1$, then $\Lambda$ can be obtained by $n$ matrix-squaring operations (e.g. $N = 129 = 2^7 + 1$).
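The determinant search just described can be sketched in code as follows. This is only an illustration of Eqs. (62)-(75), not the authors' program: it assumes a stationary model with uniform sampling (so that $\Lambda(i) = \bar\Lambda$) and a nonsingular $K(1,1)$, and the inputs `A_blk`, `Q_blk`, `c_s`, `c_n`, `Minv`, `K11` are placeholders to be supplied by the user.

```python
import numpy as np

def det_L_sharp(Dval, A_blk, Q_blk, c_s, c_n, Minv, K11, N):
    """Evaluate det L#(D) of Eq. (75) for a stationary model with uniform sampling.
    A_blk, Q_blk : block-diag(A_s, A_n), block-diag(Q_s, Q_n) (constant over i);
    c_s, c_n     : signal and noise observation vectors;
    Minv         : 1/M, reciprocal of the (constant) measurement-noise variance;
    K11          : block-diag(K_s(1,1), K_n(1,1));
    N            : number of sampling instants."""
    Ls, Ln = len(c_s), len(c_n)
    L = Ls + Ln
    Dprime = 1.0 / (Dval - 1.0)
    c = np.concatenate([c_s, c_n])                      # combined c(i)
    Cmat = np.outer(c, c)                               # C(i) = c c^T
    Dmat = Minv * np.diag(np.concatenate([Dprime * np.ones(Ls), -np.ones(Ln)]))
    AinvT = np.linalg.inv(A_blk).T
    # One-step transition matrix Lambda of Eq. (62), time-invariant case
    Lam_step = np.vstack([
        np.hstack([A_blk, Q_blk @ AinvT]),
        np.hstack([-Cmat @ Dmat @ A_blk,
                   (np.eye(L) - Cmat @ Dmat @ Q_blk) @ AinvT])])
    # Eq. (66): Lambda = Lam_step^(N-1); matrix_power uses repeated squaring,
    # which is the economy noted above when N - 1 is a power of two.
    Lam = np.linalg.matrix_power(Lam_step, N - 1)
    Lam_wv, Lam_ww = Lam[L:, :L], Lam[L:, L:]           # partition of Eq. (67)
    L_sharp = Lam_wv + Lam_ww @ (np.linalg.inv(K11) - Cmat @ Dmat)   # Eq. (75)
    return np.linalg.det(L_sharp)

# The zeros of det L#(D) with D > 1 can then be located by scanning a grid of
# candidate D values and refining each sign change with a standard root finder.
```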
6. CONCLUSIONS AND APPLICATIONS

We have shown that the matched filter concept can be successfully applied to the problem of detecting zero-mean Gaussian signals in Gaussian noise. For cases in which the signal and noise processes admit state-variable models, we have developed an efficient, easily implemented technique for finding the optimum set of digital matched filters. Approximations to the continuous problem (i.e. finding the best set of continuous filters) may be obtained by making $N$ large. Just as in the Kalman-Bucy filtering problem, we found it necessary to assume that the observed process contained additive white measurement noise. In order to discover what happens in a specific problem
when there is no such white noise, we may determine the optimum filter set for successively smaller values of $M(i)$.

In our treatment the use of a set of $K$ matched filters was taken as an alternative to the direct (and difficult) computation of the likelihood ratio. Although our technique may yield a somewhat larger Bayes risk (particularly if $K$ is very small), its inherent simplicity may make it more practical, or even necessary, for many real problems. Since the matched filter basically involves a correlation process, many existing devices (optical, mechanical, etc.) can be used for its implementation. Another advantage over the direct computation of the likelihood ratio is that the transformation of the data achieved by the use of matched filters is quite simple (and linear), and even if the statistical assumptions for which the filters were designed are incorrect or only approximately correct, the reduced-data vector $\mathbf{y}$ will still probably prove useful for signal detection purposes.

The assumption that the signal is zero mean is somewhat restrictive, but there is a large class of problems which fit this assumption, particularly problems of "passive listening" and noncoherent reception. The remark has been made previously that the independent variable "$t$" could represent position in space. If this is so, the quantities $t_1, t_2, \ldots, t_N$ represent not sampling instants but receiving-element positions, $s(t)$ is the spatial random process impressed upon the receiving elements by the signal, $n(t)$ is that due to background noise, and the $m_i$'s represent receiver noise.

APPENDIX A
We must show that, for any $K \times N$ matrix $B$, $J^*[B^*\mathbf{z}] \leq J^*[B\mathbf{z}]$. We first prove a lemma.

Lemma: Let $\Gamma$ be a $K \times N$ matrix such that $\mathbf{y} = \Gamma\boldsymbol{\alpha}$ satisfies

(i) $E[\mathbf{y}\mathbf{y}^T/H_0] = I$, the $K \times K$ identity matrix;

(ii) $E[\mathbf{y}\mathbf{y}^T/H_1] = \Sigma$, a $K \times K$ diagonal matrix having ordered diagonal elements $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_K$.

Then $J^*[B^*\mathbf{z}] \leq J^*[\mathbf{y}]$.
Also, note that $\sigma_j$ is the $j$th largest eigenvalue of $\Sigma$, and $D_j$ is the $j$th largest eigenvalue of the diagonal matrix $E[\boldsymbol{\alpha}\boldsymbol{\alpha}^T/H_1]$. But $\Sigma = E[\mathbf{y}\mathbf{y}^T/H_1] = \Gamma\,E[\boldsymbol{\alpha}\boldsymbol{\alpha}^T/H_1]\,\Gamma^T$, so since $\Gamma\Gamma^T = I$ we can use the Poincaré Separation Theorem (see [7], p. 52) to obtain
$$ D_{N-j+1} \leq \sigma_j \leq D_j \qquad \text{for } j = 1, \ldots, K. $$
However, all the $D_i$'s are $\geq 1$, so
$$ 1 \leq \sigma_j \leq D_j \qquad \text{for } j = 1, \ldots, K. $$
Now the optimum decision function $\mathscr{D}_y^*$ for $\mathbf{y}$ is simply
$$ \mathscr{D}_y^*(\mathbf{y}) = \begin{cases} H_1 & \text{whenever } \displaystyle\sum_{l=1}^{K} (1 - 1/\sigma_l)\,y_l^2 > \lambda \\ H_0 & \text{otherwise,} \end{cases} $$
where
$$ \lambda \triangleq 2\log\eta + \sum_{l=1}^{K} \log\sigma_l; \qquad \eta = \frac{(1-P)\,\mathscr{C}_f}{P\,\mathscr{C}_m}. $$
Thus
$$ J^*[\Gamma\boldsymbol{\alpha}] = J^*[\mathbf{y}] = P\,\mathscr{C}_m\,J_1^*[\mathbf{y}] + (1-P)\,\mathscr{C}_f\,J_2^*[\mathbf{y}], $$
where
$$ J_1^*[\mathbf{y}] \triangleq \mathrm{Prob}\{\mathscr{D}_y^*(\mathbf{y}) = H_0 \text{ when } H_1 \text{ is true}\} $$
and
$$ J_2^*[\mathbf{y}] \triangleq \mathrm{Prob}\{\mathscr{D}_y^*(\mathbf{y}) = H_1 \text{ when } H_0 \text{ is true}\}. $$
Now suppose we use the same decision function on $\mathbf{y}^* = B^*\mathbf{z}$. The resulting risk, denoted $J^+[\mathbf{y}^*]$, will be
$$ J^+[\mathbf{y}^*] = P\,\mathscr{C}_m\,J_1^+[\mathbf{y}^*] + (1-P)\,\mathscr{C}_f\,J_2^+[\mathbf{y}^*], $$
where
$$ J_1^+[\mathbf{y}^*] = \mathrm{Prob}\{\mathscr{D}_y^*(\mathbf{y}^*) = H_0 \text{ when } H_1 \text{ is true}\} $$
and
$$ J_2^+[\mathbf{y}^*] = \mathrm{Prob}\{\mathscr{D}_y^*(\mathbf{y}^*) = H_1 \text{ when } H_0 \text{ is true}\}. $$
But $\mathbf{y}^*$ and $\mathbf{y}$ have the same statistics under hypothesis $H_0$ ($E[\mathbf{y}^*(\mathbf{y}^*)^T/H_0] = E[\mathbf{y}\mathbf{y}^T/H_0] = I$), so $J_2^+[\mathbf{y}^*] = J_2^*[\mathbf{y}]$. Also, under hypothesis $H_1$, $\mathbf{y}^*$ has the same statistics as
$$ \mathbf{a} \triangleq \begin{bmatrix} (D_1/\sigma_1)^{1/2} & & 0 \\ & \ddots & \\ 0 & & (D_K/\sigma_K)^{1/2} \end{bmatrix} \mathbf{y}, $$
so
$$ J_1^+[\mathbf{y}^*] = \mathrm{Prob}\left\{ \sum_{l=1}^{K} (1 - 1/\sigma_l)\,a_l^2 < \lambda \ \text{ when } H_1 \text{ is true} \right\} $$
$$ = \mathrm{Prob}\left\{ \sum_{l=1}^{K} (1 - 1/\sigma_l)\,(D_l/\sigma_l)\,y_l^2 < \lambda \ \text{ when } H_1 \text{ is true} \right\} $$
$$ \leq \mathrm{Prob}\left\{ \sum_{l=1}^{K} (1 - 1/\sigma_l)\,y_l^2 < \lambda \ \text{ when } H_1 \text{ is true} \right\}, $$
since $D_l/\sigma_l \geq 1$. But the last quantity is simply $J_1^*[\mathbf{y}]$, so $J_1^+[\mathbf{y}^*] \leq J_1^*[\mathbf{y}]$, and therefore
$$ J^+[\mathbf{y}^*] \leq J^*[\mathbf{y}]. $$
However, $J^*[\mathbf{y}^*]$ is the lowest possible risk that can be attained using $\mathbf{y}^*$, so $J^*[\mathbf{y}^*] \leq J^+[\mathbf{y}^*] \leq J^*[\mathbf{y}]$. Q.E.D.
In view of this lemma, the theorem will be proved if we can show that, given any $K \times N$ matrix $B$, there exists a $\Gamma$ satisfying the conditions of the lemma such that $J^*[\Gamma\boldsymbol{\alpha}] \leq J^*[B\mathbf{z}]$,
since the Bayes risk is not affected by invertible transformations of the "test statistic" $\mathbf{y}_1$. Define $\Gamma = T_1 B_1$. Clearly, $\Gamma$ satisfies the conditions of the lemma, since $\mathbf{y} = \Gamma\mathbf{z}$ has the required properties. Also,
$$ J^*[\Gamma\mathbf{z}] = J^*[T_1\mathbf{y}_1] = J^*[B_1\mathbf{z}] \leq J^*[B\mathbf{z}]. $$
Q.E.D.
APPENDIX B
Proof of Eqs. (53), (54), (55), and (56). Multiple application of Eqs. (45) and (46) yields the result that, for $i = 1, \ldots, N-1$ and $j = 1, \ldots, N$,
$$ K_\Delta(i+1, j) = \begin{cases} A_\Delta(i)\,K_\Delta(i,j) + Q_\Delta(i)\,A_\Delta^T(i+1)\,A_\Delta^T(i+2) \cdots A_\Delta^T(j-1) & \text{for } j > i \\ A_\Delta(i)\,K_\Delta(i,j) & \text{for } j \leq i. \end{cases} $$
Since
$$ \mathbf{v}_\Delta(i) \triangleq \sum_{j=1}^{N} K_\Delta(i,j)\,\mathbf{c}_\Delta(t_j)\,d(j) \qquad \text{for } i = 1, \ldots, N, $$
it follows that, for $i = 1, \ldots, N-1$,
$$ \mathbf{v}_\Delta(i+1) = \sum_{j=1}^{N} K_\Delta(i+1,j)\,\mathbf{c}_\Delta(t_j)\,d(j) $$
$$ = \sum_{j=1}^{N} A_\Delta(i)\,K_\Delta(i,j)\,\mathbf{c}_\Delta(t_j)\,d(j) + \sum_{j=i+1}^{N} Q_\Delta(i)\,A_\Delta^T(i+1)\,A_\Delta^T(i+2) \cdots A_\Delta^T(j-1)\,\mathbf{c}_\Delta(t_j)\,d(j), $$
or
$$ \mathbf{v}_\Delta(i+1) = A_\Delta(i)\,\mathbf{v}_\Delta(i) + Q_\Delta(i)\,A_\Delta^{-T}(i)\,\mathbf{w}_\Delta(i) \qquad \text{for } i = 1, \ldots, N-1, \tag{53} $$
where
$$ \mathbf{w}_\Delta(i) \triangleq \sum_{j=i+1}^{N} A_\Delta^T(i)\,A_\Delta^T(i+1) \cdots A_\Delta^T(j-1)\,\mathbf{c}_\Delta(t_j)\,d(j) \qquad \text{for } i = 1, \ldots, N-1. $$
If we define
$$ \mathbf{w}_\Delta(N) \triangleq 0, \tag{55} $$
then for $i = 1, \ldots, N-1$,
$$ \mathbf{w}_\Delta(i+1) = A_\Delta^{-T}(i)\,\mathbf{w}_\Delta(i) - \mathbf{c}_\Delta(t_{i+1})\,d(i+1). \tag{54} $$
The $\mathbf{w}_\Delta(i)$'s are uniquely determined by (54) and the boundary condition $\mathbf{w}_\Delta(N) = 0$, provided the $d(i)$'s are given. Repeated use of (45) and (46) gives
$$ K_\Delta(1,j) = K_\Delta(1,1)\,A_\Delta^T(1)\,A_\Delta^T(2) \cdots A_\Delta^T(j-1) \qquad \text{for } 1 < j \leq N, $$
so that
$$ \mathbf{v}_\Delta(1) = K_\Delta(1,1)\left[ \mathbf{c}_\Delta(t_1)\,d(1) + \mathbf{w}_\Delta(1) \right]; \tag{56} $$
thus $\mathbf{v}_\Delta(1)$ is determined and (53) has a unique solution.

REFERENCES

1. H. L. Van Trees, Detection, Estimation, and Modulation Theory, John Wiley, New York (1968).
2. F. C. Schweppe, "Evaluation of Likelihood Functions for Gaussian Signals," IEEE Trans. on Information Theory, IT-11, No. 1 (January 1965).
3. D. G. Lainiotis, "Optimal Feature Extraction in Pattern Recognition," Intern. Information Theory Symposium Abstracts, San Remo, Italy (Sept. 11-15, 1967).
4. T. T. Kadota and L. A. Shepp, "On the Best Set of Linear Observables for Discriminating Two Gaussian Signals," IEEE Trans. on Information Theory, IT-13, No. 2 (April 1967).
5. S. Kullback, Information Theory and Statistics, John Wiley, New York (1959).
6. A. B. Baggeroer, "A State Variable Approach to the Solution of Fredholm Integral Equations," Technical Report No. 459, MIT Res. Lab. of Electronics, Cambridge, Mass. (November 1967).
7. C. R. Rao, Linear Statistical Inference and Its Applications, John Wiley, New York (1965).

Received February 18, 1971