N. K. Bose and C. R. Rao, eds., Handbook of Statistics, Vol. 10 © 1993 Elsevier Science Publishers B.V. All rights reserved.
Decision-level Neural Net Sensor Fusion*
Robert Y. Levine and Timothy S. Khuon

* This work was sponsored by the Department of the Air Force under contract F19628-90-C0002.

1. Introduction
Multisensor data fusion refers to the processing of information from multiple sensors observing a common physical environment. The processing has as its goal an enhanced description of the environment through the synergistic combination of data from multiple sensors. This description often takes the form of a decision from a common set of hypotheses, or a parameter set determining the environment state. Awareness of the cost benefits and fault tolerance of distributed sensor architectures has resulted in intensive development of sensor fusion paradigms and applications in recent years. This is reflected in a series of SPIE symposia on sensor fusion [1-3], as well as an upcoming special issue of the IEEE Transactions on Systems, Man, and Cybernetics dedicated to distributed sensor networks [4]. Reviews of generic data and information fusion paradigms are found in [5-8]. As noted generally in [5], and applied specifically to radar signal processing in [9-11], multisensor data fusion is possible in a continuous range from data to decision levels. Data-level sensor fusion refers to the combination of data from multiple sensors into a pre-decision form. Presumably the output of the data-level fusion algorithm is a combined data set from which the hypothesis or state parameters are chosen with higher confidence than from any sensor alone. This chapter is concerned with the decision-level fusion alternative defined in Figure 1. In this case distributed sensor processors output a decision based only on the individual sensor observation of the environment. The outputs of the sensor processors, literally a decision from the common set of hypotheses, form the input to the fusion processor for an overall decision. The architecture benefits include robustness against intersensor communication failure; that is, a decision (or parameter estimate) is made at the sensor as well as the fusion level. In addition, the architecture shares the robustness of all data fusion against individual sensor failure. Finally, the tremendous data reduction at the sensor level, raw data to sensor level decisions, allows a greater complexity in the
Fig. 1. Generic neural net sensor fusion architecture for distributed sensor processing.
design of the fusion processor. Thus, for example, in the case of adaptive training of the fusion processor, a larger training set is possible at the fusion level because of sensor level data reduction. Reference [11] contains a comparison of receiver operating characteristic (ROC) curves to obtain quantitative tradeoffs between data and decision-level sensor fusion. Due to the general (sensor independent) nature of the architecture in Figure 1, and the benefits mentioned above, decision-level sensor fusion has been intensively studied in the literature [12-20]. Optimal decision-level fusion algorithms have been derived based on statistical estimation and hypothesis testing techniques [12-17]. As with any Bayesian approach to hypothesis testing, optimum data fusion algorithms are a function of the probability distributions of the sensor inputs and the performance probabilities of the sensor processors [14]. The design of such tests therefore involves an assumed model for each step in the data stream in Figure 1, including the physical event, sensor level data acquisition and processing, and fusion level decision processing. An alternative hypothesis testing paradigm, useful when detailed event and system modeling is impossible, is data-adaptive decision-making [21]. A 'training set' of data corresponding to known hypotheses is applied to a system which contains adjustable parameters. The parameters are adjusted according to a predetermined goodness-of-fit criterion until the system output is correct over the training set. The performance of the trained system is estimated from system performance on an independent 'performance set' of data with known hypotheses. The averaged system performance is simply determined by the
application of the above procedure to an ensemble of training and performance sets. A theoretical treatment of data-adaptive hypothesis testing, with performance estimates based on the statistics of the training set, is given in [22]. Training set derived performance measures are contrasted with performance assuming system equivalence to a maximum a posteriori probability (MAP) hypothesis test. This distinction is fundamental to all data-adaptive hypothesis testing, independent of whether or not sensor fusion is employed. Due to the relevance of data-adaptive test measures to neural net structures, an overview of the theory is given in Appendix A. It should be emphasized that data-adaptive hypothesis testing, while avoiding an assumed model for the data and system, requires a representative training set for successful definition of the test. This chapter is concerned with the application of a particular data-adaptive hypothesis test, namely the neural net, to the distributed sensor fusion architecture in Figure 1. Relative to the now conventional neural net taxonomy [21, 23], we will consider only mapping neural networks such as the multilayer perceptron [24] and back propagation nets [25-29]. These nets differ from association Hopfield-type nets [30, 31] by the application of supervised learning (adaption) toward the performance of a functional mapping without feedback [21]. In hypothesis testing, the desired map is from the input data space to an output hypothesis space. Alternative neural net architectures, such as those employing Kohonen learning [21, 32], attempt to store data distributions internally, rather than directly perform the data input-hypothesis space output mapping. A generic architecture for decision-level neural net sensor fusion consists of Figure 1 with neural networks at both the sensor and fusion level processors. The architecture contains an independent sensor neural net (SNN) for each detector simultaneously observing a stochastic phenomenon. Each SNN is trained to the output decision space {H_1, ..., H_Q} from a training set consisting of the corresponding sensor data. The output of the SNN consists of a normalized vector (a_1, ..., a_Q), where the largest a_i determines the hypothesis H_i. After all SNNs are trained, an independent data set is propagated through the SNNs to form an input training set for the fusion neural net (FNN). The FNN input consists of an analog Q × M-vector, corresponding to Q decisions for each of M sensors. The FNN output consists of the vector (f_1, ..., f_Q), such that the largest f_i implies an overall system decision for hypothesis H_i. Note that the FNN performs cluster analysis in the QM-dimensional input space, which for hypothesis H_i is clustered to the vector (0, ..., 0, 1, 0, ..., 0), with the 1 in the ith position, for each of M sensors.
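A minimal Python sketch of this data flow follows; the SNN output vectors and the FNN weights below are illustrative stand-ins, not values from a trained system.

```python
import numpy as np

# Hedged sketch of the Figure 1 data flow: M sensor neural nets (SNNs) each
# emit a normalized Q-vector whose largest entry is that sensor's decision;
# the fusion neural net (FNN) maps the concatenated Q*M-vector to (f_1,...,f_Q).
Q, M = 3, 2

# Placeholder SNN outputs (in practice produced by trained nets).
snn_outputs = [np.array([0.7, 0.2, 0.1]),      # sensor 1 favours H1
               np.array([0.3, 0.5, 0.2])]      # sensor 2 favours H2
sensor_decisions = [int(a.argmax()) + 1 for a in snn_outputs]

fnn_input = np.concatenate(snn_outputs)        # analog Q*M-vector
W = np.random.default_rng(0).normal(size=(Q, Q * M))   # stand-in FNN weights
f = W @ fnn_input                              # FNN output (f_1, ..., f_Q)
system_decision = int(f.argmax()) + 1          # overall hypothesis index
print(sensor_decisions, system_decision)
```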
The employment of neural net processors at the sensor and fusion levels in Figure 1 is motivated by recent literature. It has generally been found that neural net classifiers perform as well as conventional techniques on a variety of problems, including linear, Gaussian, and k-nearest neighbor algorithms [23, 33-37]. In addition, neural nets have been configured to perform MAP [38] and maximum likelihood [39] tests for arbitrary input distributions. The conditions under which multilayer perceptrons implement Bayesian hypothesis testing are derived in [40-42] and further considered in [43]. This evidence suggests the applicability of neural nets at the sensor level nodes if prior data and sensor models are incomplete. At the fusion level it has been proven that optimum data fusion (for binary hypothesis testing) is implemented by a linear combination of SNN decision outputs followed by the application of a threshold [14]. The optimum weight vector for the linear combination is a function of the performance probabilities of the SNNs. This fusion algorithm is equivalent to a first-order perceptron, a mapping neural net introduced and analyzed in the 1960s [44, 24]. The first-order perceptron, with weight vectors adapted by the perceptron learning algorithm [24], is a special case of the back propagation net mentioned above. In summary, recent literature suggests that neural networks may be sufficient to implement optimum hypothesis tests at both the sensor and fusion levels in Figure 1. In order to demonstrate the implementation of decision-level neural net sensor fusion, two examples are considered in this chapter. In the first example the system in Figure 1 is applied to a problem for which a classical test is formulated at both the sensor (SNN) and fusion (FNN) levels. Section 2 contains a discussion of neural net detection of a transition in the standard deviation of Gaussian noise. The process standard deviations before and after the supposed transition are assumed to be different for each sensor. The inputs to the SNNs consist of windowed sample variances (X_1, X_2) from before and after the transition. It is easily shown that the test, denoted SXOR for stochastic exclusive-or, requires a classifier bilinear in the SNN inputs. The linear separation of a bilinear function of the input requires a second-order perceptron; that is, a net with nodes which multiply the input standard deviations [24]. Although implemented with a nontrivial (second-order) neural net, the variance transition problem is sufficiently tractable to allow an analytic solution for the classical test performance. False alarm and detection probabilities are expressed in terms of the threshold parameter used in the hypothesis test. An optimum threshold, corresponding roughly by definition to maximum detection and locally minimum false alarm probabilities, is computed for a number of different noise and sampling conditions. In Section 2 the performance of trained SNNs and FNNs is compared to classical test probabilities and the performance of the optimum fusion algorithm described above [14]. The results indicate that the SNNs matched the performance at the classical optimum, and the FNN matched the optimum fusion algorithm in exceeding the best SNN performance. This motivates the application of neural nets to the example of decision-level data fusion in Section 3. In addition, the SXOR test is a nontrivial, yet tractable, problem upon which any data-adaptive system can be analyzed. Finally, the detection of noise deviation transitions is a common situation encountered in nonstationary signal processing [45]. In Section 3 the fusion system architecture in Figure 1 is applied to the
detection of object deployments during the recent Firefly rocket launches [46]. The two launches, denoted FFI and FFII, represented a rare opportunity for real data fusion due to the simultaneous observation by the three Millstone Hill (Westford, MA) radars. These included the Haystack X-band imaging, Firepond CO2 laser, and Millstone L-band tracking radars. In the application of the sensor fusion architecture to the Firefly data, two back propagation SNNs were trained on the deployments using range-Doppler images derived from the Haystack X-band and Firepond CO2 laser radar data. A third SNN had as input the passive IR spectral simulation of the deployments, consisting of the spectral irradiance of the objects in the range [5.0 μm, 25.0 μm]. In Section 3 the system is applied to deployment detection of an inflated balloon with training and performance data sets from the same launch (FFI). The performance of the entire sensor fusion system is compared to the SNN performance for each sensor in order to observe evidence of sensor synergism through data fusion. The application of the system to canister deployment detection, in which the training and performance sets were taken from different launches (FFI and FFII, respectively), is also discussed in this section. Sections 2 and 3 contain the discussion of theoretical and experimental implementations of decision-level neural net sensor fusion. Evidence is provided of neural net performance equal to the theoretical optimum for the SXOR test, and of enhanced fused detection during the Firefly launches. The analysis also provides a review of some fundamental issues in neurocomputing. These include network structure, back propagation run time complexity, convergence criteria, and training set design. Other examples of neural net applications to sensor fusion are given in [47-53]. In particular, in [47] sensor level processing of radar signatures to AR coefficients provides an amount of data reduction nearly equal to sensor level decisioning. A comparison of neural networks in decision and data-level fusion architectures is found in [54]. Appendix A contains a discussion of training set derived performance measures for adaptive systems [22], which is relevant to training set design and network structure. A Bayesian analysis of the statistics of the SXOR test, necessary for the analysis in Section 2, is given in Appendix B. The conclusions are contained in Section 4.
2. SXOR benchmark for neural net data fusion
In this section a quantitative comparison of neural net and classical hypothesis testing in the distributed sensor architecture is considered. Sections 2.1 and 2.2 contain an analysis of neural network mapping from data to hypothesis spaces for the SXOR test. The mapping is from a χ²-distributed pair (X_1, X_2) to a decision space output (1, 0) for transition and (0, 1) for no transition. Figure 2 contains a schematic of the transition test mapping, which is implemented by the SNNs in Figure 1. An analysis of the fusion net (FNN) for the SXOR test is given in Section 2.3. The FNN inputs are decisions from the SXOR-trained SNNs, and the output is an overall transition decision.
Fig. 2. Schematic of variance transition test mapping: stochastic exclusive-or (SXOR). Input: N-sample window variances (X_1, X_2).
2.1. False alarm and detection probability for SXOR

In this section the statistics of the SXOR test are derived. The false alarm and detection probabilities are related to the threshold parameter for the test and properties of the noise sampling. The sufficient statistic for a zero mean Gaussian process {y_i | i = 1, ..., N} is the sample variance [53]

X = \sum_{i=1}^{N} (y_i - \bar{y})^2 ,    (1)
where \bar{y} is the sample mean. The sample variance is χ²-distributed with a probability density

P(X) = \frac{X^{N/2-1}}{(2\sigma^2)^{N/2}\,\Gamma(N/2)} \exp\left(-\frac{X}{2\sigma^2}\right) ,    (2)
where σ is the standard deviation of the Gaussian random process {y_i}, and Γ is the gamma function. The classic test of distinguishing between two deviations σ_0 and σ_1 results from a threshold γ; X greater (less) than γ implies noise deviation σ_1 (σ_0).
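A minimal numerical sketch of this single-window threshold test, with illustrative values for the window length, deviations, and threshold, is:

```python
import numpy as np

rng = np.random.default_rng(1)

def window_statistic(y):
    """Sum of squared deviations from the sample mean, the X of equation (1)."""
    return float(np.sum((y - y.mean()) ** 2))

N, gamma = 10, 25.0                    # illustrative window length and threshold
y = rng.normal(0.0, 2.0, N)            # one window drawn with sigma_1 = 2
X = window_statistic(y)
decision = "sigma_1" if X > gamma else "sigma_0"
print(X, decision)
```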
The computation of performance probabilities for the SXOR requires the conditional probabilities {p[(i, j) | (q, m)] : i, j, q, m ∈ {0, 1}}, where each pair (i, j) corresponds to a (before, after) variance condition. An index i of 0 or 1 denotes a windowed sample variance from a low (σ_0) or high (σ_1) deviation process, respectively. The conditional probability p[(i, j) | (q, m)] represents the detection of a noise condition (i, j) when the (before, after) windows truly correspond to the condition (q, m). The hypothesis test is performed on two data windows of length N from before and after the supposed variance transition. Assuming independent tests on each window, the conditional probabilities factor according to the equation p[(i, j) | (q, m)] = p(i | q) p(j | m), where p(i | m) denotes the probability of choosing noise deviation σ_i for a single window with deviation σ_m. The pair of decisions necessary to determine a transition is based on the value of X in equation (1) for two data windows and the threshold γ (as described above). In Appendix B the false alarm and detection probabilities for variance transition detection are related to the conditional probabilities on a single window p(j | m). The conditional probabilities for the transition hypothesis test are shown to be given by

P_d = p(transition | transition) = p(1|1) p(0|0) + p(0|1) p(1|0)    (3)
and
P_f = p(transition | no transition) = p(1|1) p(0|1) + p(1|0) p(0|0) ,    (4)

where it is assumed that the four possible noise conditions {(i, j) | i, j ∈ {0, 1}} have equal prior probability. The conditional probabilities appearing in equations (3) and (4) are given by
p(1 | i) = \int_{\gamma}^{\infty} p_i(X) \, dX    (5)

and

p(0 | i) = \int_{0}^{\gamma} p_i(X) \, dX ,    (6)

where p_i is the density P(X) in equation (2) with σ equal to σ_i. The behavior of P_d and P_f in equations (3) and (4) as the threshold is varied characterizes the hypothesis test [53], which for this problem is the determination of a high/low or low/high variance transition. The test is a stochastic version of the binary exclusive-or map, which has historically been important in neural net research [24, 25]. The central importance of this map derives from the concept of linear separability [24]. The embedding of the input sample variances (X_1, X_2) into the higher-dimensional space (X_1, X_2, X_1 X_2) enhances the
linear separability of the χ²-distributed input data distributions. This fact suggests that the perceptron realization of the map is necessarily second-order in the net input (X_1, X_2) [24]. It is emphasized that intermediate single-window variance decisions, high or low, are not performed in the test, so that the map is different from the conventional Gaussian classifier. Figures 3 and 4 contain plots of false alarm and detection probability as a function of threshold γ for σ_0 of one and σ_1 of two and four, respectively. The conditional probabilities were derived for various window lengths N by numerical computation of equations (3)-(6). Note that for deviation σ_1 and/or window size N of sufficient size, the peak of the detection probability occurred near the local minimum in the false alarm probability. The experimental results for neural net performance on the SXOR problem indicate convergence to this region of peak detection and locally minimum false alarm probability.
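The behavior plotted in Figures 3 and 4 can be reproduced numerically along the following lines; treating X/σ_i² as chi-square with N degrees of freedom is an assumption about the exact form of equation (2) (N - 1 applies if the sample mean is subtracted).

```python
import numpy as np
from scipy.stats import chi2

def single_window_probs(gamma, sigma, N):
    """p(1|i) and p(0|i) of equations (5)-(6) for a window of deviation sigma."""
    p1 = chi2.sf(gamma / sigma**2, df=N)   # P(X > gamma) under sigma
    return p1, 1.0 - p1

def sxor_pd_pf(gamma, sigma0, sigma1, N):
    p1_1, p0_1 = single_window_probs(gamma, sigma1, N)
    p1_0, p0_0 = single_window_probs(gamma, sigma0, N)
    pd = p1_1 * p0_0 + p0_1 * p1_0         # equation (3)
    pf = p1_1 * p0_1 + p1_0 * p0_0         # equation (4)
    return pd, pf

for g in np.linspace(5.0, 40.0, 8):        # sweep the threshold gamma
    print(round(g, 1), sxor_pd_pf(g, sigma0=1.0, sigma1=2.0, N=10))
```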
Fig. 3. False alarm and detection probability versus threshold γ for SXOR test. σ_0 = 1.0, σ_1 = 2.0, N = 2 (a); 6 (b); 10 (c); 20 (d).
Fig. 4. False alarm and detection probability versus threshold γ for SXOR test. σ_0 = 1.0, σ_1 = 4.0, N = 2 (a); 6 (b); 10 (c); 20 (d).

2.2. Back propagation neural net performance

Figure 5 contains a back propagation neural net suitable for hypothesis testing on an input P-vector of data-derived parameters. The desired output for an input vector corresponding to hypothesis H_i, i = 1, ..., Q, is the vector (0, ..., 0, 1, 0, ..., 0), with the 1 in the ith position,
as obtained from the Q output (deepest layer) neurons. In addition to the input and output neuron layers, the back propagation net contains so-called hidden layers. The adjustable parameters of the net consist of a threshold for every neuron in the net and connection weights between neurons on adjacent layers [25]. During forward propagation (left to right) a neuron with threshold θ applies the sigmoid function

f_\theta(I) = \frac{1}{1 + \exp(-I + \theta)}    (7)
to the input I consisting of the weighted sum of the neuron outputs from the leftward adjacent layer. Net adaption consists of varying the connection weights and thresholds until the output of the deepest layer neurons matches the desired output for all elements of the training set. Details of the back propagation algorithm, which is derived from the gradient descent minimization of the difference between net output and target over the training set, are found in [25]. It has been shown that a three layer back propagation net is sufficient to implement any reasonable functional mapping between input and output vectors [55, 56]. Note from equation (7) that an undulation of the mapping is realizable by the combination (f_θ(I) - f_β(I)) for constant thresholds θ and β. This roughly suggests that two middle layer neurons are required for each oscillation in the map. However, as discussed in Appendix A, performance at a Bayesian optimum is not guaranteed by a network which performs an exact mapping for every element in a stochastic training set [22].

Fig. 5. Back propagation neural net for hypothesis testing. P-vector input data. Q-vector hypothesis output. (0, ..., 0, 1, 0, ..., 0) → H_i.

In order to test the performance of the back propagation algorithm on the SXOR map, a net with two input neurons, sixteen middle layer neurons, and two output neurons (Q = 2) was employed. The training rate and smoothing parameters (η and α in [25]) were chosen to be 0.5 and 0.2 by experimentation with various input sets. The input consisted of sample variances from a training set of Gaussian random noise segments with σ_0 of one and σ_1 of either two or four. The sample variances were computed from windows of length N given by 2, 4, 6, 8, 10, and 15. For each noise pair (σ_0 and σ_1), and window N, two training ensembles each of sizes 400, 800, and 1200 were created with deviation pairs in the order (1, 1), (1, 0), (0, 0), and (0, 1). The two third layer neurons were trained to output values 1 and 0 for the (1, 1) and (0, 0) inputs; and the output targets were
reversed for input corresponding to (1, 0) and (0, 1). The cost function C, consisting of the summed differences of third layer outputs and targets, was monitored during training to determine a point beyond which it did not decrease. Figure 6 contains a typical cost versus iteration curve for a 100 element training set with σ_1 of four and window length N of ten. Also included is the so-called Hamming error versus iteration curve, which is defined as the number of decision errors (within 1%) over the training set. As suggested in [21], nets were trained for a large number of iterations (>30 000) and the point of minimum cost was chosen as the optimum. An iteration is defined as a single adaption of all net parameters for every element in the training set. The implementation of the desired training set map, corresponding to C → 0, was often not attained with sixteen middle layer neurons. However, the Bayesian optimum was obtained through the net learning of data biases rather than each undulation in the training set map [22]. A network with too many hidden layer neurons often had plateaus in the cost function in Figure 6, which was probably due to the phenomenon of 'neuron paralysis' [55]. This occurs at a neuron when the input corresponds to the tail of the threshold function in equation (7). In this case connection and threshold parameter adaption have little effect on the neuron output, hence the cost function remains constant [57].
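A compact NumPy sketch of this 2-16-2 experiment is given below; batch gradient descent, the weight initialization, the input scaling, and the omission of the smoothing (momentum) term are simplifying assumptions, not details taken from [25].

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def make_sxor_set(n_per_condition, N, sigma0=1.0, sigma1=4.0):
    """Window-variance pairs (X1, X2) with transition / no-transition targets."""
    X, T = [], []
    for before, after in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        for _ in range(n_per_condition):
            pair = []
            for idx in (before, after):
                y = rng.normal(0.0, sigma1 if idx else sigma0, N)
                pair.append(np.sum((y - y.mean()) ** 2))   # equation (1)
            X.append(pair)
            T.append([1.0, 0.0] if before != after else [0.0, 1.0])
    return np.array(X), np.array(T)

X, T = make_sxor_set(25, N=10)
X = X / X.max()                          # crude input scaling (an assumption)

W1 = rng.normal(0, 0.5, (2, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, (16, 2)); b2 = np.zeros(2)
eta = 0.5                                # training rate quoted in the text

for it in range(5000):
    H = sigmoid(X @ W1 + b1)             # hidden layer
    Y = sigmoid(H @ W2 + b2)             # output layer
    err = Y - T
    dY = err * Y * (1 - Y)               # back propagated deltas
    dH = (dY @ W2.T) * H * (1 - H)
    W2 -= eta * H.T @ dY / len(X); b2 -= eta * dY.mean(0)
    W1 -= eta * X.T @ dH / len(X); b1 -= eta * dH.mean(0)

cost = 0.5 * np.sum(err ** 2)            # the cost function C
errors = int(np.sum(Y.argmax(1) != T.argmax(1)))
print(cost, errors)
```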
Fig. 6. Cost versus iteration curve for back propagation learning of the SXOR map. Two input neurons, 16 hidden layer neurons, two output neurons. 100 element training set. σ_0 = 1.0, σ_1 = 4.0, N = 10.
It was found that, whereas extremely long training sometimes resulted in downward jumps in the cost function, the network performance on the test was not improved. Often the only effect was an increase in detection probability with a simultaneous increase in false alarm probability, and vice versa. A discussion of techniques to avoid 'neuron paralysis', and other computational obstacles in neural computing, is provided in [58, 59]. For each parameter set σ_0, σ_1 and window length N, networks trained on ensembles of length 400, 800, and 1200 were performance tested. A performance set with 1200 variance pairs was input to the trained net; and for each pair the largest neuron output determined whether transition or no transition was chosen. The proportion of correctly and incorrectly chosen transitions then determined the detection P_d and false alarm P_f probabilities for the test. The performance probabilities for nets trained on three different sets (of length 400, 800, and 1200) were averaged. The combination of training sets of different size minimized the dependence of the network performance estimate on training set size. Figures 7 and 8 contain plots of P_d and P_f versus N as estimated from the performance sets for σ_1 of two and four, respectively. The dotted curves on the figures correspond to the classical optimum defined in Section 2.1. As seen in Figures 7 and 8, the back propagation network closely approximated the performance at peak P_d and locally minimum P_f in Figures 3 and 4. This behavior is understood by the equal contribution of H_0 and H_1 errors over the training set in the cost function C [25]. The results motivate the use of neural nets for the sensor level decision mapping in Figure 1.
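The performance-set procedure itself is summarized in the following sketch, in which the trained net is replaced by an arbitrary callable decision rule (a simple threshold rule stands in below).

```python
import numpy as np

def estimate_pd_pf(decide, transition_pairs, no_transition_pairs):
    """Fraction of declared transitions on each labeled performance subset."""
    pd = float(np.mean([decide(x1, x2) for x1, x2 in transition_pairs]))
    pf = float(np.mean([decide(x1, x2) for x1, x2 in no_transition_pairs]))
    return pd, pf

# Stand-in rule: declare a transition when exactly one window exceeds gamma.
decide = lambda x1, x2, gamma=25.0: (x1 > gamma) != (x2 > gamma)

rng = np.random.default_rng(2)
def var(s, N=10):
    y = rng.normal(0.0, s, N)
    return float(np.sum((y - y.mean()) ** 2))

trans = [(var(1.0), var(2.0)) for _ in range(500)]
no_trans = [(var(1.0), var(1.0)) for _ in range(500)]
print(estimate_pd_pf(decide, trans, no_trans))
```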
2.3. SXOR data fusion

In Section 2.2 optimum performance was demonstrated for a back propagation neural net on the SXOR test, which requires a bilinear classification of the input sample variances. This network corresponds to a forward-based SNN in the distributed sensor fusion architecture in Figure 1. In this section the training and performance of an FNN taking input from two SXOR-trained SNNs is described. The results indicate the enhancement of variance transition performance from distributed sensor data fusion. In [14] an optimum data fusion rule for a binary decision was obtained within the distributed sensor processing architecture. As derived from the log-likelihood ratio test, assuming sensor processors i, i = 1, ..., M, with outputs u_i of -1 or +1 for decision H_0 or H_1, the data fusion rule is given by

f(u_1, \ldots, u_M) = \begin{cases} +1 & \text{if } a_0 + \sum_{i=1}^{M} a_i u_i > 0 , \\ -1 & \text{otherwise,} \end{cases}    (8)
where the coefficients a_i, i = 1, ..., M, are given by

a_i = \frac{1}{2} \log \frac{(1 - P_{m_i})(1 - P_{f_i})}{P_{m_i} P_{f_i}}    (9)
and

a_0 = \log \frac{P_1}{P_0} + \frac{1}{2} \sum_{i=1}^{M} \log \frac{P_{m_i}(1 - P_{m_i})}{P_{f_i}(1 - P_{f_i})}    (10)
with P_0 and P_1 the prior probabilities of H_0 and H_1, and P_{m_i} and P_{f_i} the miss and false alarm probabilities of the ith sensor processor [14].
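A small sketch of the fusion rule follows; because equations (9) and (10) are only partially legible in this reproduction, the coefficient expressions in the code are a reconstruction from the log-likelihood ratio and should be checked against [14].

```python
import numpy as np

def fusion_weights(p_m, p_f, p1=0.5, p0=0.5):
    """Reconstructed coefficients of equations (9) and (10)."""
    p_m, p_f = np.asarray(p_m, float), np.asarray(p_f, float)
    a = 0.5 * np.log((1 - p_m) * (1 - p_f) / (p_m * p_f))            # eq. (9)
    a0 = np.log(p1 / p0) + 0.5 * np.sum(
        np.log(p_m * (1 - p_m) / (p_f * (1 - p_f))))                 # eq. (10)
    return a0, a

def fuse(u, a0, a):
    """u: sensor decisions in {-1, +1}; returns the fused decision of eq. (8)."""
    return +1 if a0 + float(np.dot(a, u)) > 0 else -1

a0, a = fusion_weights(p_m=[0.10, 0.40], p_f=[0.05, 0.30])
print(fuse([+1, -1], a0, a))    # the reliable sensor 1 dominates the decision
```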
Fig. 7. SXOR false alarm and detection probability versus window size N for back propagation NN and classical test. σ_0 = 1.0, σ_1 = 2.0.

Fig. 8. SXOR false alarm and detection probability versus window size N for back propagation NN and classical test. σ_0 = 1.0, σ_1 = 4.0.
The architecture implied by equations (8)-(10) is in fact a first-order perceptron [18, 24], which can be realized through the adaption of the connection weights a_i, i = 0, ..., M, by training. To implement perceptron learning for input u_i = ±1, i = 1, ..., M, define the normalized predicate vector Φ = (1, u_1, ..., u_M)/‖(1, u_1, ..., u_M)‖ and the connection weight (M + 1)-vector A = (a_0, ..., a_M). After a training set element input vector is propagated through each SNN, and the SNN decisions are determined by the largest neuron outputs, the dot product Φ · A is computed. In the case of a correct FNN decision, Φ · A > 0 (<0) for Φ corresponding to H_1 (H_0), the connection weight vector is not changed (A' = A). For an incorrect FNN decision the connection weight vector is altered by the normalized predicate vector, A' = A ± Φ, where + (-) corresponds to Φ · A < 0 (>0) for Φ corresponding to H_1 (H_0). An iteration of the perceptron adaption algorithm consists of the application of the above algorithm for every element in the training set [24]. Training continues until the FNN performs a correct decision for the entire training set, or until the FNN performance does not improve with training.
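The adaption rule reads directly as code; in this sketch the predicate vector is left unnormalized (dividing by its constant length would not change any decision), which is a simplification.

```python
import numpy as np

def train_perceptron(U, labels, n_iter=100):
    """U: rows of SNN decisions in {-1,+1}; labels: +1 for H1, -1 for H0."""
    Phi = np.hstack([np.ones((len(U), 1)), np.asarray(U, float)])  # (1, u_1..u_M)
    A = np.zeros(Phi.shape[1])                                     # (a_0, ..., a_M)
    for _ in range(n_iter):
        changed = False
        for phi, lab in zip(Phi, labels):
            if np.sign(phi @ A) != lab:        # incorrect fusion decision
                A = A + lab * phi              # A' = A +/- phi
                changed = True
        if not changed:                        # correct over the whole training set
            break
    return A

A = train_perceptron([[+1, +1], [+1, -1], [-1, +1], [-1, -1]], [+1, +1, -1, -1])
```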
Fig. 9. Fusion architecture for SXOR test. SNN1 (σ_0 = 1.0, σ_1 = 2.0), SNN2 (σ_0 = 1.0, σ_1 = 4.0), FNN first-order perceptron.

Fig. 10. Performance probabilities for the fusion of two SXOR trained SNNs. P_d, P_f, P_m, and P_cH0 versus window size N for (σ_1 = 2.0) and (σ_1 = 4.0) SNNs, optimum FNN, and back propagation FNN.

The architecture for the fusion of two SXOR-trained back propagation SNNs is shown in Figure 9. It is assumed that the higher noise deviation σ_1 is 'sensor-dependent', so that each SNN was previously trained on a different variance pair σ_0 (=1.0) and σ_1. For each window size N (=2, 4, 6, 8, 10), a pair of SNNs was trained on sample variances with σ_1 of two and four. As in the previous experiment (Section 2.2), the SNN target outputs were (1, 0) and (0, 1) for transition and no transition, respectively. A performance set of 1000 variance pairs each was used to compute the SNN detection P_d, false alarm P_f, miss P_m, and correct no transition P_cH0 probabilities for the test. The SNN decisions were determined by the largest neuron output. A plot of these performance probabilities for the (σ_1 = 4.0) SNN and (σ_1 = 2.0) SNN as a function of window size N is shown in Figure 10. A determination of the SNN
detection and false alarm probabilities allowed the definition of an optimum perceptron FNN from equations (8)-(10). An estimate of the perceptron FNN performance was obtained with 1000 variance quartets ((i, j), (i', j')), i, j, i', j' ∈ {0, 1}. The quartet ((i, j), (i', j')) corresponds to input variance pairs (i, j) and (i', j') for the SNNs with σ_1 of two and four, respectively. Recall that i of one (zero) corresponds to the choice of a high (low) noise deviation in the definition of the sampled variance. The SNN output decision was converted to u_i = ±1, i = 1, 2, as in Figure 9, before input to the perceptron FNN. Figure 10 contains the perceptron FNN performance as a function of N as determined from the performance set. Note that the FNN matched the performance of the (σ_1 = 4.0) SNN for N of 2, 4, 6, 8, and 10. The (σ_1 = 2.0) SNN has a small effect on the optimum FNN due to the generally poor performance of that net (P_m ≈ P_f ≈ 0.5).
Motivated by the representation of the optimum FNN as a perceptron, a back propagation FNN (BPFNN) was considered for the data fusion of the two SXOR-trained SNNs. The BPFNN consisted of four inputs (two from each SNN), a sixteen neuron hidden layer, and two output neurons. The BPFNN was trained on 25 randomly generated variance quartets in the order ((0, 0), (0', 0')), ((1, 0), (1', 0')), ((1, 1), (1', 1')), and ((0, 1), (0', 1')). Each variance quartet was propagated through the SNNs and normalized to define the four element input to the BPFNN. As in the case of the SNNs, the BPFNN targets were (1, 0) and (0, 1) for transition and no transition, respectively. In order to speed up training for an FNN with only sixteen hidden neurons, the variance quartets from the overlapped region of the input domain were removed from the training set. As discussed in [22] this procedure usually suffices to obtain Bayesian optimum performance through the learning of data biases. After BPFNN training a performance set of 1000 random variance quartets was generated and propagated through the entire sensor fusion system. A count of correctly and incorrectly detected transitions and no transitions over the performance set determined the conditional probabilities plotted in Figure 10. Note that the trained BPFNN essentially matched the optimum FNN at the
performance of the (σ_1 = 4.0) SNN for window lengths 2-10. These results suggest that the trained distributed sensor fusion system attained at least the performance of the strongest sensor at any time. In order to demonstrate performance enhancement through data fusion, the fusion of two (σ_1 = 4.0) SNNs trained on data windows of lengths 2-10 was considered. For each window a three layer BPFNN was trained on the SNN pair outputs from 100 input variance quartets. As in the training above, variance quartets from the overlapped regions of the input domain were discarded from the training set. This procedure required a BPFNN training time of about 30 minutes on the Silicon Graphics Workstation (with a sixteen neuron hidden layer). The BPFNN performance probabilities were computed from 100 independent performance sets each consisting of 100 variance quartets. Average BPFNN performance probabilities (P_d, P_f, P_m, P_cH0) are shown in Figure 11 for comparison with the (σ_1 = 4.0) SNN performance probability set. A BPFNN performance enhancement over the individual SNNs is demonstrated most clearly in the P_m and P_cH0 probabilities for overlapped input distributions (small N). The results in Section 3, in which neural net fusion is applied to the Firefly launches, also demonstrate enhanced performance through the fusion of distinct sensors.

Fig. 11. Performance probabilities for the fusion of two SXOR trained SNNs. P_d, P_f, P_m, and P_cH0 versus window size N for (σ_1 = 4.0) SNN and back propagation FNN (BPFNN).
3. Firefly sensor fusion experiment
In this section the distributed sensor fusion architecture described above is applied to the three-sensor fusion of measurements taken during the recent Firefly launch. The experiment, involving the complicated logistics of three imaging and tracking radars, provided a rare opportunity to demonstrate the power of decision-level neural net sensor fusion [46].
3.1. Firefly experiment

The Firefly (FF) experiment consisted of two rocket launches (FFI on March 28, and FFII on October 20, 1990) from Wallops Island, VA into the Atlantic Ocean about 400 km eastward. During the flight the deployment of an inflatable balloon was observed simultaneously by the three Millstone Hill radars sited in Westford, MA at a range of approximately 750 km from the targets. The active sensors were the Haystack X-band (λ = 3.0 cm) and Firepond CO2 laser (λ = 11.2 μm) imaging radars, and the Millstone L-band (λ = 23.1 cm) tracking radar. Figure 12 shows the two-stage deployment sequence which was analyzed by the sensor fusion architecture in Figure 1. About six minutes after the launch, a metallic canister (cross section ≈ 1.0 m²) was deployed from a much larger metallic payload. As the payload fell away from the track, the canister ejected four metallic doors and an inflating carbon cloth cone (cross section ≈ 2.0 m²). As shown in Figure 12, for both payload-canister and canister-balloon deploy-
Fig. 12. Firefly experiment launch sequence: Phases I, II, III for payload-canister and canister-balloon deployments. Phase I predeployment; phase II deployment; phase III postdeployment.
ments, the predeployment, deployment, and postdeployment phases are clearly identified. The input data for the sensor fusion system consisted of range-Doppler images from the Haystack and Firepond radars, and a passive IR spectral simulation of the objects in the images. The range-Doppler images contained information on object segmentation, whereas the passive IR spectral simulation was sensitive to the exposed object material composition. Radar imaging takes advantage of a moving target's aspect angle change to obtain a signal Doppler shift proportional to the scattered cross range extent. The Doppler resolution is proportional to the inverse of the signal integration time, over which it is assumed that the scatterer has moved a negligible distance and the signal is coherent. Through analysis of the object motion the Doppler shift is scaled to a physical cross range distance [60, 61]. This is coupled with an estimate of the range from the signal delay to obtain a two-dimensional range-cross range image of the object. The range-Doppler technique results in image resolution greater than the limits imposed by the radar aperture and radiation wavelength. Details of range-Doppler imaging theory for the Haystack and Firepond radars are given in [62, 63], respectively. The third sensor input to the sensor fusion system was from a passive IR simulation of the objects in the images. The Lincoln Laboratory-developed simulator was used to provide a feasibility study for passive IR deployment detection [64]. The inputs to the simulator included object shape, dimensions, spin/precession rates, and orientation relative to the sun. The input thermal properties were initial temperature, emissivity, interior emissivity, absorptance, thermal mass (density × heat
capacity), and specularity. Finally, a climate and cloud cover dependent model of the Earth spectral irradiation through the atmosphere was input. The output from the simulator consisted of the object spectral irradiation into the solid angle over the range [5.0 μm, 25.0 μm] in Watts/steradian (W/sr). The spectral irradiation provides information about the material composition of an object. In the wavelength range [5.0 μm, 25.0 μm] this information is indirect through estimates of relative emissivity and reflectance at the surface. Thus, for example, a metallic object with low emissivity and absorptance (ε ≈ α ≈ 0) has a spectrum dominated by the reflected earthshine. As seen in Figure 13, the metallic spectrum has notches at the ozone (≈9.5 μm) and CO2 (≈13 μm) wavelengths due to atmospheric absorption of the earthshine. This is to be contrasted with a graybody object (ε = 0.75) in Figure 14, in which a classic blackbody spectrum dominates the spectral irradiance. Note from the spectra in Figures 13 and 14 that the graybody irradiance is about twenty times the reflected component for a 1.0 m² object. This suggests that the spectrum of a graybody object among a set of metallic targets will dominate the total spectral irradiance. The canister-balloon deployment sequence for the FFI launch, with the identification of the predeployment, deployment, and postdeployment phases, is shown in Figure 15. Figure 16 contains the passive IR simulation from each phase: a reflective earthshine canister spectrum for predeployment, the superposition of metallic door and carbon cloth (graybody) spectra for deployment, and a graybody carbon cloth spectrum for postdeployment.
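The graybody dominance can be checked with a simple Planck calculation; the sketch below assumes a 300 K Lambertian flat plate, which is an illustrative choice rather than a parameter of the Lincoln Laboratory simulator.

```python
import numpy as np

h, c, k = 6.626e-34, 2.998e8, 1.381e-23     # Planck, light speed, Boltzmann

def planck_radiance_um(lam_um, T):
    """Blackbody spectral radiance in W / (m^2 sr um)."""
    lam = lam_um * 1e-6
    return 2 * h * c**2 / lam**5 / (np.exp(h * c / (lam * k * T)) - 1.0) * 1e-6

lam = np.linspace(5.0, 25.0, 201)            # the band used in the chapter
emissivity, area, T = 0.75, 1.0, 300.0       # assumed graybody parameters
intensity = emissivity * planck_radiance_um(lam, T) * area    # W / (sr um)
print(lam[intensity.argmax()], intensity.max())
```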
Fig. 13. Passive IR simulated spectrum over range [5.0 μm, 25.0 μm] in W/sr. Metallic object with 1.0 m² cross section, ε = α = 0.
Fig. 14. Passive IR simulated spectrum over range [5.0 μm, 25.0 μm] in W/sr. Graybody object with 1.0 m² cross section, ε = 0.75, α = 0.9.
Fig. 15. Canister-balloon deployment sequence for FFI launch. Predeployment, deployment, and postdeployment phases.
Fig. 16. Passive IR simulated spectra [5.0 μm, 25.0 μm] in W/sr for three canister-balloon deployment phases in FFI launch.
These simulated spectra form the training set for the passive IR SNN in the sensor fusion architecture discussed in the next subsection. Figure 17 depicts the formulation of the fused sensor decision on balloon deployment from Haystack and Firepond range-Doppler images and a passive IR simulation. Due to the longer Haystack coherent integration time, the range and cross range resolutions of the Firepond and Haystack radars were comparable. The most important differences between the radar images resulted from a Haystack beamwidth about 100 times the Firepond beamwidth. The Firepond beamwidth of 7.5 m at 750 km was sufficient to observe only single targets in a complex scene, whereas the Haystack radar observed a much larger cross range extent. It should be emphasized that these Firepond properties are beneficial; that is, a shorter integration time allows more rapid image generation (about 3000 times faster) and a narrow beam is more difficult to detect. As seen in Figure 17, during the predeployment phase (of about 24 s) Firepond images consisted of only the metallic canister, whereas Haystack
Fig. 17. Formulation of multisensor data fusion for canister-balloon deployment: Haystack and Firepond range-Doppler images, and passive IR simulation of predeployment, deployment, and postdeployment phases.
images contained returns for the separating payload. The passive IR spectrum was weak (≈0.6 W/sr peak) and earthshine-dominated with notches at 9.5 and 13 μm. During the two-second deployment phase the cross range velocity component of the ejected doors resulted in a rapid loss of door images for Firepond. Two of the doors moved roughly in parallel to the inflating balloon, so that throughout the deployment the Haystack images consisted of the decoy and nearby doors represented in Figure 17. As noted from Figure 16, the balloon graybody radiation dominated the structure in the earthshine spectra of the doors. The postdeployment phase of 30 s was determined from the Firepond images of an inflated carbon cloth cone, Haystack images of the balloon and two sufficiently separated metallic doors, and a passive IR carbon cloth graybody spectrum. The data represented in Figure 17 was input to the sensor fusion system described in Section 2 for a decision of predeployment, deployment, and postdeployment phases. The irreducible ambiguities inherent in the single sensor data are observed in Figure 17. The passive IR sensor discrimination between deployment and postdeployment was weak due to graybody dominance in the reflected earthshine spectra. The Firepond sensor was ambiguous between pre- and postdeployment phases due to the similarity
of the canister and balloon range-Doppler images. The shape difference between the cylindrical canister and the cone-shaped balloon is a weak feature in noise-corrupted data. Further image processing such as intensity averaging and smoothing may enhance the radar image-based decisions [65]. However, in the sensor fusion experiment preprocessing was limited to single intensity threshold and centroid operations. The Haystack image set was overall the least ambiguous due to the generation of complex scenes. However, during deployment the radar often lost reflections from the doors and became ambiguous between predeployment and deployment decisions.
3.2. Firefly sensor fusion system

Figure 18 contains the distributed sensor fusion system used to analyze the Firefly balloon deployment from Haystack and Firepond range-Doppler images, and the passive IR simulation. Three back propagation SNNs were trained to output a deployment decision based only on the individual sensor data. The three output neurons, corresponding to predeployment, deployment, and postdeployment on each SNN, had as output an analog value in the range
Fig. 18. Distributed sensor fusion system for FFI canister-balloon deployment detection. Back propagation SNNs for passive IR, Haystack, and Firepond sensors. Back propagation FNN.
[0, 1]. The back propagation FNN took the normalized SNN outputs as input, and mapped to an overall decision based on the three neuron SNN outputs for each of the three deployment phases. The SNN and FNN output targets were (1, 0, 0) for predeployment, (0, 1, 0) for deployment, and (0, 0, 1) for postdeployment. The architecture in Figure 18 implies that the FNN was trained to perform a cluster analysis in the nine-dimensional space of SNN outputs. The FNN inputs were clustered around ((1, 0, 0), (1, 0, 0), (1, 0, 0)), ((0, 1, 0), (0, 1, 0), (0, 1, 0)), and ((0, 0, 1), (0, 0, 1), (0, 0, 1)) for predeployment, deployment, and postdeployment, respectively.
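The construction of a single FNN training pattern for this three-sensor, three-phase case can be sketched as follows; the sum normalization and the example output values are illustrative assumptions.

```python
import numpy as np

PHASES = ("predeployment", "deployment", "postdeployment")

def fnn_pattern(snn_outputs, true_phase):
    """snn_outputs: three 3-vectors (passive IR, Firepond, Haystack SNNs)."""
    x = np.concatenate([np.asarray(o, float) / np.sum(o) for o in snn_outputs])
    t = np.zeros(3)
    t[PHASES.index(true_phase)] = 1.0          # (1,0,0), (0,1,0) or (0,0,1)
    return x, t                                # nine-vector input and target

# e.g., a pattern in which the Firepond SNN is ambiguous during predeployment
x, t = fnn_pattern([[0.80, 0.10, 0.10],
                    [0.45, 0.10, 0.45],
                    [0.90, 0.05, 0.05]], "predeployment")
```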
Fig. 19. Fusion neural net cost function C versus training iterations for sensor fusion system training on FFI canister-balloon deployment detection.
The SNNs for Haystack and Firepond had a 20 × 200 pixel input plane, a 4 × 4 neuron middle layer, and a third output layer with three neurons. The passive IR SNN and the FNN had sixteen neurons in the middle layer, three neuron outputs, and input layers of twenty and nine neurons, respectively. The radar SNN structure was determined in part by the computational complexity of the fully interconnected two-dimensional back propagation net, and by the minimum number of neurons required for convergence over the training set of images. The one-dimensional nets (one SNN and the FNN) were not complexity bound, so that the number of hidden neurons was determined by convergence issues discussed in Section 2.2. The Haystack and Firepond SNNs were trained on 3-4 images each from predeployment, deployment and postdeployment. For each image pair the aggregate passive IR spectrum was computed based on the objects in the Haystack images. The training of each radar SNN on a training set of about 12 images using the back propagation learning algorithm required about 30 min on a Silicon Graphics Iris Workstation. Upon completion of SNN training, a set of about twenty images and passive IR spectra each from the three deployment phases were propagated through the SNNs. The normalized SNN outputs formed a training set for the FNN. It should be emphasized that the training set for the FNN must reflect the uncertainty in decisions from each
Fig. 20. Sensor neural nets (SNNs) neuron outputs. Novel FFI canister-balloon deployment data. (A, B, C): predeployment, deployment, postdeployment neurons.
sensor alone. This was accomplished through the use of an FNN training set distinct from the SNN training data, for which the performance of each SNN is well represented. Thus, for example, because the Firepond pre- and postdeployment images were inherently ambiguous, the FNN training set contained Firepond SNN outputs with about 40% error in pre- and postdeployment detection. This procedure was necessary in order for the FNN to learn the extent to which a sensor should be ignored for a given pattern of SNN outputs. Figure 19 contains a plot of the FNN cost function C versus iteration during
Fig. 21. Fusion neural net (FNN) neuron outputs. Novel FFI canister-balloon deployment data. (A, B, C): predeployment, deployment, postdeployment neurons.
training. The one-dimensional FNN converged after about 90 iterations on a training set with about 60 input nine-vectors. The algorithm ran in approximately 20 s on the Silicon Graphics Iris Workstation. In order to test the performance of the trained sensor fusion system, a performance set of novel data from the same launch was created. The performance set contained between ten and twenty radar image pairs each from predeployment, deployment, and postdeployment. A simulated passive IR spectrum was generated for each Haystack image with added random Gaussian noise of deviation at 10% of the peak spectral value. The images and spectra were ordered sequentially in time and propagated through the sensor fusion system. Figure 20 contains the neuron outputs of the SNNs over the performance set, with (A, B, C) corresponding to predeployment, deployment, and postdeployment neurons. Note that for the passive IR SNN the deployment and postdeployment neurons oscillated in value, reflecting ambiguity from the graybody balloon dominance of the reflective door spectrum. The Firepond SNN neurons oscillated during the pre- and postdeployment phases due to the similarity of the canister and balloon range-Doppler images. Finally, although the Haystack radar SNN had overall the best performance, there was oscillation during the deployment phase due to the loss of reflections from the ejected doors. Figure 21 contains the FNN neuron outputs for the
Fig. 22. Sensor neural nets (SNNs) neuron outputs. Novel FFII payload-canister deployment data. (A, B, C): predeployment, deployment, postdeployment neurons.
performance set, which clearly indicates a performance superior to any of the SNNs. This is the desired evidence of sensor synergism obtained through the fusion of multisensor data. A procedure similar to the training and performance tests described above was applied to the payload-canister deployment in Figure 12. In this case the training set was generated from the FFI launch, and the system performance was tested on data from the FFII launch. Details of the analysis will not be described, except to note that the passive IR spectrum was dominated by the large metallic payload (3.0 W/sr peak) in the predeployment and deployment phases. The radar images contained the payload during predeployment, the canister and payload during deployment, and the canister alone during the postdeployment phase. The radar SNNs therefore detected the deployment phases based on image segmentation and payload-canister size differences. The three neuron output values for each of the SNNs, from a performance set of about sixty FFII images of the canister deployment, are shown in Figure 22. The Firepond SNN performance was poor due to the lack of correct scaling for FFII and a high clutter level in the data. The difficulties in launch-to-launch
Fig. 23. Fusion neural net (FNN) neuron outputs. Novel FFII payload-canister deployment data. (A, B, C): predeployment, deployment, postdeployment neurons.
cross range scaling resulted from different object spin rates between FFI and FFII. The problem can be largely corrected by further post-launch image processing. Figure 23 contains the three FNN neuron outputs for the performance set of FFII data. As with the balloon deployment results in Figures 20 and 21, there is clear evidence of sensor synergism from the distinct FNN neuron outputs during the different phases.
4. Conclusion
This chapter contains theoretical and experimental examples of neural networks in a distributed sensor fusion decision-making environment. The architecture consists of sensor level decision nodes, which output a decision based only on data from the particular sensor. The multisensor decision outputs form the input to a fusion node for an overall decision. The fusion node performs cluster analysis in a multisensor hypothesis space to obtain the system decision. The theoretical analysis consisted of the application of neural nets to a benchmark problem, the detection of variance transitions in Gaussian noise, for which a classical hypothesis test is defined. In both the cases of stand-alone single sensor decision-making and multisensor fusion, the neural nets matched the performance at the classical optimum. In addition, it was shown in general that the optimum (binary decision) fusion processor, obtained from a log-likelihood test in [14], is in fact a perceptron neural net. This fact motivated the use of an adaptive network at the fusion processor in the distributed sensor fusion architecture. It was further shown that a back propagation net matched the performance of the optimum fusion processor on the variance transition detection (SXOR) test. The procedure of net training in the distributed sensor architecture, which requires separate representative training sets for the sensor and fusion nodes, was reviewed in the application to the SXOR test. It was emphasized that the training set for the FNN must contain a representative decision set from each SNN. The experimental example of decision-level neural net sensor fusion consisted of the application of the system to object deployment detection during the Firefly launch. The sensor inputs consisted of range-Doppler images from the Haystack (X-band) and Firepond (CO2 laser) radars, as well as a passive IR spectral simulation of the tracked objects. The output decisions were the identification of predeployment, deployment, and postdeployment phases for the release of an inflatable carbon cloth balloon. The fusion neural net performed a nine-dimensional cluster analysis, three sensors with three decisions, on the output of independently trained sensor neural nets. The system was trained and performance-tested on data from the first Firefly launch for the detection of balloon deployment. In a more recent experiment, the system was applied to the detection of canister deployments using training and performance data from the first and second Firefly launches, respectively. The
The results clearly demonstrate enhanced fusion performance from the comparison of deployment detection by the fusion and sensor nets. Through the analysis of sensor ambiguities, it was shown that the fusion system employs synergism between the various sensors to provide an optimum overall decision. Decision-level sensor fusion processing is a highly relevant procedure for automated decision-making in a multiple sensor environment. This chapter demonstrates that the application of neural nets in the architecture takes full advantage of the performance enhancements made possible by data fusion.
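For the binary-decision case referred to above, the optimum fusion rule of [14] can be written as a single linear threshold unit. The sketch below (Python with NumPy) is a minimal illustration of one standard form of that rule; the sensor detection and false alarm probabilities and the priors are illustrative values, not those of the Firefly experiment.

```python
import numpy as np

# Illustrative sensor-level performance (assumed values, not measured ones).
P_D = np.array([0.90, 0.80, 0.85])   # per-sensor detection probabilities
P_F = np.array([0.10, 0.05, 0.15])   # per-sensor false alarm probabilities
p0, p1 = 0.5, 0.5                    # prior probabilities of H0 and H1

# Log-likelihood fusion rule written as a perceptron:
# decide H1 when w . u + b > 0, with u_i in {0, 1} the sensor decisions.
w = np.log(P_D * (1 - P_F) / (P_F * (1 - P_D)))
b = np.log(p1 / p0) + np.sum(np.log((1 - P_D) / (1 - P_F)))

def fuse(u):
    """u: vector of 0/1 sensor decisions; returns the fused 0/1 decision."""
    return int(w @ u + b > 0)

print(fuse(np.array([1, 0, 1])))     # example fused decision
```

The weighted-sum-and-threshold form makes the perceptron structure of the optimum fusion processor explicit.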
Appendix A. Performance measures for adaptive decisioning systems

Hypothesis testing by a data-adaptive system, such as a neural net, is fundamentally different from classical hypothesis testing. In the former a representative data set, corresponding to known hypotheses, is used to train the system. System parameters are varied until the system training set-hypothesis space mapping best approximates the known map. The assumptions of a sufficiently representative training set and the ability of the system to associate are required to extend the map to arbitrary data [21]. In contrast, classical hypothesis testing derives from an assumed model for the data, often a signal in Gaussian noise, from which optimum tests are defined [53]. In this appendix performance measures are derived based only on the procedure by which an adaptive system is trained. It is assumed that, if a system is perfectly trained on a representative data set for each hypothesis, an appropriate performance estimate is the averaged performance over the ensemble of training sets. This averaged performance, which is computed in terms of training set size and data distributions, reflects an uncertainty inherent in learning from a finite representation of the data. As discussed in the introduction, an exact measure of system performance is obtained by testing the system on an ensemble of independent performance sets. However, in order to predict this performance an exact model of the system mapping must be known. This is difficult for model based systems in general, but even more difficult for adaptive systems in which the exact mapping is training set dependent. In the following, training set based performance measures are derived for a data-adaptive system on an arbitrary data-based N-hypothesis test. A maximum a posteriori probability (MAP) test is also formulated and represented for a decisioning system with output in [0, 1]^N. A possible neural net representation of the MAP test contains N output neurons. For a net input x the ith deepest layer neuron literally outputs p(H_i | x) ∈ [0, 1], which is the conditional probability for hypothesis H_i, i = 1, ..., N. This rather stringent condition was obtained in [38] using a Boltzmann net to implement the MAP test. In this appendix the training set based and MAP estimates are derived for the binary hypothesis test, resulting in a comparison of the receiver operating characteristic (ROC) curves for these measures.
The performance of an adaptive system can be approximated from the statistics of the training set. Consider the training of an adaptive system for the testing of hypotheses H_1, ..., H_N with prior probabilities p(H_i), i = 1, ..., N. The prior probabilities are normalized to unity by the condition Σ_{i=1}^{N} p(H_i) = 1. The input to the system is the data value x ∈ R^Q, which is obtained by the observation of stochastic phenomena reflecting the set of possible hypotheses. The integer Q represents the arbitrary dimension of the input data value x, which is suppressed for notational clarity. The operation of observing the phenomena from which x is obtained is denoted OBS. The OBS-generated value x is input to the adaptive system, which has an output u = (u_1, ..., u_N), with u_j nonzero corresponding to hypothesis H_j, j = 1, ..., N. Figure A1 contains a schematic of the OBS and adaptive system operations.

Fig. A1. Schematic of the OBS and adaptive system operations. Hypotheses H_i, i = 1, ..., N; OBS output x; adaptive system output u.

The data value x is assumed to have a conditional probability distribution p(x | H_i), i = 1, ..., N, with hypothesis H_i. More specifically, the function p(x | H_i) is the probability density that the OBS operation outputs x for phenomena satisfying hypothesis H_i. The densities are normalized to unity, ∫_D p(x | H_i) dx = 1, where D ⊆ R^Q is the region of allowed x-values. The adaptive system is trained on the sets {x_1^1, ..., x_{M_1}^1}, ..., {x_1^N, ..., x_{M_N}^N} of OBS data outputs for each hypothesis H_1, ..., H_N. This training set results from M_1 trials of OBS with hypothesis H_1, M_2 trials of OBS with hypothesis H_2, and so on to M_N trials of OBS with hypothesis H_N. The system is trained to exactly perform the mapping

x_i^j → (0, ..., 0, 1, 0, ..., 0)   (1 in the jth position),   i = 1, ..., M_j, j = 1, ..., N.   (A1)
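As a concrete illustration of the mapping in equation (A1), the short sketch below (Python with NumPy) builds a training set of OBS outputs for each hypothesis and pairs each data value x_i^j with a one-hot target vector having a 1 in the jth position. The two-hypothesis Gaussian OBS model is purely an illustrative assumption, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative OBS model: two hypotheses, scalar Gaussian data (assumed).
def obs(hypothesis, n):
    means = {1: 0.0, 2: 1.0}         # hypothetical class means
    return rng.normal(means[hypothesis], 1.0, size=n)

M = {1: 50, 2: 50}                   # M_j trials of OBS for hypothesis H_j
N = len(M)

training_set = []                    # pairs (x_i^j, target) realizing the map in (A1)
for j, Mj in M.items():
    for x in obs(j, Mj):
        target = np.zeros(N)
        target[j - 1] = 1.0          # one-hot vector: 1 in the jth position
        training_set.append((x, target))
```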
A measure of system errors due to inherent training set ambiguities is obtained from the performance on the training set {x_1^1, ..., x_{M_1}^1}, ..., {x_1^N, ..., x_{M_N}^N}. This intuitively represents an upper bound on averaged system performance because, in general, added errors occur due to incorrect system association on arbitrary data. To compute the training set based measures it is assumed that the M_1 + ... + M_N trials of OBS result in exactly
the data set {x_1^1, ..., x_{M_1}^1} ∪ ... ∪ {x_1^N, ..., x_{M_N}^N} above. For a given data point x_i^j, i = 1, ..., M_j, j = 1, ..., N, the probability of having been generated by hypothesis H_k, k = 1, ..., N, is given by

Prob(x_i^j, H_k) = p(H_k) p(x_i^j | H_k) / Σ_{q=1}^{N} p(H_q) p(x_i^j | H_q),   (A2)

where the normalization is over the hypotheses which could have generated x_i^j in the M_1 + ... + M_N trials. The system maps x_i^j to hypothesis H_j, so that the probability in equation (A2) contributes to the situation of a system declaration for hypothesis H_j when the true hypothesis is H_k. Therefore, over the set of M_1 + ... + M_N trials of OBS, the average number of H_j declarations for true hypothesis H_k is given from equation (A2) by

NUM(H_j, H_k) = Σ_{i=1}^{M_j} Prob(x_i^j, H_k) = Σ_{i=1}^{M_j} p(H_k) p(x_i^j | H_k) / Σ_{q=1}^{N} p(H_q) p(x_i^j | H_q).   (A3)

The probability of a system declaration of H_j for true hypothesis H_k is then given by (j, k = 1, ..., N)

p(H_j, H_k) = [1 / Σ_{p=1}^{N} M_p] Σ_{i=1}^{M_j} p(H_k) p(x_i^j | H_k) / Σ_{q=1}^{N} p(H_q) p(x_i^j | H_q).   (A4)

Note that the required normalization for the M_1 + ... + M_N trials, Σ_{k=1}^{N} p(H_j, H_k) = M_j / Σ_{p=1}^{N} M_p, follows from equation (A4). It is interesting to consider the average of p(H_j, H_k) over the ensemble of training sets obtained by the above procedure. Recall that x_i^j in equation (A4) was obtained by the OBS operation with a fixed hypothesis H_j, indicating that the appropriate distribution for x_i^j is p(x_i^j | H_j). Averaging over the values of x_i^j in equation (A4), an averaged probability for hypothesis H_j declared with the true hypothesis H_k is given by
⟨p(H_j, H_k)⟩ = γ_j p(H_k) P_{j,k},   j, k = 1, ..., N,   (A5)

where

P_{j,k} = ∫_D p(x | H_j) p(x | H_k) / [Σ_{q=1}^{N} p(H_q) p(x | H_q)] dx   (A6)
and γ_j is the proportion of hypothesis H_j-generated data in the training set for the adaptive system,

γ_j = M_j / Σ_{q=1}^{N} M_q.   (A7)
The joint probability in equation (A5) has factored into a training ensemble-dependent parameter γ_j and a statistics-dependent quantity p(H_k) P_{j,k}. An estimate of the conditional probability p(H_i | H_j), corresponding to a decision for H_i with true hypothesis H_j, is obtained from equation (A5) by

p(H_i | H_j) = ⟨p(H_i, H_j)⟩ / Σ_{q=1}^{N} ⟨p(H_q, H_j)⟩ = γ_i P_{i,j} / Σ_{q=1}^{N} γ_q P_{q,j},   (A8)

where P_{i,j} and γ_i are given in equations (A6) and (A7), respectively. Equations (A5)-(A8) are denoted the training set based measures of system performance.
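To make equations (A5)-(A8) concrete, the sketch below (Python with NumPy) evaluates P_{j,k} by trapezoidal quadrature and forms the averaged joint and conditional probabilities. The two Gaussian class densities, the priors, and the training proportions are illustrative assumptions rather than quantities from the chapter.

```python
import numpy as np

# Illustrative two-hypothesis problem (assumed densities).
x = np.linspace(-8.0, 8.0, 4001)
def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

p_x_given_H = [gauss(x, 0.0, 1.0), gauss(x, 1.0, 1.0)]   # p(x|H_1), p(x|H_2)
prior = np.array([0.5, 0.5])                              # p(H_j)
gamma = np.array([0.3, 0.7])                              # training proportions gamma_j

mix = sum(prior[q] * p_x_given_H[q] for q in range(2))    # sum_q p(H_q) p(x|H_q)

# P_{j,k} of equation (A6) by numerical quadrature.
P = np.array([[np.trapz(p_x_given_H[j] * p_x_given_H[k] / mix, x)
               for k in range(2)] for j in range(2)])

joint = gamma[:, None] * prior[None, :] * P               # <p(H_j, H_k)>, equation (A5)
cond = joint / joint.sum(axis=0, keepdims=True)           # p(H_i | H_j), equation (A8)
print(cond)
```

Each column of `cond` sums to one, as required of the conditional declaration probabilities for a fixed true hypothesis.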
A more traditional approach to system performance estimation is through the maximum a posteriori probability (MAP) test [53]. For an OBS-generated input x, the hypothesis H_j is chosen which maximizes the conditional probability p(H_k | x), k = 1, ..., N. It has been shown that a neural net, trained on sufficiently representative data, converges to the MAP test performance [38]. A mapping network for the N-hypothesis test consists of a single OBS-generated input x, a series of hidden layers, and an N-neuron output layer. A stochastic formulation of a MAP test neural net allows a comparison with the training set based estimates in equations (A5)-(A8). The N deepest layer neurons are assumed to output only 0 or 1 in the pattern

(0, ..., 0, 1, 0, ..., 0)   (1 in the ith position),   i = 1, ..., N,

with probability q_i(x) for input x. The above output N-vector corresponds to a decision for hypothesis H_i. The net output probabilities are normalized by the condition Σ_{j=1}^{N} q_j(x) = 1, x ∈ D. The joint probability p(H_j, H_k | x) for choosing hypothesis H_j with phenomena satisfying H_k, assuming net input x, is given by the product q_j(x) p(H_k | x). The average over input values x with a prior distribution p(x) yields

p^{MAP1}(H_j, H_k) = ∫_D q_j(x) p(H_k | x) p(x) dx.   (A9)
The maximum a posteriori probability (MAP) test follows on average for

q_j(x) = p(H_j | x) / Σ_{q=1}^{N} p(H_q | x).   (A10)

Substitution of equation (A10) into equation (A9) yields, upon application of Bayes' theorem,

p(H_j | x) = p(x | H_j) p(H_j) / p(x),   j = 1, ..., N,   (A11)

the equation

p^{MAP1}(H_j, H_k) = p(H_j) p(H_k) P_{j,k},   j, k = 1, ..., N,   (A12)
where P_{j,k} is defined in equation (A6). Comparison of equations (A5) and (A12) suggests that the MAP test estimate equals the training set based estimate if the training set satisfies the equation γ_j = p(H_j). This condition reflects the common sense belief that the training set should be proportioned according to the prior probabilities of the hypotheses. In fact, such proportionality is a necessary condition in proofs demonstrating Bayesian performance of multilayer perceptrons [40]. A deterministic neural net model for the MAP N-hypothesis test occurs if the N deepest layer neurons output analog values in the range [0, 1]. As obtained in [38], assume that for net input x ∈ D the ith, i = 1, ..., N, neuron literally outputs the value p(H_i | x). The MAP test then results simply from choosing the hypothesis H_i corresponding to the deepest layer neuron with the largest output value. A schematic of the deterministic MAP test neural net is shown in Figure A2.

Fig. A2. Schematic of the deterministic neural net representation of the MAP test. OBS-generated input x; N-neuron output p(H_1 | x), ..., p(H_N | x).

In order to compute performance probabilities for this net, define regions D_j, j = 1, ..., N, given by
D_j = {x ∈ D | p(H_j | x) > p(H_k | x), ∀ k ≠ j}.

Assuming the regions of equal conditional probabilities,

D_{j,k} = {x ∈ D | p(H_j | x) = p(H_k | x)},   j, k = 1, ..., N,
have zero support, we define the joint performance probability p^{MAP2}(H_j, H_k) by the expression

p^{MAP2}(H_j, H_k) = p(H_k) ∫_{D_j} p(x | H_k) dx,   (A13)
corresponding to the probability that the jth neuron output in Figure A2 is maximum for an H_k-generated input. The computation of the performance probabilities p^{MAP2}(H_j, H_k), j, k = 1, ..., N, follows from the application of Bayes' formula in equation (A11) to the definitions of the regions D_j and D_{j,k}. As a simple example equation (A13) can be applied to the binary hypothesis test for comparison with the training set based estimates in equations (A5)-(A8). Consider a training set based decision between hypotheses H_0 and H_1 with prior probabilities p_0 = p(H_0) and p_1 = p(H_1), respectively. Assume the one-dimensional output x from the OBS operation has conditional probabilities p(x | H_i), i = 0, 1, for phenomena satisfying hypothesis H_i. The system performance is defined by the standard conditional probabilities of detection P_d = p(H_1 | H_1), false alarm P_f = p(H_1 | H_0), miss P_m = p(H_0 | H_1), and correct H_0 identification P_{cH0} = p(H_0 | H_0). Assuming a training set consisting of N_i, i = 0, 1, trials of OBS with hypothesis H_i, we have from equation (A8)

P_d = γ_1 P_{1,1} / (γ_1 P_{1,1} + γ_0 P_{0,1}),   (A14)

P_f = γ_1 P_{1,0} / (γ_1 P_{1,0} + γ_0 P_{0,0}),   (A15)

P_m = γ_0 P_{0,1} / (γ_0 P_{0,1} + γ_1 P_{1,1}),   (A16)

and

P_{cH0} = γ_0 P_{0,0} / (γ_0 P_{0,0} + γ_1 P_{1,0}),   (A17)

where γ_i = N_i / (N_0 + N_1), i = 0, 1, and

P_{j,k} = ∫_D p(x | H_j) p(x | H_k) / [p_0 p(x | H_0) + p_1 p(x | H_1)] dx,   j, k = 0, 1,   (A18)
with D the region of possible x values. The binary hypothesis test is traditionally characterized by the receiver operating characteristic (ROC) curve, which is defined as the relationship between conditional detection and false alarm probabilities. Typically the unknown parameter in the test is the prior probability p_0, which is absorbed into a variable decision threshold for a data-generated sufficient statistic [53].
Equations (A14) and (A15) describe a detection and false alarm probability dependent on the prior probability p_0 and the H_0 proportion in the training set γ_0. In the following the ROC curve is generated by varying the prior probability p_0 in the training set based detection and false alarm probabilities with fixed γ_0. A common situation, which results in the conventional Neyman-Pearson test, is the existence of a maximum tolerated joint false alarm probability p(H_1, H_0). From equation (A5) a maximum joint false alarm probability P_{f0} implies an upper bound on the proportion of H_1 trials in the training set; that is, the condition γ_1 < P_{f0} / (p_0 P_{1,0}). There is also a corresponding upper bound on the joint detection probability P_d = p(H_1, H_1), given by P_d < P_{f0} p_1 P_{1,1} / (p_0 P_{1,0}).
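A short numerical check of these bounds can be sketched as follows (Python with NumPy). The Gaussian class densities and the tolerated joint false alarm probability are illustrative assumptions; the quantities P_{1,0} and P_{1,1} come from equation (A18).

```python
import numpy as np

x = np.linspace(-8.0, 9.0, 4001)
def gauss(x, m):
    return np.exp(-0.5 * (x - m) ** 2) / np.sqrt(2 * np.pi)

p0, p1 = 0.5, 0.5                      # prior probabilities (assumed)
f0, f1 = gauss(x, 0.0), gauss(x, 1.0)  # p(x|H0), p(x|H1) for a unit bias
mix = p0 * f0 + p1 * f1

P10 = np.trapz(f1 * f0 / mix, x)       # P_{1,0} from equation (A18)
P11 = np.trapz(f1 * f1 / mix, x)       # P_{1,1}

Pf0 = 0.05                             # maximum tolerated joint false alarm (assumed)
gamma1_max = Pf0 / (p0 * P10)          # upper bound on the H1 proportion gamma_1
Pd_max = Pf0 * p1 * P11 / (p0 * P10)   # corresponding bound on the joint detection probability
print(gamma1_max, Pd_max)
```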
The MAP test performance measure in equation (A13) can also be applied to the binary hypothesis test. Assume that for x ∈ D_{0,1}, the region of equal a posteriori probability, the test chooses between hypotheses H_0 and H_1 with equal probability. In this case the neural net has equal output values from the two deepest layer neurons in Figure A2. The expression in equation (A13) is easily generalized to obtain the conditional probabilities

P_d^{MAP2} = ∫_{D_1} p(x | H_1) dx + (1/2) ∫_{D_{0,1}} p(x | H_1) dx,   (A19)

P_f^{MAP2} = ∫_{D_1} p(x | H_0) dx + (1/2) ∫_{D_{0,1}} p(x | H_0) dx,   (A20)

P_m^{MAP2} = ∫_{D_0} p(x | H_1) dx + (1/2) ∫_{D_{0,1}} p(x | H_1) dx,   (A21)

and

P_{cH0}^{MAP2} = ∫_{D_0} p(x | H_0) dx + (1/2) ∫_{D_{0,1}} p(x | H_0) dx.   (A22)
The detection of a bias in Gaussian noise is the first-order approximation to many detection problems. The sufficient statistic for the time series {x_i | i = 1, ..., P} is the normalized sample mean, s = (1/(√P σ)) Σ_{i=1}^{P} x_i, with conditional distributions [53],

p(s | H_0) = (1/√(2π)) exp(-s²/2)   (A23)

and

p(s | H_1) = (1/√(2π)) exp(-(s - d)²/2),   (A24)

where

d = √P m / σ,   (A25)
σ is the noise standard deviation and m is the bias value. The substitution of equations (A23)-(A25) into equations (A14), (A15), and (A18) yields a training set based ROC curve parameterized by the prior probability p_0 and the H_0 proportion γ_0. Figure A3 contains the ROC curves for d = 1.0, with γ_0 in the range [0.1, 0.9], as the prior probability p_0 is varied. The variation of γ_0 from 0.1 to 0.9 shifts the performance probabilities from correct detection (P_d) to correct false alarm (P_f) regardless of the prior probability p_0. The MAP test performance of an idealized neural net, corresponding to the condition γ_0 = p_0, is also shown in the figure. The performance curves in Figure A3 may be interpreted as the deviation from standard MAP performance with a training set not proportioned according to prior probabilities. An analytic comparison between the performance measures in equations (A14)-(A17) and the MAP estimates in equations (A19)-(A22) is obtained for the case of uniformly distributed conditional probabilities p(x | H_i) of equal width Δ separated by KΔ.
Fig. A3. Receiver operating characteristic (ROC) curve for a bias in Gaussian noise with d = 1.0. Training set based performance curves for γ_0 = 0.1, ..., 0.9. MAP test performance curve satisfying the condition γ_0 = p_0.
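A computation of this kind underlies Figure A3. The sketch below (Python with NumPy; the grid and the sweep over p_0 are arbitrary choices) evaluates equations (A14), (A15), and (A18) for the bias-in-Gaussian-noise statistic of equations (A23)-(A25) and traces a training set based ROC curve for a fixed proportion γ_0, together with the MAP-type curve obtained by setting γ_0 = p_0.

```python
import numpy as np

d = 1.0                                   # normalized bias, equation (A25)
s = np.linspace(-10.0, 11.0, 4001)
def gauss(s, m):
    return np.exp(-0.5 * (s - m) ** 2) / np.sqrt(2 * np.pi)

f0, f1 = gauss(s, 0.0), gauss(s, d)       # p(s|H0), p(s|H1), equations (A23)-(A24)

def roc_point(p0, gamma0):
    p1, gamma1 = 1.0 - p0, 1.0 - gamma0
    mix = p0 * f0 + p1 * f1
    P = lambda a, b: np.trapz(a * b / mix, s)   # P_{j,k}, equation (A18)
    Pd = gamma1 * P(f1, f1) / (gamma1 * P(f1, f1) + gamma0 * P(f0, f1))   # (A14)
    Pf = gamma1 * P(f1, f0) / (gamma1 * P(f1, f0) + gamma0 * P(f0, f0))   # (A15)
    return Pf, Pd

gamma0 = 0.5
curve = [roc_point(p0, gamma0) for p0 in np.linspace(0.05, 0.95, 19)]
map_curve = [roc_point(p0, p0) for p0 in np.linspace(0.05, 0.95, 19)]   # gamma_0 = p_0 case
```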
The K-factor parameterization of overlapped distributions is convenient for analysis of system discrimination performance [66]. The overlapped distribution condition corresponds to K ∈ [0, 1], with K of unity for non-overlapped distributions. The training set based measures for uniform distributions are obtained from substitution into equations (A14)-(A18) with the result

P_d = γ_1 [K + (1 - K) p_1] / [γ_1 K + (1 - K) p_1],   (A26)

P_f = γ_1 p_0 (1 - K) / [γ_0 K + (1 - K) p_0],   (A27)

P_m = γ_0 p_1 (1 - K) / [γ_1 K + (1 - K) p_1],   (A28)

and

P_{cH0} = γ_0 [K + (1 - K) p_0] / [γ_0 K + (1 - K) p_0].   (A29)
The MAP test measures in equations (A19)-(A22) can also be computed analytically for uniform data distributions. Assuming p_0 = p_1 = 0.5, the region of equal a posteriori probability D_{0,1} is [Δ(K - 1/2), Δ/2]; and the dominant hypothesis regions are given by D_0 = [-Δ/2, Δ(K - 1/2)] and D_1 = [Δ/2, Δ(K + 1/2)]. Substitution of these regions into equations (A19)-(A22) with uniform conditional probabilities p(x | H_i), i = 0, 1, yields

P_d^{MAP2} = P_{cH0}^{MAP2} = (1 + K)/2   (A30)

and

P_m^{MAP2} = P_f^{MAP2} = (1 - K)/2.   (A31)
Note that for the case γ_0 = p_0 = 0.5 and γ_1 = p_1 = 0.5, equations (A26)-(A29) and (A30)-(A31) are identical, as expected for a training set proportioned according to prior probabilities. Figure A4 contains plots of P_d(K) and P_f(K) from the training set based estimate in equations (A26)-(A29), and the MAP estimate in equations (A30) and (A31), for the binary test of uniformly distributed data. The conditional probabilities are plotted for various values of γ_0 assuming equal prior probabilities for H_0 and H_1 (p_0 = p_1 = 0.5). If half the training set is H_0-generated, γ_0 = 0.5, the training set based probabilities are linear in K and match the MAP estimates. The results in Figure A4 again indicate that a training set proportioned toward H_1, that is, γ_1 > γ_0, increases P_d (at the expense of P_f) over the MAP test estimate. The reverse situation occurs for a training set proportioned toward H_0.
Fig. A4. Detection and false alarm probability versus K for the binary hypothesis test. Training set based estimates with γ_0 = 0.1, 0.2, 0.4, 0.6, 0.8, 0.9 and the MAP estimate. Prior probabilities p_0 = 0.5, p_1 = 0.5.
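The curves in Figure A4 follow directly from the closed forms above. A brief sketch (Python with NumPy; the parameter values are illustrative) compares the training set based probabilities of equations (A26)-(A27) with the MAP values of equations (A30)-(A31) over K.

```python
import numpy as np

K = np.linspace(0.0, 1.0, 101)
p0 = p1 = 0.5                     # equal prior probabilities, as in Figure A4

def training_set_based(gamma0):
    gamma1 = 1.0 - gamma0
    Pd = gamma1 * (K + (1 - K) * p1) / (gamma1 * K + (1 - K) * p1)   # equation (A26)
    Pf = gamma1 * p0 * (1 - K) / (gamma0 * K + (1 - K) * p0)         # equation (A27)
    return Pd, Pf

Pd_map, Pf_map = (1 + K) / 2, (1 - K) / 2    # equations (A30)-(A31)

# For gamma_0 = 0.5 the training set based values reduce to the MAP values.
Pd_ts, Pf_ts = training_set_based(0.5)
print(np.allclose(Pd_ts, Pd_map), np.allclose(Pf_ts, Pf_map))
```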
In this appendix two distinct performance measures were defined for adaptive systems. Training set based estimation of system performance was derived from the statistics of the training set. These statistics are relevant if system errors reflect uncertainties inherent in the learning procedure. The measures are independent of a particular adaptive system, although it is clear that systems which exactly implement the undulations of the training set map are described by training set based estimates. The training set based measures were compared to the performance of a MAP test, which is easily represented in an idealized neural net. Systems trained for data biases, rather than an exact training set map, are probably best described by Bayesian performance estimates. The desired system performance has implications for neural net structure. For example, it was argued in Section 2.2 that two neurons are required in a three layer BPNN for each implemented undulation in the training set map. An adaptive system matching MAP test performance would not have this structural condition. However, training set based performance may be desirable because performance probabilities are dependent on the training set. Thus, for example, a training set proportioned toward particular hypotheses increases the system performance for conditional probabilities involving those hypotheses. Of course, the adaptive system must be described by the training set based estimate for such bounds to be relevant.
Appendix B. Variance transition detection
In this appendix equations (3) and (4), relating detection and false alarm probabilities to the quantities {p(j | m) | j, m ∈ {0, 1}}, are derived. Recall that the indices zero and one correspond to noise deviations σ_0 and σ_1, respectively. The pair (i, j) denotes a transition from deviation σ_i to deviation σ_j, and the expression p(x | y) denotes the probability of x detection conditioned on y. The relevant probabilities are then given by P_d = p(transition | transition) and P_f = p(transition | no transition) for detection and false alarm. The detection probability is given by

p(transition | transition) = p((1, 0) | transition) + p((0, 1) | transition).   (B1)

The application of Bayes' theorem to equation (B1) yields the result

P_d = [p((1, 0), transition) + p((0, 1), transition)] / p(transition),   (B2)
where p(transition) represents the prior probability of a transition. A transition is obtained either by a (1, 0) or a (0, 1) noise deviation pair. Equation (B2) can be written in terms of the probability for specific deviation pair detection with the result

P_d = [p((0, 1), (1, 0)) + p((0, 1), (0, 1))] / [p((1, 0)) + p((0, 1))] + [p((1, 0), (1, 0)) + p((1, 0), (0, 1))] / [p((1, 0)) + p((0, 1))],   (B3)
where p((i, j)) represents the prior probability of a deviation pair (i, j). Application of Bayes' theorem to equation (B3) results in the expression

P_d = [p((0, 1) | (1, 0)) + p((1, 0) | (1, 0))] p((1, 0)) / [p((1, 0)) + p((0, 1))] + [p((0, 1) | (0, 1)) + p((1, 0) | (0, 1))] p((0, 1)) / [p((1, 0)) + p((0, 1))].   (B4)
Recall that p((i, j) | (k, m)) represents the detection of deviation pair (i, j) conditioned on the pair (k, m). Assuming that the decision for this occurrence is based on a pair of maximum likelihood tests before and after the transition, the conditional probabilities factorize; that is, p((i, j) | (k, m)) = p(i | k) p(j | m). Application of this property in equation (B4) results in the expression

P_d = p(1 | 1) p(0 | 0) + p(0 | 1) p(1 | 0),   (B5)

where p(i | j) is given in equations (5) and (6). It is interesting that the prior probabilities p((i, j)) have cancelled from equation (B5), indicating an overall detection probability independent of the prior distribution of deviation pairs. The same argument applied to the false alarm probability results in the expression

P_f = [p(1 | 1) p(0 | 1) p((1, 1)) + p(1 | 0) p(0 | 0) p((0, 0))] / {[p((0, 0)) + p((1, 1))] / 2}.   (B6)

In this case the probability depends on the prior probabilities p((0, 0)) and p((1, 1)) for the ensemble upon which the hypothesis test is applied. An ensemble in which all deviation pairs (i, j) are equally likely results in

P_f = p(1 | 1) p(0 | 1) + p(0 | 0) p(1 | 0).   (B7)
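A small numerical sketch of equations (B5)-(B7) follows (Python). The single-interval decision probabilities p(i | j) and the pair priors are assumed placeholder values rather than the quantities obtained from equations (5) and (6).

```python
# Assumed single-interval decision probabilities p(i|j): probability of declaring
# deviation sigma_i when the true deviation is sigma_j (placeholders for eqs. (5)-(6)).
p = {(0, 0): 0.9, (1, 0): 0.1,
     (0, 1): 0.2, (1, 1): 0.8}

# Equation (B5): detection probability, independent of the pair priors.
Pd = p[(1, 1)] * p[(0, 0)] + p[(0, 1)] * p[(1, 0)]

# Equation (B6): false alarm probability for pair priors p((0,0)) and p((1,1)).
prior_00, prior_11 = 0.25, 0.25        # assumed no-transition pair priors
Pf = (p[(1, 1)] * p[(0, 1)] * prior_11 + p[(1, 0)] * p[(0, 0)] * prior_00) \
     / ((prior_00 + prior_11) / 2.0)

# Equation (B7): the equally likely pair ensemble gives the same value here.
Pf_equal = p[(1, 1)] * p[(0, 1)] + p[(0, 0)] * p[(1, 0)]
print(Pd, Pf, Pf_equal)
```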
Acknowledgment

We would like to thank Mitch Eggers for providing References [12]-[19], and for noting, as in Reference [18], that the optimum FNN is a neural net. Hitoshi Inada and Ken Schultz generated and processed the Haystack and Firepond Firefly data, respectively. Both individuals are also thanked for providing analysis of the Firefly maneuvers and details of image generation for the two radars. Mike Jordan, the author of the passive IR simulator, is thanked for providing the information necessary to run the code and aiding our interpretation of the output. The reading of the manuscript by Sun Levine and Israel Kupiec is very much appreciated.
References

[1] Weaver, C. W., ed. (1988). Sensor Fusion. Proc. SPIE 931.
[2] Weaver, C. W., ed. (1989). Sensor Fusion II. Proc. SPIE 1100.
[3] Harney, R. C., ed. (1990). Sensor Fusion III. Proc. SPIE 1306.
[4] Iyengar, S. S., R. L. Kashyap and R. N. Madan (1991). Distributed sensor networks - Introduction to the special section. IEEE Trans. Systems Man Cybernet. 21, 1027-1031.
[5] Dasarathy, B. V. (1990). Paradigms for information processing in multisensor environments. Proc. SPIE 1306, 69-80.
[6] Waltz, E. and J. Llinas (1990). Multisensor Data Fusion. Artech House, Norwood, MA.
[7] Luo, R. C. and M. G. Kay (1989). Multisensor integration and fusion in intelligent systems. IEEE Trans. Systems Man Cybernet. 19, 901-931.
[8] Blackman, S. S. (1988). Theoretical approaches to data association and fusion. Proc. SPIE 931, 50-55.
[9] Blackman, S. S. (1990). Association and fusion of multiple sensor data. In: Y. Bar-Shalom, ed., Multitarget-Multisensor Tracking: Advanced Applications. Artech House, Norwood, MA, 187-218.
[10] Chong, C.-Y., S. Mori and K.-C. Chang (1990). Distributed multitarget multisensor tracking. In: Y. Bar-Shalom, ed., Multitarget-Multisensor Tracking: Advanced Applications. Artech House, Norwood, MA, 247-295.
[11] Tucci, R. and M. J. Tsai (1990). Comparison of ROCs for various sensor fusion schemes. Proc. SPIE 1306, 81-92.
[12] Tenney, R. R. and N. R. Sandell (1981). Detection with distributed sensors. IEEE Trans. Aerospace Electron. Systems 17, 501-509.
[13] Sadjadi, F. A. (1986). Hypothesis testing in a distributed environment. IEEE Trans. Aerospace Electron. Systems 22, 134-137.
[14] Chair, Z. and P. K. Varshney (1986). Optimal data fusion in multiple sensor detection systems. IEEE Trans. Aerospace Electron. Systems 22, 98-101.
[15] Thomopoulos, S. C. A., R. Viswanathan and D. C. Bougoulias (1987). Optimal decision fusion in multiple sensor systems. IEEE Trans. Aerospace Electron. Systems 23, 644-653.
[16] Thomopoulos, S. C. A., D. C. Bougoulias and L. Zhang (1988). Optimal and suboptimal distributed decision fusion. Proc. SPIE 931, 26-30.
[17] Thomopoulos, S. C. A., R. Viswanathan and D. C. Bougoulias (1989). Optimal distributed decision fusion. IEEE Trans. Aerospace Electron. Systems 25, 761-765.
[18] Atteson, K., M. Schrier, G. Lipson and M. Kam (1988). Distributed decision-making with learning threshold elements. In: Proc. IEEE 27th Conf. on Decision and Control, Austin, TX, December. IEEE Press, New York, 804-805.
[19] Reibman, A. R. and L. W. Nolte (1988). On determining the design of fusion detection networks. In: Proc. IEEE 27th Conf. on Decision and Control, Austin, TX, December. IEEE Press, New York, 2473-2474.
[20] Dasarathy, B. V. (1991). Decision fusion strategies in multi sensor environments. IEEE Trans. Systems Man Cybernet. 21, 1140-1154.
[21] Hecht-Nielsen, R. (1990). Neurocomputing. Addison-Wesley, Reading, MA.
[22] Levine, R. Y. and T. S. Khuon (1992). Training set-based performance measures for data-adaptive decisioning systems. Proc. SPIE 1766, 518-528.
[23] Lippmann, R. P. (1987). An introduction to computing with neural nets. IEEE ASSP Magazine 4, 4-22.
[24] Minsky, M. and S. Papert (1969). Perceptrons. MIT Press, Cambridge, MA.
[25] Rumelhart, D. E., G. E. Hinton and R. J. Williams (1986). Learning internal representation by error propagation. In: D. E. Rumelhart and J. L. McClelland, eds., Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1. MIT Press, Cambridge, MA, 318-362.
[26] Werbos, P. J. (1974). Beyond regression: New tools for prediction and analysis in the behavioral sciences. Doctoral Dissertation, Appl. Math. Dept., Harvard University.
[27] Parker, D. B. (1985). Learning logic. Technical Report TR-47, Center for Computational Research in Economics and Management Science, MIT.
[28] Parker, D. B. (1987). Optimal algorithms for adaptive networks: Second order back propagation, second order direct propagation, and second order learning. In: Proc. IEEE 1st Internat. Conf. on Neural Nets, San Diego, CA, June. IEEE Press, New York, II593-II600.
[29] Bryson, A. E. and Y. C. Ho (1975). Applied Optimal Control. Revised 2nd ed., Hemisphere, New York.
[30] Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proc. Nat. Acad. Sci. U.S.A. 79, 2554-2558.
[31] Cohen, M. and S. Grossberg (1983). Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Trans. Systems Man Cybernet. 13, 815-826.
[32] Kohonen, T. (1988). Self-Organization and Associative Memory. 2nd ed., Springer, Berlin.
[33] Huang, W. Y. and R. P. Lippmann (1987). Comparisons between neural net and traditional classifiers. In: Proc. IEEE 1st Internat. Conf. on Neural Nets, San Diego, CA, June. IEEE Press, New York, IV485-IV493.
[34] Huang, W. Y. and R. P. Lippmann (1988). Neural net and traditional classifiers. In: D. Anderson, ed., Neural Information Processing Systems. American Institute of Physics, New York, 387-396.
[35] Murphy, O. J. (1990). Nearest neighbor pattern classification perceptrons. Proc. IEEE 78, 1595-1598.
[36] Yau, H.-C. and M. T. Manry (1990). Iterative improvement of a Gaussian classifier. Neural Networks 3, 437-443.
[37] Sethi, I. and A. K. Jain, eds. (1991). Artificial Neural Networks and Pattern Recognition: Old and New Connections. Elsevier, New York.
[38] Yair, Y. and A. Gersho (1990). Maximum a posteriori decision and evaluation of class probabilities by Boltzmann perceptron classifiers. Proc. IEEE 78, 1620-1628.
[39] Perlovsky, L. I. and M. M. McManus (1991). Maximum likelihood neural nets for sensor fusion and adaptive classification. Neural Networks 4, 89-102.
[40] Ruck, D. W., S. K. Rogers, M. Kabrisky, M. E. Oxley and B. W. Suter (1990). The multilayer perceptron as an approximation to a Bayes optimal discriminant function. IEEE Trans. Neural Networks 1, 296-298.
[41] Wan, E. A. (1990). Neural network classification: A Bayesian interpretation. IEEE Trans. Neural Networks 1, 303-305.
[42] Miyake, S. and F. Kanaya (1991). A neural network approach to a Bayesian statistical decision problem. IEEE Trans. Neural Networks 2, 538-540.
[43] Richard, M. D. and R. P. Lippmann (1991). Neural network classifiers estimate Bayesian a posteriori probabilities. Neural Computation 3, 461-483.
[44] Rosenblatt, F. (1962). Principles of Neurodynamics. Spartan, New York.
[45] Basseville, M. and A. Benveniste (1982). Sequential detection of abrupt changes in spectral characteristics of noise. IEEE Trans. Inform. Theory 28, 318-329.
[46] Levine, R. Y. and T. S. Khuon (1991). Neural nets for distributed sensor data fusion: The Firefly experiment. Proc. SPIE 1611, 52-64.
[47] Eggers, M. and T. S. Khuon (1990). Neural network data fusion concepts and applications. In: Proc. IEEE Internat. Joint Conf. on Neural Nets, San Diego, June. IEEE Press, New York, II7-II16.
[48] Brown, D. E., C. L. Pittard and W. N. Martin (1989). Neural net implementations of data association algorithms for sensor fusion. Proc. SPIE 1100, 126-135.
[49] Casasent, D. P. and T. M. Slagle (1989). Mixture neural net for multispectral imaging spectrometer processing. Proc. SPIE 1198, 324-333.
[50] Gaughan, P. T., G. M. Flachs and J. B. Jordan (1991). Multisensor object segmentation using a neural net. Proc. SPIE 1469, 812-819.
[51] Brown, J. R., D. Bergondy and S. Archer (1991). Comparison of neural network classifiers for sensor fusion. Proc. SPIE 1469, 539-543.
[52] Bowman, C. (1988). Artificial neural network adaptive systems applied to multisensor ID. In: Proc. 1988 Tri-Service Data Fusion Symp., Laurel, MD, May. Naval Air Development Center, Warminster, PA, 162-171.
[53] Van Trees, H. (1971). Detection, Estimation, and Modulation Theory, Part I. Wiley, New York.
[54] Thomopoulos, S. C. A., I. N. M. Papadakis, H. Sahinoglou and N. N. Okello (1991). Centralized and distributed hypothesis testing with structured adaptive networks and perceptron-type neural networks. Proc. SPIE 1611, 35-51.
[55] Hecht-Nielsen, R. (1987). Theory of back propagation neural nets. In: Proc. of IEEE 1st Internat. Conf. on Neural Nets, San Diego, CA, June. IEEE Press, New York, I593-I611.
[56] Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Math. Control Signals Systems 2, 303-314.
[57] Levine, R. Y. and T. S. Khuon (1989). A comparison of neural net learning by back propagation and simulated annealing. MIT Lincoln Laboratory Memorandum 93L-0019.
[58] Wasserman, P. D. (1989). Neural Computing: Theory and Practice. Van Nostrand Reinhold, New York.
[59] Menon, M. M. and E. J. Van Allen (1991). Automatic design of signal processors using neural networks. Proc. SPIE 1496, 322-328.
[60] Walker, J. L. (1980). Range-Doppler imaging of rotating objects. IEEE Trans. Aerospace Electron. Systems 16, 23-52.
[61] Brown, W. M. and R. J. Fredricks (1969). Range-Doppler imaging with motion through resolution cells. IEEE Trans. Aerospace Electron. Systems 5, 98-102.
[62] Ausherman, D. A., A. Kozma, J. L. Walker, H. M. Jones and E. C. Poggio (1984). Developments in radar imaging. IEEE Trans. Aerospace Electron. Systems 20, 363-400.
[63] Kachelmyer, A. L. (1990). Range-Doppler imaging with a laser radar. The Lincoln Laboratory J. 3, 87-118.
[64] Jordan, M. (1990). Private communication.
[65] Schultz, K. I., et al. (1990). Private communication.
[66] Fukunaga, K. (1972). Introduction to Statistical Pattern Recognition. Academic Press, New York.