Estimation and decision fusion: A survey



Neurocomputing 71 (2008) 2650–2656

Contents lists available at ScienceDirect

Neurocomputing journal homepage: www.elsevier.com/locate/neucom

Abhijit Sinha a,*, Huimin Chen b, D.G. Danu a, Thia Kirubarajan a, M. Farooq c

a McMaster University, Hamilton, ON L8S 4K1, Canada
b University of New Orleans, New Orleans, LA 70148, USA
c Royal Military College of Canada, Kingston, ON K7K 5L0, Canada

Article info

Available online 7 May 2008

Keywords: Data fusion; Estimation fusion; Decision fusion; Classifier combining

Abstract

Data fusion has been applied to a large number of fields and the corresponding applications utilize numerous mathematical tools. This survey focuses on some aspects of estimation and decision fusion. In estimation fusion, we discuss the development of fusion architectures and algorithms with emphasis on the cross-correlation between local estimates from different sources. On the other hand, the techniques for decision fusion are discussed with emphasis on the classifier combining techniques. In addition, methods using neural networks for data fusion are briefly discussed.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

Initial data fusion applications were predominantly in defense systems [30,57]. The field has since expanded to cover many research topics in which data fusion is an essential component: combining and updating the mapping of gravitational anomalies and meteorological variables [58], target tracking [5], land mine detection [29], and threat assessment [31], to name a few. Robotics is another research field where data fusion algorithms are extensively applied, for example, in the identification of the environment [44] and in navigation [38]. Medicine, geoscience and industrial engineering are other research fields with many applications of data fusion. Numerous mathematical tools, such as probability theory, Bayes analysis, evidence theory, possibility theory, fuzzy logic, neural networks and evolutionary algorithms, are applied in solving data fusion problems.

Different applications may require fusion at different levels. For example, "platform-centric" tracking systems, where the sensors are all located on the same platform, may require direct fusion of sensory data (centralized fusion), while "network-centric" tracking systems, where sensors may be separated by significant distances, may require fusion at the decision or estimate level (distributed fusion) to reduce the communication cost. Multiple expert systems are often combined at the decision level. Moreover, some systems may require sensory data to be fused with both discrete (decision) and continuous (estimation) outputs. In this survey, we focus on the processing of quantitative sensory data where the raw or processed sensor measurements and decisions are characterized by precise numbers. We present an overview of some aspects of estimation and decision fusion in Sections 2 and 3, respectively. Methods using neural networks for data fusion are discussed in Section 4. Concluding remarks are presented in Section 5.

* Corresponding author.
E-mail addresses: [email protected] (A. Sinha), [email protected] (H. Chen), [email protected] (D.G. Danu), [email protected] (T. Kirubarajan), [email protected] (M. Farooq).
0925-2312/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.neucom.2007.06.016

2. Estimation fusion

We first present an overview of the architectures and algorithms for estimation fusion, with emphasis on linear fusion schemes and solution methodologies. Estimation fusion can be described as the class of problems in which estimates of a continuous parameter/state vector obtained by different sources are combined to obtain an overall estimate which, in general, has better accuracy. In this survey, we consider only the estimation fusion algorithms where the underlying systems are modeled precisely. In other words, the algorithms considered here assume no sensor registration error and complete information about the characteristics of the measurement uncertainties. In the following, the fusion of static and dynamic random vectors is not treated separately since there is no essential difference in the linear fusion formula. However, there are some differences between the two in terms of the approaches that combine the estimates, which will be discussed whenever appropriate.

Before further discussion, some terms of the estimation fusion problem are defined as follows. The term "raw measurement" refers to the measurement from any sensor at the end of its signal processing chain. The term "processed measurement" refers to the data obtained by some transform of the raw measurement, to be used for estimation. One of the purposes of processing the raw measurements is to compress the data and save communication bandwidth. The term "local estimate" refers to any estimate that uses measurements from the local platform (i.e., the platform where the sensor is located) only. A local estimate may include data from a single sensor or multiple sensors, but all inputs must be from the local platform or the data processing unit. Fusion systems use three basic approaches to communication between a local platform and the fusion center, namely:

• sending raw measurements,
• sending processed measurements, e.g., quantized measurements to satisfy the bandwidth constraint,
• sending local posteriors, e.g., local estimates/covariances.

We will focus on the third approach since it is commonly used in existing distributed tracking systems. To design a fusion system one needs to choose the system architecture, develop an algorithm to perform the fusion based on a certain optimality criterion, and find a way to compute the possible cross-correlations among the estimation errors from different sources. Apart from these three steps, another major component is a data association algorithm for the case where there is association ambiguity among the local estimates from different sources. Ref. [6] presented an algorithm for the association of multiple estimates in target tracking. For brevity, data association from different sources for estimation fusion will not be discussed further. Interested readers may find a detailed account of measurement origin uncertainty in [5,7].

2.1. Estimation fusion architectures

Architectures for estimation fusion can be divided into two basic categories, namely, a hierarchical fusion architecture and a fully distributed fusion architecture [5,17]. Figs. 1 and 2 show examples of the two types of architectures, where nodes marked "S" denote sensors and nodes marked "F" denote fusion centers. In a hierarchical architecture, local estimates obtained in the local fusion centers are transmitted to the corresponding higher level fusion centers, where these estimates are fused. In a fully distributed architecture, on the other hand, fusion centers do not have a superior/subordinate relationship. Each local fusion center broadcasts its estimates to all other fusion centers, which, in turn, update their estimates by incorporating the new information. In a hierarchical architecture, the failure of a higher level fusion center makes all subordinate fusion centers unusable for fusion. A fully distributed architecture does not have this limitation and hence produces robust systems. However, it requires higher overall computation and is unsuitable for a system that requires a single global picture.

The hierarchical fusion architecture can be further divided in terms of whether or not the higher level fusion centers transmit feedback to the lower level fusion centers. Although feedback can differ in the type of information passed, in general it contains fused estimates [5]. In this case the lower level fusion center can simply be re-initialized using the estimates sent by the higher level fusion center.

Fig. 2. An example of fully distributed fusion architecture.

2.2. The correlation among local estimation errors

One of the major issues in estimation fusion is the cross-correlation among the local estimates. This may not pose a severe problem in the static estimation fusion case, since the only source of cross-correlation is the correlation among the measurements from different sensors. This cross-correlation can be estimated and/or the raw measurements can be de-correlated. However, in a dynamic estimation fusion problem there are two more sources of cross-correlation, namely:

• common history of measurement errors,
• common process noise.

The cross-correlation due to the common history of measurement errors arises because the estimates communicated by a local fusion center at different times may use a common set of measurements. Hence, the same measurement errors arrive at the fusion center at different times. On the other hand, the errors in the state transition models used by different local fusion centers are often cross-correlated, which makes the "common process noise" a part of the cross-correlation among estimates.

2.3. Estimation fusion algorithms

Fig. 1. An example of hierarchical fusion architecture.

The initial momentum of algorithm development for linear estimation fusion dates back to the late seventies, when Chong [16] and Speyer [51] independently found the optimal distributed Kalman filter. Later, a unified framework was developed in [43], summarizing major known results in linear estimation fusion. The development of estimation fusion for nonlinear systems [3,2] has been rather slow, mainly due to the heavy computational requirement. Here we classify the estimation fusion algorithms on the basis of their approach to handling the cross-correlation among the local estimation errors, namely:

• Algorithms that estimate and account for the cross-correlation.
• Algorithms that de-correlate the local estimates.
• Algorithms that assume the cross-correlation is unknown.
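The three classes listed above can be illustrated on a small numerical example. The sketch below is not taken from the survey. It assumes scalar local estimates for the first two classes and uses the standard formula for fusing two estimates with a known cross-covariance, which reduces to inverse-variance weighting when the cross-covariance is zero. For the third class it shows covariance intersection restricted to diagonal covariances, so that no linear algebra library is needed. The function names, the grid search over the weight `w`, and the trace criterion are illustrative assumptions.

```python
from typing import List, Tuple


def fuse_known_cross_cov(x1: float, p1: float, x2: float, p2: float,
                         p12: float = 0.0) -> Tuple[float, float]:
    """Fuse two scalar estimates whose errors have known cross-covariance p12.

    With p12 = 0 this reduces to inverse-variance weighting, i.e. the rule
    used once the local estimates have been de-correlated.
    """
    s = p1 + p2 - 2.0 * p12           # variance of the difference x1 - x2
    gain = (p1 - p12) / s
    x = x1 + gain * (x2 - x1)
    p = p1 - (p1 - p12) ** 2 / s
    return x, p


def covariance_intersection_diag(x1: List[float], p1: List[float],
                                 x2: List[float], p2: List[float],
                                 steps: int = 100
                                 ) -> Tuple[List[float], List[float], float]:
    """Covariance intersection for diagonal covariances (lists of variances).

    The weight w in [0, 1] is picked on a grid to minimize the trace of the
    fused covariance; the cross-correlation is never used.
    """
    best_trace = float("inf")
    best: Tuple[List[float], List[float], float] = ([], [], 0.0)
    for k in range(steps + 1):
        w = k / steps
        # Fused information (inverse variance) for each state component.
        info = [w / a + (1.0 - w) / b for a, b in zip(p1, p2)]
        p = [1.0 / i for i in info]
        if sum(p) < best_trace:
            x = [pi * (w * u / a + (1.0 - w) * v / b)
                 for pi, u, a, v, b in zip(p, x1, p1, x2, p2)]
            best_trace, best = sum(p), (x, p, w)
    return best


if __name__ == "__main__":
    print(fuse_known_cross_cov(0.0, 1.0, 1.0, 1.0, p12=0.0))
    print(fuse_known_cross_cov(0.0, 1.0, 1.0, 1.0, p12=0.5))
    print(covariance_intersection_diag([0.0, 0.0], [1.0, 4.0],
                                       [1.0, 1.0], [4.0, 1.0]))
```

In this toy case, two unit-variance estimates fuse to a variance of 0.5 if their errors are independent, but only to 0.75 if the cross-covariance is 0.5, so treating correlated estimates as independent would make the fused estimate overconfident. The sketch does not guard against the degenerate case `p1 + p2 = 2 * p12`.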

The first type of algorithms requires computation of the exact cross-correlation [4,7,42]. In most cases the exact cross-correlation can be obtained only if certain information, for example, the local filter update times, the local filter gains and the state transition model corresponding to the local estimates, is available to the fusion center. This may require a large communication bandwidth, which would diminish the advantage of distributed fusion over centralized fusion. In [47] an algorithm to compute the steady-state cross-correlation matrix is presented. In many cases an approximate cross-correlation computation in the fusion center may be an effective way to solve the problem [6,14].

The second type of algorithms attempts to de-correlate the estimates at the local fusion center [21,22]. When the de-correlated estimates are passed to the fusion center, it assumes the estimates to be independent and, hence, they can be fused by using minimum mean-squared error type algorithms. For synchronized nonlinear systems the measurement likelihoods from local processing centers can be combined at the fusion center [2]. If the measurement likelihoods are independent, this algorithm avoids the local track cross-correlation problem.

There are cases when the cross-correlation cannot be estimated and, consequently, the local estimates cannot be de-correlated. The covariance intersection (CI) algorithm [36,37] is applicable in such a case to obtain a consistent estimate in the sense given in [34]. It should be noted that the resulting fusion performance can be far inferior to that of the optimal algorithm (which assumes knowledge of the cross-correlation).

3. Decision fusion

Generally speaking, decision fusion performs a data-reduction mapping from multiple inputs to a smaller number of outputs. The inputs may be raw sensor data, pixel values, extracted features, signal estimators, or control signals. Outputs may be target types, recognized scenes or events, enhanced features, etc. Unlike estimation fusion, decision fusion usually does not assume any parametric statistical model of the outputs from local data processors. An important aspect of any solution technique is the way it models uncertainty or errors in the sensor information. Ultimately, it is desired that the errors of each model can be corrected by the other models. If all the models make the same mistake, then no combination can rectify the error. However, if the errors are not always correlated, performance can be improved with a proper combination.

3.1. General decision fusion methods

Decision fusion approaches aim at combining the beliefs of the set of models used into a single, consensus belief. In the following we review three popular decision fusion approaches, namely, the linear opinion pool, the logarithmic opinion pool, and the voting or ranking approach.

3.1.1. Linear opinion pool

The linear opinion pool is a commonly used decision fusion technique that is convenient because of its simplicity [10]. The fusion output is evaluated as a weighted sum of the probabilities from each model [56]:

$$P_{\mathrm{linear}}(A) = \sum_{i=1}^{K} a_i P_i(A), \qquad (1)$$

where $P_{\mathrm{linear}}(A)$ is the combined probability for an event $A$ obtained from the set of models; $a_i$ is the weight given to the $i$-th model; $P_i(A)$ is the probability of the $i$-th model for the event $A$; and $K$ is the number of models. The parameters $a_i$ are generally chosen such that $0 \le a_i \le 1$ and $\sum_i a_i = 1$. The linear opinion pool is appealing in that the output is a probability distribution, and the weight $a_i$ provides a rough measure of the contribution of the $i$-th model. However, the probability distribution of the combined output, namely $P_{\mathrm{linear}}(A)$, may be multimodal [1].

3.1.2. Log opinion pool

An alternative to the linear opinion pool is the log opinion pool. If the weights are constrained such that $0 \le a_i \le 1$ and $\sum_i a_i = 1$, then the log opinion pool also yields a probability distribution (after normalization over the events). However, as opposed to the linear opinion pool, the output distribution of the log opinion pool is typically unimodal. The log opinion pool consists of a weighted product of the model outputs [1]:

$$P_{\log}(A) = \prod_{i=1}^{K} P_i(A)^{a_i}. \qquad (2)$$

Note that with this formulation, if any model assigns a probability of zero, then the combined probability is also zero. Hence, an individual model has the capability of a "veto", whereas in the linear opinion pool the zero probability is averaged out with the other probabilities.

3.1.3. Voting/ranking methods

Another simple method for combining the results of multiple models is to use a voting procedure [8,46]. In this case, each model must generate a decision instead of a score. Numerous voting techniques have been presented in the literature [9,11,19,28]. The most popular one is the majority vote [9,46]. Other voting techniques are the maximum, minimum, and median votes.

Ranking methods are appropriate for problems that involve numerous classes. Rankings do not use class labels or numerical scores, but they do utilize the order of classes as estimated by the model(s). Ranking methods employ class set reduction to reduce the number of class candidates without losing the true class. By reducing the number of classes and reordering the remaining classes, the true class is expected to move to the top of the ranking. The Borda count [11] is among the most popular rank-based methods.

Other computational methods for decision fusion also exist, such as the Dempster–Shafer method [19] and the fuzzy integral method [15,28]. These methods combine the beliefs of various models into an overall consensus belief, not just their respective decisions.

3.2. Classifier fusion

One of the most important application domains for decision fusion is pattern recognition and classification. Multiple classifier systems are often practical and effective solutions for difficult pattern recognition tasks. They are also called combining of multiple classifiers [40], decision combination [33], mixture of experts [35], classifier ensembles [32], classifier fusion [41], consensus aggregation [9], dynamic classifier selection [26], hybrid methods [18], and so on. The motivation for such systems may be derived from the empirical observation that specialized classifiers are superior in different cases, or it may follow from the nature of the application. In several other cases, the motivation for using such systems is to avoid making commitments to certain initial conditions. Classifier combining is in contrast to the traditional approach of classifier selection, where we compare the available classifiers on a set of representative sample data and choose the best performer. In classifier combining, we abandon the attempt to find the best classifier and instead try to use all of the available ones in a smart way. The use of such an approach to achieve improved classification rates, compared to those achieved by a single classifier, has been widely accepted [48].

There are many ways to utilize more than one classifier in a single recognition problem. A divide-and-conquer approach isolates the types of inputs on which a specific classifier performs well, and directs these inputs accordingly. A sequential approach uses one classifier first, and invokes others only if it fails to yield a decision with sufficient confidence. Most research in classifier combining methods, however, focuses on applying all the available classifiers in parallel to the same input and combining their decisions within the parallel suite.

At this stage, we can discriminate between the ensemble combination and the modular combination of classifiers. The term ensemble is commonly used for combining a set of redundant classifiers. The redundancy occurs because each classifier provides a solution to the same task. Note that these solutions may be obtained by different means. This is in contrast to the modular approach, in which the task is decomposed into a number of sub-tasks. Each module is concerned with finding a solution for one sub-task. To complete the whole task, each component is expected to contribute. In the sequel, we will focus on combining classifiers in an ensemble.

Consider a pattern space $S$ consisting of $N$ mutually exclusive sets, $S = y_1 \cup \dots \cup y_N$, where each $y_j$ ($\forall j \in D = \{1, 2, \dots, N\}$) represents a set of specified patterns called a class. For a sample $x$ from $S$, the task of a classifier (denoted by $C$) is to assign $x$ one index $j \in D \cup \{N+1\}$ as a label, so that $x$ is regarded as being of class $y_j$ if $j \neq N+1$, with $j = N+1$ denoting that $x$ is rejected by $C$. Regardless of what internal structure a classifier has, or what theory it is based on, we may simply regard a classifier as a black box that receives an input sample $x$ and outputs the label $y_j$, or in short, $C(x) = y_j$. Although $y_j$ is the only output information we want at the final stage of classification, many of the existing classification algorithms supply related information. For example, a Bayes classifier may also supply $N$ values of posterior probabilities $P(y_j \mid x)$, $j = 1, \dots, N$, one for each possible label. In fact, the final label $y_j$ is the result of selecting the maximum of the $N$ values, and this selection certainly discards some information that may be useful for a multi-classifier combination. Depending on whether output information other than the label $y_j$ is used, and on what kind of information is used, we have different types of multi-classifier combination problems. Typically, the output information that various classification algorithms supply can be divided into the following three levels:

• The abstract level: A classifier $C$ outputs a unique label $y_j$ or a subset $J$ of labels from $D$.
• The rank level: $C$ ranks all the labels in $D$ (or a subset $J \subset D$) in a queue, with the label at the top being the first choice.
• The measurement level: $C$ attributes to each label in $D$ a value for the degree that $x$ has the label.

Among these three levels, the measurement level contains the most information, and the abstract level contains the least. From the measurement attributed to each label, we can rank all the labels in $D$ according to a ranking rule (e.g., ascending or descending). By choosing the label at the top rank or, more directly, by choosing the label with the maximal value at the measurement level, we can assign a unique label to $x$. In other words, from the measurement level to the abstract level there is an information reduction process or abstraction process.

3.3. Classifier ensembles combining methods

The main motivation for combining classifiers in redundant ensembles is that of improving their generalization capability [25]. The inherent redundancy within the ensemble can also guard against the failure of individual classifiers. We may expect a classifier to fail on certain inputs because it has been trained on a limited data set and is required, based on the training data, to estimate the target function. Unless the function is simple or the training set is a perfect representative of the data, it is inevitable that the estimate and the desired target will differ. The combination of a set of imperfect estimators can be viewed as a way to manage the recognized limitations of the individual estimators. Each constituent classifier is known to make errors; however, the fact that the patterns misclassified by different classifiers are not necessarily the same suggests that the use of multiple classifiers can enhance the decision about the patterns under classification. Combining these classifiers in such a way as to minimize the overall effect of these errors can prove useful.

3.3.1. Methods for creating ensemble members

Since the main reason for combining classifiers in an ensemble is to improve their performance, there is no advantage to be gained from an ensemble that is composed of a set of identical classifiers. The emphasis here is on the similarity of the pattern of generalization. In principle, a set of classifiers can vary in terms of their weights, the time they take to converge, and even their architecture, and yet constitute the same solution, since they result in the same pattern of errors when they are tested. The aim, then, is to find classifiers which generalize differently. There are a number of training parameters which can be manipulated with this goal in mind: the initial conditions, the training data, the topology of the classifiers, and the training algorithm. We review below the commonly used methods that have been employed for the creation of ensemble members.

• Varying the set of initial random weights [13]: A set of classifiers can be created by varying the initial random weights from which each classifier is trained, while maintaining the same training data.
• Varying the topology [32]: A set of classifiers can be created by varying the topology or architecture, while maintaining constant training data. A more intense diversity occurs if members of the ensemble are different modular systems. The errors made by two modular systems with different internal modular structures may be uncorrelated.
• Varying the algorithm employed [13]: The algorithm used to train the classifiers can be varied, while holding the data constant.
• Varying the data [20]: The methods which seem to be most frequently used for the creation of ensembles are those which involve altering the training data. There are a number of different ways in which this can be done, including sampling data, disjoint training sets, boosting and adaptive re-sampling, different data sources, and preprocessing, or a combination of these techniques.
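Of the data-varying options listed above, resampling the training set is the easiest to sketch. The example below draws one bootstrap replicate (sampling with replacement) per ensemble member, in the spirit of bagging [13]. The function name `bootstrap_replicates`, its parameters, and the toy data are illustrative assumptions, and training the actual classifiers on each replicate is left out.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_replicates(data: Sequence[T], n_members: int,
                         seed: int = 0) -> List[List[T]]:
    """Return n_members training sets, each drawn from data with replacement.

    Every replicate has the same size as the original set, so each ensemble
    member sees a slightly different view of the same data.
    """
    rng = random.Random(seed)          # local generator, reproducible runs
    n = len(data)
    return [[data[rng.randrange(n)] for _ in range(n)]
            for _ in range(n_members)]


if __name__ == "__main__":
    # Toy (pattern, label) pairs standing in for a real training set.
    samples = [("x%d" % i, i % 2) for i in range(10)]
    for member, replicate in enumerate(bootstrap_replicates(samples, 3)):
        distinct = len(set(replicate))
        print("member %d: %d distinct patterns out of %d"
              % (member, distinct, len(replicate)))
```

Because each replicate repeats some patterns and omits others, classifiers trained on different replicates tend to make different errors, which is the diversity the ensemble relies on.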


3.3.2. Methods for combining classifiers in ensembles

Once a set of classifiers has been created, an effective way of combining their outputs must be found. A variety of schemes have been proposed for combining multiple classifiers. The majority vote is by far the most popular approach. Other voting schemes include the minimum, maximum, median, average, and product schemes [8]. The weighted average approach tries to evaluate the optimal weights for the various classifiers used. The behavior-knowledge space (BKS) method selects the best classifier in some region of the input space, and bases its decision on its output [18]. Other approaches to combining classifiers include rank-based methods such as the Borda count, the Bayes approach, Dempster–Shafer theory, fuzzy theory, probabilistic schemes, and combination by neural networks. We can view the combiner as a scheme to assign weights of value to classifiers. The weights can be data independent or data dependent. We can categorize the methods for classifier combining into the following groups.

• Averaging and weighted averaging: Linear opinion pools are one of the most popular aggregation methods, and refer to the linear combination of the ensemble members' output distributions with the constraint that the resulting combination is itself a distribution. A single output can be created from a set of classifier outputs via simple averaging, or by means of a weighted average that takes into account the relative accuracies of the classifiers to be combined.
• Non-linear combining methods: Non-linear combining methods that have been proposed include the Dempster–Shafer belief-based methods [19], the use of rank-based information [10], voting [8], and order statistics [54].
• Supra Bayesian: The underlying philosophy of the supra Bayesian approach is that the opinions of the experts are themselves data. Therefore, the probability distribution of the experts can be combined with its own prior distribution [10].
• Stacked generalization: Under stacked generalization a non-linear estimator learns how to combine the classifiers with weights that vary over the feature space. The outputs from a set of Level 0 generalizers are used as the input to a Level 1 generalizer, which is trained to produce the appropriate output. The term stacked generalization is used by Wolpert to refer both to this method of stacking classifiers, and to the method of creating a set of ensemble members by training on different partitions of the data [59]. It is also possible to view other methods of combining, such as averaging, as instances of stacking with a simple Level 1 generalizer. The same idea has been adapted to regression tasks, where it is called stacked regression [12].
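Several of the combiners named above are simple enough to write down directly. The following sketch shows a linear opinion pool (Eq. (1)), a log opinion pool (Eq. (2)), a majority vote, and a Borda count for classifiers that report, respectively, class probabilities, a single label, or a ranking. The function names, the dictionary representation of a distribution, the renormalization step in the log pool, and the tie-breaking behavior are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, Sequence

Dist = Dict[str, float]   # class label -> probability


def linear_pool(dists: Sequence[Dist], weights: Sequence[float]) -> Dist:
    """Weighted sum of the model probabilities, as in Eq. (1)."""
    return {c: sum(w * d[c] for w, d in zip(weights, dists))
            for c in dists[0]}


def log_pool(dists: Sequence[Dist], weights: Sequence[float]) -> Dist:
    """Weighted product of the model probabilities, as in Eq. (2).

    The result is renormalized to sum to one (an added assumption); this
    fails if every class is vetoed by some model.
    """
    raw = {c: math.prod(d[c] ** w for w, d in zip(weights, dists))
           for c in dists[0]}
    total = sum(raw.values())
    return {c: v / total for c, v in raw.items()}


def majority_vote(labels: Sequence[str]) -> str:
    """Most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def borda_count(rankings: Sequence[Sequence[str]]) -> str:
    """Each ranking gives N-1 points to its top class, N-2 to the next, ..."""
    scores: Dict[str, int] = {}
    for ranking in rankings:
        n = len(ranking)
        for position, label in enumerate(ranking):
            scores[label] = scores.get(label, 0) + (n - 1 - position)
    return max(scores, key=lambda label: scores[label])


if __name__ == "__main__":
    outputs = [{"A": 0.6, "B": 0.4, "C": 0.0},
               {"A": 0.2, "B": 0.5, "C": 0.3}]
    print(linear_pool(outputs, [0.5, 0.5]))
    print(log_pool(outputs, [0.5, 0.5]))     # class C is vetoed
    print(majority_vote(["A", "B", "A"]))
    print(borda_count([["A", "B", "C"], ["B", "A", "C"], ["B", "C", "A"]]))
```

In the example data the first model assigns zero probability to class C, so the log pool returns zero for C, while the linear pool averages it to 0.15.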

4. Methods using neural networks for data fusion

Neural networks have several advantages in fusing information over other approaches [53,55]:

• Adaptive learning: Unlike conventional methods, which require sensor models to start with, neural networks can learn such models, e.g., the underlying statistical characteristics of the sensing error, from the incoming data. This approach is robust in case of unreliable models or dynamic changes of sensor parameters.
• Graceful degradation: If a neuron is damaged, neural network algorithms do not collapse, because of the distributed nature of their information processing.
• Parallel implementation: Neural networks can be implemented in a parallel paradigm with inexpensive hardware [50].
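The adaptive learning property can be illustrated with the smallest possible network. The sketch below trains a single linear neuron with the LMS (delta) rule to fuse two simulated sensors, without being told their noise levels. It is a toy under stated assumptions rather than a method from the survey: a reference value is assumed to be available for training, the sensors are simulated with Gaussian noise, and the names `train_linear_fusion` and `simulate` are hypothetical.

```python
import random
from typing import List, Tuple

Sample = Tuple[List[float], float]   # (sensor readings, reference value)


def train_linear_fusion(samples: List[Sample],
                        rate: float = 0.05) -> List[float]:
    """Train one linear neuron with the LMS (delta) rule.

    The learned weights indicate how much each sensor contributes to the
    fused output.
    """
    weights = [0.0] * len(samples[0][0])
    for readings, target in samples:
        output = sum(w * r for w, r in zip(weights, readings))
        error = target - output
        weights = [w + rate * error * r for w, r in zip(weights, readings)]
    return weights


def simulate(n: int, noise_std: List[float], seed: int = 0) -> List[Sample]:
    """Toy data: every sensor observes the same quantity plus Gaussian noise."""
    rng = random.Random(seed)
    data: List[Sample] = []
    for _ in range(n):
        truth = rng.uniform(-1.0, 1.0)
        data.append(([truth + rng.gauss(0.0, s) for s in noise_std], truth))
    return data


if __name__ == "__main__":
    w = train_linear_fusion(simulate(5000, noise_std=[0.1, 1.0]))
    print("learned weights (accurate sensor, noisy sensor):", w)
```

After training, the weight on the accurate sensor should be much larger than the weight on the noisy one, which is the sense in which the network has learned the error characteristics from the data.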

Neural networks have been applied to solve diverse types of fusion problems, for example, in [49] for pattern recognition by fusing information available in multiple representations, in [27] for network intrusion detection by fusing multiple classifiers, in [52] for incremental learning approach for multi-sensor data fusion, in [45] to fuse sensor information for reservoir characterization. In addition, in [60] neural network fusion strategies for identifying breast masses are compared for both balanced and imbalanced input features. In [23] different feature fusion algorithms for the recognition of handwritten Arabic literal words are presented and compared. New and emerging pattern recognition applications also demand research in applying neural networks for the design of good fusion architecture and for efficient data aggregation. Studies on the fusion algorithms using different soft computing techniques, which include neural networks, for better performance can be found in [24,39].

5. Summary and conclusions Data fusion as a major research topic has been developed with a large number of mathematical tools. It finds applications in diverse areas such as target tracking, pattern recognition, machine learning and computational intelligence. In this survey, some aspects of estimation and decision fusion have been discussed with emphasis on the choice of fusion architecture and algorithm. We reviewed important linear estimation fusion techniques that have to handle the cross-correlation of estimates from different sources. We also discussed several classifier combining techniques for decision fusion. Despite their close relationship in nature, existing formulations and solution methodologies of estimation and decision fusion have sharp distinction in the assumptions of their underlying statistical models. We envision that theoretical advances will be deeply rooted in the unification between estimation fusion and decision fusion. There are quite a few challenges for future data fusion researchers. One of such issues is the efficient combination of estimation and decision fusion, for example, in target tracking, where this can lead to a robust tracking system. Another potential area of improvement is to obtain a criterion that can balance robust (but too conservative in cases) and optimal (consistent only when certain assumptions are satisfied) fusion algorithms. References [1] L. Alexandre, A. Campihlo, M. Kamel, On combining classifiers using sum and product rules, Pattern Recognition Lett. 22 (2001) 1283–1289. [2] A.T. Alouani, Distributed estimators for nonlinear systems, IEEE Trans. Autom. Control 35 (9) (1990) 1078–1081. [3] A.T. Alouani, J.D. Birdwell, Distributed estimation: constraints on the choice of the local models, IEEE Trans. Autom. Control 33 (5) (1988) 503–506. [4] A.T. Alouani, J.E. Gray, Theory of distributed estimation using multiple asynchronous sensors, IEEE Trans. Aerosp. Electron. Syst. 41 (2) (2005) 717–722. [5] Y. 
Bar-Shalom, W.D. Blair (Eds.), Multitarget-Multisensor Tracking: Applications and Advances, vol. III, Artech House, 2000. [6] Y. Bar-Shalom, H. Chen, Multisensor track-to-track association for tracks with dependent errors, in: Proceedings of the IEEE CDC, Bahamas, December 2004. [7] Y. Bar-Shalom, X.R. Li, Multitarget-Multisensor Tracking: Principles and Techniques, YBS Publishing, 1995. [8] R. Battiti, A. Colla, Democracy in neural nets: voting schemes for classification, Neural Networks 7 (4) (1994) 691–707. [9] J. Benediktsson, P. Swain, Consensus theoretic classification methods, IEEE Trans. Syst. Man Cybern. 22 (4) (1992) 688–704. [10] I. Bloch, Information combination operators for data fusion: a comparative review with classification, IEEE Trans. Syst. Man Cybern.—Part A: Syst. Humans 26 (1996) 52–67. [11] J.C. Borda, Me´moire sur les e´lections au scrutin, Histoire de l’Acade´mie Royale des Sciences, Paris, 1781. [12] L. Breiman, Stacked regression, Mach. Learn. 24 (1) (1996) 49–64. [13] L. Breiman, Bagging predictors, Mach. Learn. 26 (2) (1996) 123–140.

[14] H. Chen, Y. Bar-Shalom, Performance limits of track-to-track fusion versus centralized estimation: theory and application, IEEE Trans. Aerosp. Electron. Syst. 39 (2) (2003) 386–400.
[15] S. Cho, J. Kim, Combining multiple neural networks by fuzzy integral for robust classification, IEEE Trans. Syst. Man Cybern. 25 (2) (1995) 380–384.
[16] C.Y. Chong, Hierarchical estimation, in: Proceedings of the MIT/ONR C3 Workshop, Monterey, CA, 1979.
[17] C.Y. Chong, S. Mori, W.H. Barker, K.C. Chang, Architectures and algorithms for track association and fusion, IEEE Aerosp. Electron. Syst. Mag. 15 (1) (2000).
[18] B. Dasarathy, Decision Fusion, IEEE Computer Society Press, Silver Spring, MD, 1994.
[19] T. Denoeux, A k-nearest neighbor classification rule based on Dempster–Shafer theory, IEEE Trans. Syst. Man Cybern. 25 (5) (1995) 804–813.
[20] H. Drucker, C. Cortes, L. Jackel, Y. LeCun, V. Vapnik, Boosting and other ensemble methods, Neural Comput. 6 (1994) 1289–1301.
[21] O.E. Drummond, A hybrid sensor fusion algorithm architecture and tracklets, in: Proceedings of the SPIE Signal and Data Processing of Small Targets, vol. 3163, 1997.
[22] O.E. Drummond, Tracklets and a hybrid fusion with process noise, in: Proceedings of the SPIE Signal and Data Processing of Small Targets, vol. 3163, 1997.
[23] N. Farah, M.T. Khadir, M. Sellami, Artificial neural network fusion: application to Arabic words recognition, in: European Symposium on Artificial Neural Networks, Bruges, Belgium, April 2005.
[24] T. Furuhashi, Fusion of fuzzy/neuro/evolutionary computing for knowledge acquisition, Proc. IEEE 89 (9) (2001) 1266–1274.
[25] S. Geman, E. Bienenstock, R. Doursat, Neural networks and the bias/variance dilemma, Neural Comput. 4 (1) (1992) 1–58.
[26] G. Giacinto, F. Roli, Dynamic classifier selection based on multiple classifier behaviour, Pattern Recognition 34 (2001) 1879–1881.
[27] G. Giacinto, F. Roli, L. Didaci, Fusion of multiple classifiers for intrusion detection in computer networks, Pattern Recognition Lett. 24 (2003) 1795–1803.
[28] M. Grabisch, Classification by fuzzy integral: performance and tests, Fuzzy Sets Syst. 65 (1994) 255–271.
[29] A.H. Gunatilaka, B.A. Baertlein, Feature-level and decision-level fusion of noncoincidently sampled sensors for land mine detection, IEEE Trans. Pattern Anal. Mach. Intell. 23 (6) (2001) 577–589.
[30] D.L. Hall, J. Llinas, Handbook of Multisensor Data Fusion, CRC Press, Boca Raton, FL, 2001.
[31] D.L. Hall, S.A.H. McMullen, Mathematical Techniques in Multisensor Data Fusion, second ed., Artech House, 2004.
[32] L. Hansen, P. Salamon, Neural network ensembles, IEEE Trans. Pattern Anal. Mach. Intell. 12 (10) (1990) 993–1001.
[33] T. Ho, J. Hull, S. Srihari, Decision combination in multiple classifier systems, IEEE Trans. Pattern Anal. Mach. Intell. 16 (1) (1994) 66–75.
[34] A.H. Jazwinski, Stochastic Processes and Filtering Theory, Academic Press, New York, 1970.
[35] M. Jordan, R. Jacobs, Hierarchical mixtures of experts and the EM algorithm, Neural Comput. (1994) 181–214.
[36] S. Julier, J.K. Uhlmann, A non-divergent estimation algorithm in the presence of unknown correlations, in: American Control Conference, Albuquerque, NM, USA, vol. 4, June 1997, pp. 2369–2373.
[37] S. Julier, J.K. Uhlmann, General decentralized data fusion with covariance intersection, in: D.L. Hall, J. Llinas (Eds.), Handbook of Multisensor Data Fusion, CRC Press, Boca Raton, FL, 2001 (Chapter 12).
[38] M. Kam, X. Zhu, P. Kalata, Sensor fusion for mobile robot navigation, Proc. IEEE 85 (1) (1997).
[39] O. Kaynak, I. Rudas, Soft computing methodologies and their fusion in mechatronic products, Comput. Control Eng. J. 6 (2) (1995) 68–72.
[40] J. Kittler, M. Hatef, R. Duin, J. Matas, On combining classifiers, IEEE Trans. Pattern Anal. Mach. Intell. 20 (3) (1998) 226–239.
[41] L. Kuncheva, Switching between selection and fusion in combining classifiers: an experiment, IEEE Trans. Syst. Man Cybern. Part B 32 (2) (2002) 146–156.
[42] X.R. Li, Optimal linear estimation fusion, Part VII: dynamic systems, in: Proceedings of the Sixth International Conference of Information Fusion, 2003, pp. 455–462.
[43] X.R. Li, Y.-M. Zhu, J. Wang, C.-Z. Han, Optimal linear estimation fusion, Part I: unified fusion rules, IEEE Trans. Inf. Theory 49 (9) (2003) 2192–2208.
[44] R.R. Murphy, Dempster–Shafer theory for sensor fusion in autonomous mobile robots, IEEE Trans. Robotics Autom. 14 (2) (1998) 197–206.
[45] M. Nikravesh, Soft computing-based computational intelligent for reservoir characterization, Expert Syst. Appl. 26 (1) (2004) 19–38.
[46] D. Ruta, B. Gabrys, Classifier selection for majority voting, Inf. Fusion 6 (1) (2005) 63–81.
[47] R.K. Saha, K.C. Chang, An efficient algorithm for multisensor track fusion, IEEE Trans. Aerosp. Electron. Syst. 34 (1) (1998).
[48] R. Schapire, The strength of weak learnability, Mach. Learn. 5 (1990) 197–227.
[49] A. Schultz, H. Wechsler, Data fusion in neural networks via computational evolution, in: Proceedings of the IEEE International Conference on Neural Networks part 5, 1994, pp. 3044–3049.
[50] U. Seiffert, Artificial neural networks on massively parallel computer hardware, in: ESANN’2002 Proceedings, European Symposium on Artificial Neural Networks, Bruges, Belgium, April 2002.

[51] J.L. Speyer, Computation and transmission requirements for a decentralized linear-quadratic-Gaussian control problem, IEEE Trans. Autom. Control 24 (1979) 266–269.
[52] J. Su, J. Wang, Y. Xi, Incremental learning with balanced update on receptive fields for multisensor data fusion, IEEE Trans. Syst. Man Cybern. Part B 34 (1) (2004).
[53] M.K. Sundareshan, F. Amoozegar, Neural network fusion capabilities for efficient implementation of tracking algorithms, Opt. Eng. 36 (3) (1997) 692–707.
[54] K. Tumer, J. Ghosh, Order statistics combiners for neural classifiers, in: World Congress on Neural Networks, Washington, DC, 1995, pp. I:31–I:34.
[55] S.G. Tzafestas, Y. Anthopoulos, Neural networks based sensorial signal fusion: an application to material identification, in: Digital Signal Processing Proceedings, vol. 2, July 1997, pp. 923–926.
[56] N. Ueda, Optimal linear combination of neural networks for improving classification performance, IEEE Trans. Pattern Anal. Mach. Intell. 22 (2) (2000) 207–215.
[57] E.L. Waltz, J. Llinas, Multisensor Data Fusion, Artech House, Norwood, MA, 1990.
[58] A.S. Willsky, M.G. Bello, D.A. Castanon, B.C. Levy, G.C. Verghese, Combining and updating of local estimates and regional maps along sets of one-dimensional tracks, IEEE Trans. Autom. Control 27 (4) (1982) 799–813.
[59] D. Wolpert, Stacked generalization, Neural Networks 5 (1992) 241–259.
[60] Y. Wu, J. He, Y. Man, J.I. Arribas, Neural network fusion strategies for identifying breast masses, in: Proceedings of the International Joint Conference on Neural Networks, Budapest, Hungary, July 2004, pp. 2437–2442.

Abhijit Sinha received his B.S. degree in physics from the University of Calcutta, India, in 1994. He received his M.S. degree in electrical communication engineering from the Indian Institute of Science, Bangalore, India, in 1998 and his Ph.D. degree in electrical and computer engineering from the University of Connecticut, USA, in 2002. He worked as a Postdoctoral Fellow at the University of Connecticut from 2002 to 2003. Currently, he is working as a research associate at McMaster University, Canada. His research interests include signal/image processing, target tracking, and communications.

Huimin Chen received the B.E. and M.E. degrees from the Department of Automation, Tsinghua University, Beijing, China, in 1996 and 1998, respectively, and the Ph.D. degree from the Department of Electrical and Computer Engineering, University of Connecticut, Storrs, in 2002, all in electrical engineering. He was a postdoctoral research associate at the Physics and Astronomy Department, University of California, Los Angeles, and a visiting researcher with the Department of Electrical and Computer Engineering, Carnegie Mellon University, from July 2002, where his research focus was on weak signal detection for single electron spin microscopy. He joined the Department of Electrical Engineering, University of New Orleans, in January 2003 as an assistant professor. His research interests are in the general areas of signal processing, estimation theory, and information theory, with applications to target detection and target tracking.

Daniel G. Danu received his B.Sc.Eng. and M.A.Sc. degrees in electronics and telecommunications engineering from "Politehnica" University of Timisoara, Romania, in 1994 and 1995, respectively. Since 1998 he has worked at Array Systems Computing Inc., Toronto, Canada, as a systems/software engineer developing real-time parallel digital signal processing systems (Synthetic Aperture Radar, Sonar). He is a licensed professional engineer in the province of Ontario. Currently he is working toward the Ph.D. degree at McMaster University, Hamilton, Canada. His research interests include target tracking, data fusion, signal processing, and real-time parallel processing systems.

Thiagalingam Kirubarajan was born in Sri Lanka in 1969. He received the B.A. and M.A. degrees in electrical and information engineering from Cambridge University, England, in 1991 and 1993, and the M.S. and Ph.D. degrees in electrical engineering from the University of Connecticut, Storrs, Connecticut, in 1995 and 1998, respectively. Currently, Dr. Kirubarajan is an assistant professor in the Electrical and Computer Engineering Department at McMaster University, Hamilton, Ontario. He is also serving as an Adjunct Assistant Professor and the Associate Director of the Estimation and Signal Processing Research Laboratory at the University of Connecticut, USA. Dr. Kirubarajan’s research interests are in estimation, target tracking, multisource information fusion, sensor resource management, signal detection and fault diagnosis. He has published about 100 articles in these research areas, in addition to one book on estimation, tracking and navigation and two edited volumes. Dr. Kirubarajan’s research activities at McMaster University and at the University of Connecticut are supported by the US Missile Defense Agency, US Office of Naval Research, NASA, Qualtech Systems, Inc., Raytheon Canada Ltd. and Defence Research and Development Canada, Ottawa. In September 2001, Dr. Kirubarajan served on a DARPA expert panel on unattended surveillance, homeland defense and counterterrorism. He has also served as a consultant in these areas to a number of companies, including Motorola Corporation, Northrop-Grumman Corporation, Pacific-Sierra Research Corporation, Lockheed Martin Corporation, Qualtech Systems, Inc., Orincon Corporation and BAE Systems. He has worked on the development of a number of engineering software programs, including BEARDAT for target localization from bearing and frequency measurements in clutter and FUSEDAT for fusion of multisensor data for tracking. He has also worked with Qualtech Systems, Inc., to develop an advanced fault diagnosis engine. He is also a recipient of the Ontario Premier’s Research Excellence Award (2002).

M. Farooq received his B.Sc.Eng. and M.Tech. degrees from Punjab Engineering College, Chandigarh, India, and the Indian Institute of Technology, Delhi, in 1965 and 1967, respectively, and his Ph.D. from the University of New Brunswick, Canada, in 1974, all in Electrical Engineering. In March 1980, he joined the Royal Military College of Canada (RMC), Kingston, Ontario, where he served as a professor in the Department of Electrical and Computer Engineering. He has organized, and served as technical chair for, a number of conferences on Applications of Advance Technologies to Canadian Forces, and served as a Co-Editor of the resulting Conference Proceedings. He was a member of the Defense Advisory Board (Canada), which drafted a report on Fusion Technology in Canada.