Signal Processing 97 (2014) 232–241
Contents lists available at ScienceDirect
Signal Processing journal homepage: www.elsevier.com/locate/sigpro
Gaussian mixture reduction based on fuzzy ART for extended target tracking Yongquan Zhang n, Hongbing Ji School of Electronic Engineering, Xidian University, P.O. Box 229, Xi’an 710071, PR China
a r t i c l e in f o
abstract
Article history: Received 28 May 2013 Received in revised form 9 October 2013 Accepted 2 November 2013 Available online 13 November 2013
This paper presents a global Gaussian mixture (GM) reduction algorithm via clustering for extended target tracking in clutter. The proposed global clustering algorithm is obtained by combining a fuzzy Adaptive Resonance Theory (ART) neural network architecture with the weighted Kullback–Leibler (KL) difference which describes discrimination of one component from another. Therefore, we call the proposed algorithm as ART-KL clustering (ART-KL-C) in the paper. The weighted KL difference is used as a category choice function of ART-KL-C, derived by considering both the KL divergence between two components and their weights. The performance of ART-KL-C is evaluated by the normalized integrated squared distance (NISD) measure, which describes the deviation between the original and reduced GM. The proposed algorithm is tested on both one-dimensional and fourdimensional simulation examples, and the results show that the proposed algorithm can more accurately approximate the original mixture and is useful in extended target tracking. & 2013 Elsevier B.V. All rights reserved.
Keywords: Global clustering algorithm Gaussian mixture reduction Weighted Kullback–Leibler difference Normalized integrated squared distance measure Extended target tracking
1. Introduction In a broad variety of signal processing and sensor fusion problems, such as target tracking, the state variables are modeled using mixtures. A mixture is a weighted sum of distributions, where the weights are non-negative coefficients. In general, the sum of all weights is one. If the sum is not equal to one, the mixture is called intensity, which usually appears in target tracking. The individual distribution of the mixture is called component; a common choice of component is the Gaussian distribution, which leads to Gaussian mixtures (GMs). As described above, Gaussian mixture models [1,2] arise naturally in solving signal processing and sensor fusion problems. Besides, Gaussian mixture models are also used in other fields, such as learning [3,4] and measuring overlapping [5].
n
Corresponding author. Tel./fax: þ 86 29 88201658. E-mail address:
[email protected] (Y. Zhang).
0165-1684/$ - see front matter & 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.sigpro.2013.11.004
In point or extended target tracking, GMs are used in the GM probability hypothesis density (GM-PHD) filters [6–9]. Besides, nonparametric belief propagation [10] can also be applied to target tracking. A point target is defined as a target which is assumed to produce at most one measurement per time step, while an extended target is defined as a target that potentially produces more than one measurement per time step. However, recursive processing of GMs generally leads to an exponential growth of the number of mixture components. In order to keep the complexity at a tractable level, the number of components must be kept at a minimum, which leads to the mixture reduction problem. Several algorithms for the GM reduction problem have been proposed in recent years. One solution is pruning, i.e., directly deleting components with low contribution to the overall mixture by some threshold. However, pruning may lead to a complete loss of useful information contained in the pruned components. A better choice is merging, i.e., successively merging components with strong similarity by a criterion to attempt to preserve some useful information.
Y. Zhang, H. Ji / Signal Processing 97 (2014) 232–241
For GM merging, the previous algorithms can be divided into top-down and bottom-up algorithms. Top-down algorithms iteratively remove components from the original mixture, including Salmond’s joining and clustering algorithms [11], Williams’ Cost-Function-Based algorithm [12], Runnalls’ KL algorithm [13], and Schieferdecker’s clustering algorithm [14]. While bottom-up algorithms start with a single component and iteratively add additional components to the reduced mixture until the original mixture density is approximated appropriately, including Huber’s progressive Gaussian mixture reduction (PGMR) algorithm [15]. In terms of the measure applied for selecting components, these algorithms can be further divided into local algorithms which consider only a subset of the available mixture information, and global algorithms which consider all available mixture information. The algorithm by Salmond [11] and West’s algorithm [16] belong to the former. The rest of the above mentioned algorithms belongs to the latter except Runnalls’ KL algorithm [13]. Runnalls tried to incorporate elements from both the local and the global algorithms. Besides, many GM reduction algorithms were presented by Crouse et al. [17], which give a nice overview of the existing literatures. The primary contribution of this paper is the development of a global clustering algorithm, called ART-KL-C, for extended target filter. ART-KL-C is inspired by the fuzzy ART [18] that is a neural network architecture. Compared with the other clustering algorithms, the fuzzy ART has a distinct merit of rapid stable learning. Therefore, for a realtime extended target tracking system, choosing ART-KL-C as GM reduction algorithm is possible. In ART-KL-C, we adopt the neural network architecture (including category choice, resonance or reset and learning) that is similar to the fuzzy ART. However, the proposed algorithm makes some modifications of the fuzzy ART in the following aspects: firstly, we use the weighted KL difference as category choice function in category choice stage. Secondly, in resonance or reset stage, a likelihood function is chosen as match function, which depicts the similarity between the input component and the chosen category. Finally, we use the parameters after merging as learning update of category parameters. A comparison of ART-KL-C with other classical reduction algorithms is made on simulation examples, and the results show that the proposed algorithm is superior to the others in terms of the NISD measure and execution time (ET). The remainder of the paper is organized as follows. Section 2 describes GM reduction problem, gives a merging criterion which is used for learning process of ARTKL-C, and summarizes the principles and dynamics of the fuzzy ART. The details of our algorithm, i.e., ART-KL-C, are given in Section 3. The simulation results are presented in Section 4. Section 5 contains the conclusions and the future work.
233
used to find a weighted GM distribution qð UÞ approximating sums of components. It will be used to derive ART-KL-C. Finally, Section 2.3 presents a detailed description of the fuzzy ART neural network, which is the theoretical basis of the proposed algorithm.
2.1. GM reduction problem In multiple extended target tracking, the target intensity in the presence of clutter and association uncertainty can be represented as a sum of weighted Gaussian distributions, i.e., N
N
i¼1
i¼1
pðxÞ ¼ ∑ wi N ðx; mi ; P i Þ ¼ ∑ wi pi ðxÞ
ð1Þ
where the dimension of the state vector is denoted as d ¼ jxj, N ð U ; m; PÞ is a Gaussian density with mean vector m and covariance matrix P which is a symmetric positive semi-definite d d matrix, and each distribution pi ð U Þ is denoted as a GM component. Note that in some target tracking frameworks, such as the probability hypothesis density (PHD) [6] and cardinalized probability hypothesis density (CPHD) [19] filters, the sum of the weights is not necessarily equal to unity, therefore, pð U Þ might not be a probability density. We call it as intensity. When we use GM to represent the posterior intensity (i.e., Eq. (1)) of linear Gaussian target dynamics, we call the filter as the GM-PHD filter [6]. For this filter, assuming that the current time posterior intensity represented by GM is known. Thus, we can obtain the next time predicted intensity via prediction equations (presented by [6]), and then the next time posterior intensity by updating the predicted intensity when the next time measurements come. According to prediction and update equations derived by [6], the number of GM (representing the next time intensity) components grows rapidly compared to the current time number of GM components by predicting and updating. Note that in the classical filters, such as Bayes filter and Kalman filter, the prediction or update is also called as the recursion of filters. Simultaneously, this filter propagates an intensity using the PHD recursion with time. Therefore, the number N of GM components grows exponentially with time. In order to keep N at a computationally tractable level, approximations become necessary. In the paper, we explore merging of GM components via clustering, i.e., using a single component to approximate some of the similar components. After merging the original GM components (1), the result is a sum N~
N~
i¼1
i¼1
~ i ; P~ i Þ ¼ ∑ w ~ ~ i N ðx; m ~ i p~ i ðxÞ: pðxÞ ¼ ∑ w
ð2Þ
where N~ rN.
2. Problem formulation 2.2. Merging criterion This section first presents GM reduction problem of the extended target tracking in Section 2.1. In this context, a target potentially gives rise to more than one measurement per time step. Then a merging criterion is given in Section 2.2, which is
This section gives a merging criterion which describes how to merge an arbitrary number of GM components into just one GM component. This is performed by analytically
234
Y. Zhang, H. Ji / Signal Processing 97 (2014) 232–241
minimizing the KL divergence Z p′ðxÞ dx dKL ðp′; qÞ ¼ p′ðxÞ log qðxÞ
ð3Þ
between p′ and q, which is a measure of how similar two functions are. Here, we do not use an integrated squared difference (ISD) similarity measure [12], since Runnalls in [13] presented that the ISD criterion may be undesirable in some applications, particularly where the system state vector has high dimensionality, such as the target tracking system. Moreover, the KL divergence is considered the optimal difference measure in a maximum likelihood sense [12–14]. Minimizing the KL divergence (3) can be rewritten as a maximization problem, i.e., Z qðxÞ ¼ arg min dKL ðp′; qÞ ¼ arg max p′ðxÞ log qðxÞ dx: ð4Þ q
q
Merging criterion: let N′
N′
i¼1
i¼1
p′ðxÞ ¼ ∑ wi N ðx; mi ; P i Þ ¼ ∑ wi p′i ðxÞ be a weighted sum of GM components, w ¼ ∑N′ i ¼ 1 wi . The solution to Eq. (4) is qðxÞ ¼ wN ðx; m; PÞ;
ð5Þ where ð6Þ
where the parameters m and P are given by 1 N′ m¼ ∑ wi mi ; wi¼1 P¼
N′
1 ∑ w ½P þðmi mÞðmi mÞT : wi¼1 i i
Fig. 1. Fuzzy ART architecture.
Categorization with the fuzzy ART is performed by category choice, resonance or reset, and learning (see Fig. 1). The details of the fuzzy ART model can be found in [18]. (1) Category choice: the category function, T j , for each input A is defined by Tj ¼
jA 4 wj j α þ jwj j
ð9Þ
where ðp 4 qÞi ¼ minðpi ; qi Þ and jpj ¼ ∑M i ¼ 1 jpi j. The Jth winner node in F2 is selected by T J ¼ max fT j : j ¼ 1; :::; Ng:
ð10Þ
ð7Þ
ð8Þ
Eqs. (7) and (8) will be used for learning process of ART-KL-C. 2.3. Fuzzy ART Adaptive resonance theory (ART) was introduced by Grossberg [20,21] in order to analyze how brain networks can autonomously learn the real-time changing world in a rapid but stable fashion. Subsequently, ART has steadily developed as a physical theory to explain and predict ever larger data bases about cognitive information processing. The theory has led to an evolving series of real-time neural networks models for unsupervised category learning and pattern recognition. The fuzzy ART is one of these models. The fuzzy ART [18] is a neural network architecture that includes an input field F0 storing the current input vector, a choice field F2 containing the active categories and a matching field F1 receiving bottom-up input from F0 and top-down input from F2. All input patterns are complement-coded by the field F0 in order to avoid category proliferation. Each input a is represented as a 2M-dimensional vector A ¼ ða; ac Þ 9 ða1 ; :::aM ; ac1 ; :::; acM Þ, where aci ¼ 1 ai ; i ¼ 1; :::; M. The weight associated with each F2 category node jðj ¼ 1; :::; NÞ is denoted as wj ¼ ðuj ; vcj Þ, which subsumes both the bottom-up and topdown weight vectors of the fuzzy ART. Initially, all weights are set to one, each category is said to be uncommitted. When a category is first coded it becomes committed. Each long-term memory (LTM) trace wi ði ¼ 1; :::; 2MÞ is monotonically nonincreasing with time and hence converges to a limit.
In particular, nodes become committed in order j ¼ 1; 2; 3; :::. (2) Resonance or reset: when the category J is chosen, a hypothesis testing is performed in order to measure the degree to which A is a fuzzy subset of wJ. Resonance occurs if the match function of the chosen category meets the vigilance criterion ρ A ½0; 1 jA 4 wJ j Zρ jAj
ð11Þ
then the chosen category is said to win (match) and learning is performed. Otherwise, the value of TJ is set to zero for the rest of this pattern presentation to prevent the persistent selection of the same category during search. A new category is then chosen by (9), and the search process continues until the chosen category satisfies (11). (3) Learning: once search is finished, the weight vector wJ is updated according to wðnewÞ ¼ βðA 4 wJðoldÞ Þ þð1 βÞwðoldÞ J J
ð12Þ
where β A ½0; 1 is the forgetting factor. Fast learning corresponds to β ¼ 1. 3. ART-KL-C for GM reduction This section presents a global clustering algorithm, called ART-KL-C, which is based on the fuzzy ART and weighted KL difference. Firstly, the derivation of the weighted KL difference is given in Section 3.1. Secondly, Section 3.2 presents a detailed description of ART-KL-C algorithm. Finally, an evaluation criterion, i.e., NISD measure, is given for GM reduction algorithm.
Y. Zhang, H. Ji / Signal Processing 97 (2014) 232–241
3.1. Weighted KL difference This section gives a derivation of the weighted KL difference which describes the distance between the two components. Since the KL divergence is asymmetrical, dKL ðpi ; pj Þ adKL ðpj ; pi Þ, it is not directly used as a distance measure. In this paper, we use the KL difference to describe the distance between the two components pi and pj , which is defined as DKL ðpi ; pj Þ ¼ dKL ðpi ; pj Þ þ dKL ðpj ; pi Þ Z Z pj ðxÞ p ðxÞ dx þ pj ðxÞ log dx: ¼ pi ðxÞ log i pj ðxÞ pi ðxÞ ð13Þ As a reasonable extension of the KL difference, the weights of the two GM components are taken into account, which leads to a weighted KL difference. In this work, we use f ℓ and f~ ℓ to represent the weighted components pℓ and p~ ℓ , respectively, namely f ℓ ¼ wℓ pℓ ; ℓ ¼ 1; …; N:
ð14Þ
~ ~ ℓ p~ ℓ ; ℓ ¼ 1; …; N: f~ ℓ ¼ w
ð15Þ
In the following equations, in order to make equations concise, we use abbreviation N ℓ of N ðx; mℓ ; P ℓ Þ to define equations for ℓ ¼ i; j: The KL divergence of f j from f i is written as Z w N ðx; mi ; P i Þ dx dKL ðf i ; f j Þ ¼ wi N i ðx; mi ; P i Þlog i i wj N j ðx; mj ; P j Þ Z w N ðx; mi ; P i Þ dx ¼ wi N i ðx; mi ; P i Þ log i þ log i N j ðx; mj ; P j Þ wj Z w N ðx; mi ; P i Þ ¼ wi log i þ dx N i ðx; mi ; P i Þlog i N j ðx; mj ; P j Þ wj w ¼ wi log i þdKL ðN i ; N j Þ ð16Þ wj where dKL ðN i ; N j Þ ¼
Pj 1 log j j dþ trðP j 1 P i Þ þ ðmi mj ÞT P j 1 ðmi mj Þ Pi 2
ð17Þ For a proof of (17), see for example, Theorem 7.2.8 of [22]. The KL divergence of f i from f j is defined analogously. Finally, by (13), (16) and (17), we can obtain a weighted KL difference wi P j 1=2 DKL ðf i ; f j Þ ¼ ðwi wj Þlog wj P i 1 þ wi trðP j 1 P i Þ þ wj trðP i 1 P j Þ 2 T
þðmi mj Þ ðwi P j
1
þ wj P i
1
235
the fuzzy ART and weighted KL difference. Similar to the fuzzy ART, ART-KL-C also includes an input field F0 storing the current input component, a choice field F2 containing the active categories and a matching field F1. Since the categories generated are merging of the similar components, the parameters associated with each F2 category ~ j and ~ j, m component p~ j ðxÞ (j ¼1,…,Ncat) are denoted as w P~ j , respectively, where Ncat is the number of categories. ~ 1 ¼ m1 and P~ 1 ¼ P 1 , and Ncat ~ 1 ¼ w1 , m Initially, N cat ¼ 1, w increases with the input components. For every input component except the first component p1 ðxÞ, ART-KL-C is performed by category choice, resonance or reset (vigilance test), and learning. 1. Category choice: in this stage, all existing categories compete to represent an input component. The choice function is a weighted KL difference between the jth category component p~ j ðxÞ and ith input component pi ðxÞ, by (18), which is represented as 0
1=2 1 P~ w i ~ j Þlog@ j A DKL ðf i ; f~ j Þ ¼ ðwi w ~ j Pi w 1 1 ~ j trðP i 1 P~ j Þ wi trðP~ j P i Þ þ w þ 2 1 ~ j ÞT ðwi P~ j þ w ~ jÞ ~ j P i 1 Þðmi m þðmi m ~ j Þd ; i ¼ 2; …; N: ð19Þ ðwi þ w
where f ℓ ðxÞ and f~ ℓ ðxÞ are the ℓth weighted input component and category component, respectively, defined by (14) and (15). The winning category J is selected by J ¼ argminðDKL ðf i ; f~ j ÞÞ:
2. Resonance or reset (vigilance test): the purpose of this stage is to determine whether the ith input component pi ðxÞ matches with the chosen category J, which is characterized by match function. In order to define the match function, we suppose that GM component pi ðxÞ can be represented by its first-order statistical moment mi . Note that this assumption is commonly used in many tracking algorithms. In the paper, we use the normalized likelihood function between mi and category cJ to define match function, namely Mðmi ; cJ Þ ¼ ð2πÞd=2 jP~ J j1=2 pðmi jcJ Þ
Þðmi mj Þ ðwi þwj Þd :
ð18Þ
ð20Þ
j
where pðmi jcJ Þ ¼
1 d=2
ð2πÞ
jP~ J j1=2
ð21Þ
1 1 exp ðmi mJ ÞT P~ J ðmi mJ Þ : 2
ð22Þ
Note that divergence and difference are different. The divergence between the two functions is defined by (3), while the difference is a sum of two divergences, e.g. Eq. (13).
Resonance occurs if the match function meets the vigilance criterion
3.2. ART-KL-C
where ρA ½0; 1 is a vigilance parameter. Learning then ensues, as defined below. Mismatch reset occurs if
This section presents the detailed description of the proposed clustering algorithm, i.e., ART-KL-C. It is based on
Mðmi ; cJ Þ Z ρ
Mðmi ; cJ Þ o ρ:
ð23Þ
ð24Þ
236
Y. Zhang, H. Ji / Signal Processing 97 (2014) 232–241
applied. In fact, it is the natural deviation measure for mixture reduction in a maximum likelihood sense [12–14]. However, in this work, the KL divergence is difficult to compute because of the logarithm of a sum, which has no simple analytical solution. Therefore, in the paper, the NISD measure is chosen as the deviation between the original and reduced GMs. Moreover, it can be computed in closed-form, resulting in a faster algorithm. In the following, a computationally feasible solution of the NISD measure is derived, which is similar to that of [12]. Expanding the NISD measure equation yields the following equation: vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi u R u ðp~ pÞ2 dx ~ pÞ ¼ tR DNISD ðp; R 2 p~ dxþ p2 dx vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi uR 2 R R u p~ dx 2 pp ~ dxþ p2 dx : ð30Þ ¼t R R 2 p~ dxþ p2 dx
Fig. 2. ART-KL-C algorithm.
The category J is then removed from the competition for pi ðxÞ, and ART-KL-C searches for another category via (19) and (20) until finding one satisfying (23). If all existing categories fail the vigilance test, a new category characterized by the parameters of component pi ðxÞ is formed, then N cat ¼ N cat þ1:
ð25Þ
Note that the parameters of the new category compo~ Ncat ¼ mi and P~ Ncat ¼ P i . ~ Ncat ¼ wi , m nent is written as w 3. Learning: when the chosen category satisfies the vigilance criterion (23), then the parameters of the current category component are updated using the following equations, i.e., learning: ~ ðJÞ m new ¼ ðJÞ P~ new ¼
1 ðJÞ ~ old ~ Jm ðw m þ w Þ; ~J i i wi þ w
ð26Þ
h i 1 ~ ðJÞ ~ ðJÞ T wi P i þ ðmi m new Þðmi m new Þ ~ wi þ w J h ðJÞ i 1 ~ ðJÞ ~ ðJÞ ~ ðJÞ ~ ðJÞ T ~ P~ þ ðm þ w old m new Þðm old m new Þ ~ J J old wi þ w ð27Þ
R ~ dx, we can obtain Substituting (1) and (2) into pp Z Z ~ N N ~ i ; P~ i ÞN ðx; mj ; P j Þ dx: ~ dx ¼ ∑ ∑ w ~ i wj pp N ðx; m i¼1j¼1
ð31Þ In order to derive the solution of (31), we define the following standard results for Gaussian probability density functions (PDFs). Lemma 1. Given the two Gaussian PDFs, their product is defined as N ðx; mi ; P i ÞN ðx; mj ; P j Þ ¼ N ðmi ; mj ; P i þ P j ÞN ðx; m; PÞ
ð32Þ
¼ ðP i 1 þP j 1 Þ 1
where m and P are given by P and m ¼ PðP i 1 mi þ P j 1 mj Þ. The proof of Lemma 1 can be found in [24]. By Lemma 1 and (31), we can derive Z N N~ ~ i ; mj ; P~ i þ P j Þ: ~ ~ i wj N ðm ppdx ¼ ∑ ∑ w ð33Þ i¼1j¼1
R R Note that p~ 2 dx and p2 dx are derived analogously. They are Z N~ N~ ~ i; m ~ j ; P~ i þ P~ j Þ ~ iw ~ j N ðm p~ 2 dx ¼ ∑ ∑ w ð34Þ i¼1j¼1
~ J; ~ J ¼ wi þ w w
ð28Þ
Note that Eqs. (26) and (27) are obtained by (7) and (8). ART-KL-C is also illustrated by Fig. 2.
N
N
p2 dx ¼ ∑ ∑ wi wj N ðmi ; mj ; P i þ P j Þ:
ð35Þ
i¼1j¼1
Finally, substituting (33)–(35) into (30), we can obtain the NISD measure which describes the deviation between ~ p and p.
3.3. NISD measure In the paper, the NISD measure is defined by vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi uZ u ðp~ pÞ2 dx ~ pÞ ¼ t R DNISD ðp; R ~p 2 dxþ p2 dx
Z
4. Simulation results ð29Þ
where p and p~ are abbreviations of Eqs. (1) and (2), respectively. The NISD measure describes the deviation ~ pÞ ¼ 0 ~ and varies from 0 to 1. DNISD ðp; between p and p, ~ pÞ ¼ 1 is the maxindicates a perfect match and DNISD ðp; imum possible error. Of course, a different deviation measure such as the KL divergence [23] could be also
In this section, we evaluate the performance of the proposed algorithm in terms of the NISD measure which is introduced in Section 3.3. To demonstrate the effectiveness of GM mixture reduction of the proposed algorithm, a onedimensional example and a four-dimensional example are considered in Sections 4.1 and 4.3, respectively; the influence of the vigilance parameter of ART-KL-C is discussed in Section 4.2; finally, a discussion is given in Section 4.4.
Y. Zhang, H. Ji / Signal Processing 97 (2014) 232–241
Note that the computational requirements for all algorithms are compared via the CPU processing time. The simulations are implemented by MATLAB on Intel Core i5-2400 3.10 GHz processor and 4GB RAM.
237
Table 1 Comparison of the NISD measures, standard deviations, maximal values, minimal values and execution times of the three algorithms for N ¼50 components reduced to the preset N~ ¼ 13 components. GM reduction algorithm
DNISD
Dmax NISD
Dmin NISD
ET (s)
ART-KL-C PA CA GMRC
0.0385 70.0106 0.053370.0152 0.0454 70.0112 0.033970.0080
0.0607 0.1164 0.0833 0.0562
0.0156 0.0189 0.0219 0.0142
0.0279 0.0074 0.0094 1.0926
4.1. One-dimensional example In the one-dimensional example, for comparison purposes, the true GM contains N A f50; 100g components, where the parameters are drawn i.i.d. from uniform distributions over the intervals wi A ½0; 1, mi A ½0; 10 and P i A ½0:252 ; 0:752 . The number of the reduced GM mixture ~ For each N, 200 Monte Carlo (MC) is denoted as N. simulation runs are performed. A comparison of the proposed algorithm with other GM mixture reduction algorithms, including Pruning Algorithm (PA) [6], Clustering Algorithm (CA) [11] and GM Reduction via Clustering (GMRC) [17], is shown in Figs. 3 and 4, and Tables 1 and 2, where ET represents execution time. Note that the compared PA, CA and GMRC are clustering algorithms, since the proposed algorithm, i.e., ART-KL-C, is a clustering algorithm. PA is similar to CA except the distance measure chosen to represent the closeness of components, and GMRC uses a greedy approach developed by Runnalls [13] to form the k initial centers, ensued by a k-means clustering using KL divergence as a distance measure. The details of PA, CA and GMRC are referred to Refs. [6,11,17]. In general, a greedy Runnalls’ Algorithm [17] is used to compare the effectiveness of reduction algorithms, which
Fig. 3. Comparison of the true (original) GM consisting of 50 components and its reduced versions consisting of 13 components resulting from ART-KL-C, PA, VA and GMRC.
Table 2 Comparison of the NISD measures, standard deviations, maximal values, minimal values and execution times of the three algorithms for N ¼100 components reduced to the preset N~ ¼ 15 components. GM reduction algorithm
DNISD
Dmax NISD
Dmin NISD
ET (s)
ART-KL-C PA CA GMRC
0.0322 7 0.0076 0.0469 7 0.0105 0.0389 7 0.0079 0.03077 0.0056
0.0612 0.0809 0.0711 0.0515
0.0171 0.0219 0.0246 0.0162
0.0556 0.0137 0.0175 8.4576
can reduce a GM with N components to a reduced mixture with N~ components, where N~ is preset. However, for clustering reduced algorithms, we can only make N~ same by adjusting their parameters, such as the merging thresholds U PA and U CA of PA and CA, k of GMRC, and vigilance parameter ρ of ART-KL-C. In the paper, for PA, CA, GMRC and ART-KL-C, their reduced number is set as N~ A f13; 15g for N A f50; 100g, the corresponding parameters are set as U PA A f1:5; 1:65g, U CA A f0:30; 0:35g, k A f13; 15g and ρ A f0:57; 0:65g. Figs. 3 and 4 show a comparison of the original GM and its reduced versions resulting from ARTKL-C, PA, CA and GMRC. The corresponding results are also shown in Tables 1 and 2. As seen from Tables 1 and 2, the NISD measure of GMRC is the lowest, and that of ART-KL-C is lower than PA and CA. These results are also shown in Figs. 3 and 4. However, GMRC algorithm requires the most ET, and its ET is much larger than the other clustering algorithms. Compared to GMRC, ART-KL-C can effectively reduce the ET and improve the effectiveness of GM reduction. Therefore, ART-KL-C is more suitable for real-time applications, such as target tracking. This is expected, since the proposed algorithm adopts a neural network architecture that is similar to the fuzzy ART having a distinct merit of rapid stable learning. Additionally, compared to PA and CA, ARTKL-C is a global clustering algorithm which is unrestricted for the initial input component, so the NISD measure of ART-KL-C is lower than PA and CA. However, the proposed algorithm requires more ET than PA and CA. This can be accepted by target tracking system, which will be shown in next section. 4.2. Influence of the vigilance parameter
Fig. 4. Comparison of the true (original) GM consisting of 100 components and its reduced versions consisting of 15 components resulting from ART-KL-C, PA, VA and GMRC.
In this section, we explore the vigilance parameter ρ of the ART-KL-C algorithm. This parameter is very important, and its choice directly affects the reduction performance of the ART-KL-C algorithm. In order to facilitate illustration,
238
Y. Zhang, H. Ji / Signal Processing 97 (2014) 232–241
we suppose that the true GM only contains N ¼ 20 components, where the parameters are drawn i.i.d. from uniform distributions over the intervals wi A ½0; 1, μi A ½0; 10 and P i A ½0:252 ; 0:752 . In simulation, the different vigilance parameters of the ART-KL-C algorithm are V defined by fρl gN , where ρl þ 1 ¼ ρl þ Δ, l ¼ 1; :::; NV 1, NV l¼1 is the number of the chosen vigilance parameters, and Δ is a step length. In this section, the corresponding values on vigilance parameters are set to ρ1 ¼ 0:01, ρNV ¼ 1, and Δ ¼ 0:03. For each ρl , 200 Monte Carlo (MC) simulation runs are performed. Fig. 5 shows that the NISD measure varies with the vigilance parameter of the ART-KL-C algorithm; Fig. 6 shows that the number of reduced GM varies with the vigilance parameter; Fig. 7 shows that the ET varies with the vigilance parameter. As seen from Figs. 5–7, the NISD measure increases with the growth of vigilance parameter, while the number of reduced GM and ET decreases with the growth of vigilance parameter. This implies that better reduction performance requires more the reduced number and ET.
Fig. 7. The ET against the vigilance parameter of the ART-KL-C algorithm.
In practice, we usually make a balance between the reduction performance and ET. 4.3. Four-dimensional example – extended target tracking In this section, we demonstrate a four-dimensional example, i.e., extended target tracking, for GM reduction. For a illustration purpose, we consider a two-dimensional scenario in clutter over the surveillance area [ 6000, 6000] [ 2000, 2000] (m) for a period of T ¼100 time steps. The sampling interval is Δt ¼ 1 s. At time k, each extended target is modeled as points with state variables of position ðxk ; yk Þ and velocity ðvx;k ; vy;k Þ, i.e., xk ¼ ½xk ; yk ; vx;k ; vy;k T :
ð36Þ
The sensor measurements are given in batches of Cartesian x and y coordinates, i.e., ¼ ½xðjÞ ; yðjÞ T : zðjÞ k k k Fig. 5. The NISD measure against the vigilance parameter of the ART-KL-C algorithm.
ð37Þ
Each extended target follows a linear Gaussian dynamical model and the sensor has a linear Gaussian measurement model, i.e., f kjk 1 ðxk jxk 1 Þ ¼ N ðxk ; Fk 1 xk 1 ; Q k 1 Þ
ð38Þ
g k ðzk jxk Þ ¼ N ðzk ; Hk xk ; Rk Þ
ð39Þ
where N ð U ; m; PÞ denotes a Gaussian density with mean m and covariance P, Fk 1 is the state transition matrix, Qk 1 is the process noise covariance, Hk is the observation matrix, and Rk is the observation noise covariance. Their values are 2 4 3 " # 3 Δt I2 Δt2 I2 I2 ΔtI2 4 5 Fk 1 ¼ ; Q k 1 ¼ s2v 4 Δt3 02 I2 Δt 2 I2 2 I2 Hk ¼ I2 02 ; R k ¼ s2ε I2 ; ð40Þ
Fig. 6. The number of reduced GM against the vigilance parameter of the ART-KL-C algorithm.
where In and On denote the n n identity and zero matrices, respectively, and sv ¼ 2 m/s2 is the standard deviation of the process noise and sε ¼ 30 m is the standard deviation of the measurements noise. In all simulations, both the probability of survival PS and the probability of detection PD are set to 0.99. Each target is
Y. Zhang, H. Ji / Signal Processing 97 (2014) 232–241
generated with a Poisson rate of 10 measurements per scan. Clutter can be modeled as a Poisson random finite sets (RFS) with intensity κ k ðzÞ ¼ λc VuðzÞ, where uð U Þ is the uniform density over the surveillance region, i.e., [ 6000, 6000] [ 2000, 2000] (m), thus V ¼ ð6000 þ 6000Þ ð2000 þ 2000Þ ¼ 4:8 107 m2 is the “volume” (namely, area) of the surveillance region, λc ¼ 2:08 10 7 m 2 is the average number of clutter returns per unit volume, and 10 clutter returns are then obtained over the surveillance region per time step. The birth intensity in the simulations is Db ðxÞ ¼ 0:1N ðx; mð1Þ ; P b Þ þ0:1N ðx; mbð2Þ ; P b Þ b
ð41Þ
¼ ½ 6000; 500; 0; 0T , mbð2Þ ¼ ½ 6000; 500; 0; where mð1Þ b 0T , and P b ¼ diagð½1002 ; 1002 ; 252 ; 252 Þ. Fig. 8 plots the true target trajectories. The two extended targets are born at 1 s and die at 100 s. Since PA is generally used for GM reduction in existing target tracking literature [6,8,25], it is chosen as the comparison algorithm with ART-KL-C in this section. According to the empirical simulations with good target tracking results, the merging threshold of PA and vigilance parameter of ART-KL-C are set as U PA ¼ 4 and ρ ¼ 0:3, respectively.
To verify the effectiveness of ART-KL-C, 500 Monte Carlo (MC) simulations are performed, each with independent measurement noise and clutter measurements. The results are shown in Figs. 9 and 10, which show the cardinality and the average OSPA [26] distances, respectively. Here, the cardinality is computed as ∑N j ¼ 1 wj . This sum can be rounded to obtain an integer estimate of target number. Compared to the previous miss-distances used in multitarget tracking, the OSPA distance overcomes the limitations including inconsistent behavior and the lack of a meaningful physical interpretation if the cardinalities of the two finite sets under consideration differ. It allows for a natural physical interpretation even if the two sets’ cardinalities are not the same, and does not exhibit any elements of arbitrariness inherent in ad hoc assignment approaches [26]. The OSPA distance [26] is defined by
OSPA distance
50
y position
ðcÞ
π A ∏n i ¼ 1
70
1500
0
!!1=p
m
min ∑ d ðxi ; x^ πðiÞ Þp þ cp ðn mÞ ð42Þ
60
500
1 n
ðcÞ ^ ¼ dp ðX; XÞ
2000
1000
239
PA ART-KL-C
40 30 20
-500
10
-1000
0
-1500
0
10
20
30
40
50
60
70
80
90
100
Time, s
-2000 -6000
-4000
-2000
0
2000
4000
6000
x position
Fig. 10. The 500 MC run average of OSPA distances against time for the two reduction algorithms.
Fig. 8. True target trajectories. 0.015
2.2
PA ART-KL-C
2 Computation time, s
Sum of weights
1.8 1.6 1.4
0.01
0.005
1.2 PA ART-KL-C True number
1 0.8 0
10
20
30
40
50
60
70
80
90
100
Time, s
Fig. 9. The 500 MC run average of cardinality against time for the two reduction algorithms.
0
0
10
20
30
40
50
60
70
80
90
100
Time, s
Fig. 11. The 500 MC run average of computation time against time for the two reduction algorithms.
240
Y. Zhang, H. Ji / Signal Processing 97 (2014) 232–241
where m r n, X ¼ fx1 ; …; xm g and X^ ¼ fx^ 1 ; …; x^ n g are the actual and estimated multi-target state sets, and the order p and cut-off c satisfy 1 rp o 1 and c 4 0. If m 4n, ðcÞ ^ ¼ dðcÞ ðX; ^ XÞ. Π n denotes the set of permutations dp ðX; XÞ p on f1; 2; :::; ng. In our simulation, the parameters of the OSPA distance are set as p ¼ 2 and c ¼ 100. As seen from Fig. 9, for the two-dimensional scenario, the average cardinality estimated by ART-KL-C is closer to the true value than PA. This is verified by Fig. 10 which shows that the average OSPA distance of the proposed algorithm is smaller than that of PA. This is due to the fact that ART-KL-C is a global clustering algorithm which is unrestricted for the initial input component. However, ART-KL-C requires slightly more computation time than PA, as shown Fig. 11. This can be accepted by target tracking system. By calculating, the extended target Gaussian mixture PHD (ET-GM-PHD) filter using ART-KL-C requires 0.7234 s on average for one MC run, while the ET-GM-PHD filter using PA requires 0.1525 s. However, two reduction algorithms cannot essentially solve the problem with underestimation of cardinality in situations where two extended targets are spatially close. For this problem, Granström et al. [8] have developed distance partitioning with sub-partitioning, and we also have proposed ART partitioning with sub-partitioning [25]. The details of the two sub-partitioning algorithms are not given here due to space considerations and the reader is referred to [8,25]. 4.4. Discussion In simulation, we can use the arbitrary GM component to initialize ART-KL-C. Since ART-KL-C is a global clustering algorithm, its performance does not depend on the initial ~ 1 ¼ m1 ~ 1 ¼ w1 , m values. For example, initially, N cat ¼ 1, w and P~ 1 ¼ P 1 . In addition, the vigilance parameter is an empirical value. This parameter is very important, and its choice directly affects the reduction performance of ARTKL-C. Similar to the fuzzy ART, smaller vigilance parameters lead to larger categories but a fewer number of categories. This implies that better reduction performance requires larger vigilance parameter and more ET. In practice, we usually make a balance between the reduction performance and ET. Note that “category” of ART-KL-C denotes the reduced GM component, thus the number of categories is that of the reduced components. 5. Conclusions and future work For GM reduction problem, a global clustering algorithm, called ART-KL-C, is proposed in this paper, which combines the fuzzy ART with weighted KL difference. The proposed algorithm adopts a merging criterion to learning category parameters, which can more accurately depict the characteristics of category. The simulation results show that the NISD measure of ART-KL-C is lower than PA and CA, but higher than GMRC. However, GMRC requires much larger ET than the other compared algorithms, thus it does not apply to the real-time systems, such as extended target tracking. Therefore, as a balance, ART-KL-C is a better choice. This is also verified by Section 4.2.
Recently, Gaussian inverse Wishart (GIW) densities have been introduced as a representation of a PHD filter [27], which is a promising filter for the multiple extended targets tracking. For GIW mixture reduction problem, Granström et al. have given solutions in [27,28]. However, they do not develop an evaluation criterion on reduction algorithms; consequently we derive a global difference criterion, i.e., the NISD measure, for GIW mixture reduction [29]. The future work is focused on the reduction of GIW mixtures. Since the reduction algorithms presented by [27,28] are the local merging algorithms, in future work, we plan to derive a global merging algorithm that is similar to the proposed algorithm in the paper.
Acknowledgments This work was supported by the National Natural Science Foundation of China under Grant no. 61372003. The authors would like to acknowledge the anonymous reviewers and handling editor for their valuable suggestions and comments. References [1] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, 2nd ed. Wiley Interscience, London, 2001. [2] C.M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), 1st ed. Springer, 2007. [3] A. Bouchachia, C. Vanaret, Incremental learning based on growing Gaussian mixture models, in: Proceedings of the IEEE ICMLA 2011 Conference, Honolulu, Hawaii, 2011, pp. 47–52. [4] A. Declercq, J.H. Piater, Online learning of Gaussian mixture models— a two-level approach, in: VISAPP 2008, Funchal, Portugal, 2008, pp. 605–611. [5] H. Sun, S. Wang, Measuring the component overlapping in the Gaussian mixture model, Data Mining Knowl. Discovery 23 (3) (2011) 479–502. [6] B.N. Vo, W.K. Ma, The Gaussian mixture probability hypothesis density filter, IEEE Trans. Signal Process. 54 (11) (2006) 4091–4104. [7] K. Granström, C. Lundquist, U. Orguner, A Gaussian mixture PHD filter for extended target tracking, in: Proceedings of the 13th International Conference on Information Fusion, Edinburgh, Scotland, July 26–29, 2010, pp. 1–8. [8] K. Granström, C. Lundquist, U. Orguner, Extended target tracking using a Gaussian-mixture PHD filter, IEEE. Trans. Aerosp. Electron. Syst. 48 (4) (2012) 3268–3286. [9] W. Li, Y. Jia, J. Du, F. Yu, Gaussian mixture PHD filter for multi-sensor multi-target tracking with registration errors, Signal Process. 93 (1) (2013) 86–99. [10] Z.M. Chen, Efficient multi-target tracking using graphical models (master’s thesis), Massachusetts Institute of Technology, 2008. [11] D.J. Salmond, Mixture reduction algorithms for point and extended object tracking in clutter, IEEE. Trans. Aerosp. Electron. Syst. 45 (2) (2009) 667–686. [12] J.L. Williams, P.S. Maybeck, Cost-function-based Gaussian mixture reduction for target tracking, in: Proceedings of the Sixth International Conference on Information Fusion, Cairns, Queensland, Australia, July, 2003, pp. 1047–1054. [13] A.R. Runnalls, Kullback–Leibler approach to Gaussian mixture reduction, IEEE. Trans. Aerosp. Electron. Syst. 43 (3) (2007) 989–999. [14] D. Schieferdecker, M.F. Huber, Gaussian mixture reduction via clustering, in: Proceedings of the 12th International Conference on Information Fusion, Seattle, WA, USA, July 6–9, 2009, pp. 1536–1543. [15] M. Huber, U. Hanebeck, Progressive Gaussian mixture reduction, in: Proceedings of the 11th International Conference on Information Fusion, Cologne, Germany, June 30–July 3, 2008, pp. 1–8. [16] M. West, Approximating posterior distributions by mixtures, J. R. Statist. Soc. B 55 (2) (1993) 409–422. [17] D.F. Crouse, P. Willett, K. Pattipati, L. Svensson, A look at Gaussian mixture reduction algorithms, in: Proceedings of the 14th International Conference on Information Fusion, Chicago, IL, USA, July 5–8, 2011.
Y. Zhang, H. Ji / Signal Processing 97 (2014) 232–241
[18] G.A. Carpenter, S. Grossberg, D.B. Rosen, Fuzzy ART: fast stable learning and categorization of analog patterns by an adaptive resonance system, Neural Networks 4 (1) (1991) 759–771. [19] M. Ulmke, O. Erdinc, P. Willett, Gaussian mixture cardinalized PHD filter for ground moving target tracking, in: Proceedings of the 10th International Conference on Information Fusion, Quebec City, Canada, July 9–12, 2007, pp. 1–8. [20] S. Grossberg, Adaptive pattern classification and universal recoding, I: parallel development and coding of neural feature detectors, Biol. Cybern. 23 (1976) 121–134. [21] S. Grossberg, Adaptive pattern classification and universal recoding, II: feedback, expectation, olfaction, and illusions, Biol. Cybern. 23 (1976) 187–202. [22] R.E. Blahut, Principles and Practice of Information Theory, AddisonWesley, Reading, MA, 1987. [23] S. Kullback, R.A. Leibler, On information and sufficiency, Ann. Math. Stat. 22 (1) (1951) 79–86.
241
[24] Williams, L. Jason, Gaussian mixture reduction for tracking multiple maneuvering targets in clutter (M.S.E.E. thesis), Air Force Institute of Technology, Wright-Patterson Air Force Base, OH, 2003. [25] Y.Q. Zhang, H.B. Ji, A novel fast partitioning algorithm for extended target tracking using a Gaussian mixture PHD filter, Signal Process. 93 (11) (2013) 2975–2985. [26] D. Schuhmacher, B.T. Vo, B.N. Vo, A consistent metric for performance evaluation of multi-object filters, IEEE Trans. Signal Process. 86 (8) (2008) 3447–3457. [27] K. Granström, U. Orguner, A PHD filter for tracking multiple extended targets using random matrices, IEEE Trans. Signal Process. 60 (11) (2012) 5657–5671. [28] K. Granström, U. Orguner, On the reduction of Gaussian inverse Wishart mixtures, in: Proceedings of the 15th International Conference on Information Fusion, Singapore, July 9–12, 2012, pp. 2162–2169. [29] Y.Q. Zhang, H.B. Ji, A global difference measure for the reduction of Gaussian inverse Wishart mixtures, Signal Process. 93 (11) (2013) 3133–3142.