Pattern Recognition Letters 27 (2006) 838–842 www.elsevier.com/locate/patrec
Dynamical Gaussian mixture model for tracking elliptical living objects

Guanglei Xiong, Chao Feng, Liang Ji *

Department of Automation, National Laboratory for Information Science and Technology, Tsinghua University, Haidian District, Beijing 100084, China

Received 30 May 2005; received in revised form 19 October 2005; available online 7 February 2006
Communicated by Dr. H. Sako
Abstract

In this letter, we present a novel dynamical Gaussian mixture model (DGMM) for tracking elliptical living objects in video frames. The parameters, which describe object position and shape, are estimated with a traditional Gaussian mixture model (GMM) for the first frame. Instead of simply using the parameters of one frame as the initial state for the next frame, DGMM employs a Kalman filter to predict these parameters and thereby initialize the evolution of the GMM for the subsequent frames. Because the Kalman-filter estimate is usually much closer to the true state than the previous-frame parameters, the computational load of tracking is reduced significantly. In addition, "split", "merge" and "delete" operations are integrated into DGMM to account for object splitting, merging and vanishing. This scheme suppresses the chance of getting stuck at a local maximum caused by a mismatch between the number of objects in the model and the actual number of objects. Experiments on both simulated and real-world data demonstrate the effectiveness of our algorithm. DGMM has been successfully applied to track the early development of urchin embryos, during which motility and cleavage occur.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Gaussian mixture model; EM algorithm; Kalman filtering; Tracking
* Corresponding author. Tel.: +86 106 278 5787; fax: +86 106 279 5871. E-mail address: [email protected] (L. Ji).
0167-8655/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.patrec.2005.11.015

1. Introduction

The Gaussian mixture model (GMM) (Dempster et al., 1977) plays an important role in clustering, density estimation and pattern recognition because of its analytical tractability, asymptotic properties, and ease of implementation. Most applications assume the underlying probability density of the sample data to be fixed, and iterations are performed to obtain proper model parameters under some criterion. In our preliminary experiments we found that a converged GMM can re-converge quickly if the samples change only slightly, which suggests that this re-convergence property is useful for tracking. When dealing with tracking, however, the samples differ from frame to frame, so some parameter-updating technique must be added to the basic GMM.

Several authors have proposed adaptive expectation–maximization (EM) algorithms for GMMs that include parameter-updating schemes. There are two approaches to adaptation. One is parameter based, in which parameter-updating functions are derived (Oliver et al., 1997; McKenna et al., 1999; Rares and Reinders, 2000). The other is non-parameter based and incorporates a Markov chain Monte Carlo (MCMC) based particle filter (Vermaak et al., 2003). However, these models are not designed for tracking.

The method we propose here is parameter based and is well suited to tracking clusters of elliptical objects. First, the parameters, which describe object position and shape, are estimated with a traditional GMM to initialize the
model. Instead of simply using the parameters of one frame as the initial state of the next frame, DGMM employs a Kalman filter to predict these parameters and thereby initialize the evolution of the GMM for the subsequent frames. In addition, "split", "merge" and "delete" operations are integrated into DGMM to account for object splitting, merging and vanishing.

The advantage of our algorithm is twofold. Owing to the "split", "merge" and "delete" operations, the chance of getting stuck at a local maximum caused by a mismatch between the number of objects in the model and the actual number of objects is greatly reduced. Moreover, because the estimate produced by the Kalman filter is usually close to the true state, the number of iterations needed for a single frame decreases dramatically; our algorithm is therefore faster than the naive approach.

The remainder of the letter is organized as follows. In the next section we briefly review the GMM approach and present the three operations we adopt. Section 3 describes the motion parameterization and its estimation by Kalman filtering. The overall algorithm is summarized in Section 4. Synthetic and real-world experimental results are shown in Section 5. Section 6 concludes the letter and discusses future work.

2. Gaussian mixture model

2.1. Basic Gaussian mixture model

A random variable x is said to follow a Gaussian mixture model if its probability density function can be written as

p(x; \Theta) = \sum_{m=1}^{k} \alpha_m \, p(x; \theta_m),

where \alpha_1, \alpha_2, \ldots, \alpha_k are the weights, subject to \alpha_m > 0 and \sum_{m=1}^{k} \alpha_m = 1. Each component density p(x; \theta_m) is a normal distribution

p(x; \theta_m) = \frac{1}{(2\pi)^{d/2} \det(\Sigma_m)^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_m)^T \Sigma_m^{-1} (x - \mu_m) \right),

where \theta_m = \{\mu_m, \Sigma_m\} represents the mth component's center and covariance, respectively. The complete set of parameters defining the mixture is \Theta = \{\alpha_1, \ldots, \alpha_k, \mu_1, \ldots, \mu_k, \Sigma_1, \ldots, \Sigma_k\}. Given a group of n independent and identically distributed samples X = \{x_1, x_2, \ldots, x_n\}, the log-likelihood corresponding to a k-component mixture is

\log p(X; \Theta) = \log \prod_{i=1}^{n} p(x_i; \Theta) = \sum_{i=1}^{n} \log \sum_{m=1}^{k} \alpha_m \, p(x_i; \theta_m).

Because the maximum-likelihood estimate of \Theta cannot be obtained analytically, EM is widely used to estimate the parameters of a GMM. It is an iterative algorithm that alternates between the following E-step and M-step:

• E-step. The posterior probability of sample x_i at the tth step is computed as

p(l \mid x_i; \Theta^{(t)}) = \frac{\alpha_l^{(t)} p(x_i; \theta_l^{(t)})}{\sum_{m=1}^{k} \alpha_m^{(t)} p(x_i; \theta_m^{(t)})}.

• M-step. The (t+1)th parameter set \Theta^{(t+1)} is updated by

\alpha_m^{(t+1)} = \frac{1}{n} \sum_{i=1}^{n} p(m \mid x_i; \Theta^{(t)}),

\mu_m^{(t+1)} = \frac{\sum_{i=1}^{n} x_i \, p(m \mid x_i; \Theta^{(t)})}{\sum_{i=1}^{n} p(m \mid x_i; \Theta^{(t)})},

\Sigma_m^{(t+1)} = \frac{\sum_{i=1}^{n} p(m \mid x_i; \Theta^{(t)}) \, (x_i - \mu_m^{(t+1)})(x_i - \mu_m^{(t+1)})^T}{\sum_{i=1}^{n} p(m \mid x_i; \Theta^{(t)})}.

2.2. "Split", "merge" and "delete" operations

To account for object splitting, merging and vanishing, and to prevent the model from converging to a local maximum caused by a mismatch between the number of objects in the model and the actual number of objects, we include three operations and one model-selection criterion. Among the criteria already proposed, such as the Bayesian inference criterion (BIC), minimum description length (MDL), Laplace-empirical criterion (LEC), integrated classification likelihood (ICL) and minimum message length (MML), Figueiredo and Jain (2002) have shown experimentally that MML performs best. Thus, we use the MML criterion, which can be expressed as

\hat{\Theta} = \arg\max_{\Theta} L(\Theta; X),

where

L(\Theta; X) = \log p(X; \Theta) - \frac{N}{2} \sum_{m=1}^{k} \log \frac{n \alpha_m}{12} - \frac{k}{2} \log \frac{n}{12} - \frac{k(N+1)}{2},

and N is the dimensionality of \theta_m. The model that maximizes L(\Theta; X) is chosen to describe the status of the objects in the current frame.

As in Ueda et al. (2000), the local Kullback divergence is used to measure the distance between the local data density f_m(x) and the model density p_m(x) of the mth component:

J(m; \Theta) = \sum_{i=1}^{n} f_m(x_i; \Theta) \log \frac{f_m(x_i; \Theta)}{p_m(x_i; \theta_m)},

where

f_m(x; \Theta) = \frac{p(m \mid x; \Theta)}{\sum_{i=1}^{n} p(m \mid x_i; \Theta)}.

We repeatedly check whether the following three operations can increase L(\Theta; X):
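For concreteness, the E- and M-steps above can be sketched in NumPy as follows. This is an illustrative implementation, not the authors' code; the function name, the optional `mu0` initializer, and the small regularization term added to the covariances are our own choices.

```python
import numpy as np

def em_gmm(X, k, n_iter=100, mu0=None, seed=0):
    """EM updates for a k-component GMM (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialize: uniform weights, given or random centers, identity covariances.
    alpha = np.full(k, 1.0 / k)
    mu = (np.asarray(mu0, float).copy() if mu0 is not None
          else X[rng.choice(n, k, replace=False)].copy())
    sigma = np.stack([np.eye(d)] * k)
    for _ in range(n_iter):
        # E-step: posterior p(m | x_i; Theta) for every sample and component.
        resp = np.empty((n, k))
        for m in range(k):
            diff = X - mu[m]
            inv = np.linalg.inv(sigma[m])
            quad = np.einsum('ij,jk,ik->i', diff, inv, diff)
            norm = (2 * np.pi) ** (d / 2) * np.linalg.det(sigma[m]) ** 0.5
            resp[:, m] = alpha[m] * np.exp(-0.5 * quad) / norm
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and covariances.
        Nm = resp.sum(axis=0)
        alpha = Nm / n
        mu = (resp.T @ X) / Nm[:, None]
        for m in range(k):
            diff = X - mu[m]
            sigma[m] = (resp[:, m, None] * diff).T @ diff / Nm[m]
            sigma[m] += 1e-6 * np.eye(d)  # regularize for numerical stability
    return alpha, mu, sigma
```

In a tracking setting, `mu0` is exactly the hook DGMM exploits: the better the initial centers, the fewer iterations are needed.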
• "Split": The component (superscript "(s)") that maximizes J(m; \Theta) is the candidate for splitting. The initial parameters of the two new components (subscripts "1, 2") are set as follows:

\alpha_1 = \alpha_2 = \alpha^{(s)}/2,
\mu_1 = \mu^{(s)} + m^{(s)}/2, \quad \mu_2 = \mu^{(s)} - m^{(s)}/2,
\Sigma_1 = \Sigma_2 = r^{(s)} I,

where m^{(s)} is the normalized major axis of \Sigma^{(s)}, r^{(s)} is the smallest singular value of \Sigma^{(s)} and I is the identity matrix. The idea behind these formulas is simple: the two new centers are displaced along the principal axis, so the boundary between them is perpendicular to it.

• "Merge": The pair of components (subscripts "1, 2") whose composition (superscript "(m)") minimizes J(m; \Theta) is the candidate for merging. The parameters of the composed component are set as follows (Zhang et al., 2004):

\alpha^{(m)} = \alpha_1 + \alpha_2,
\mu^{(m)} = (\alpha_1 \mu_1 + \alpha_2 \mu_2) / \alpha^{(m)},
\Sigma^{(m)} = \left\{ \alpha_1 \left[ \Sigma_1 + (\mu_1 - \mu^{(m)})(\mu_1 - \mu^{(m)})^T \right] + \alpha_2 \left[ \Sigma_2 + (\mu_2 - \mu^{(m)})(\mu_2 - \mu^{(m)})^T \right] \right\} / \alpha^{(m)}.

• "Delete": A component (superscript "(d)") is deleted if its weight is too small, i.e., \alpha^{(d)} < N/n.

If an operation increases L(\Theta; X), the model is updated; otherwise the previous model is restored.
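The split and merge formulas above might be implemented as in the following sketch. The function names are our own; the split uses the unit-length ("normalized") major axis m^(s) exactly as stated in the text, and the merge is the moment-preserving combination of Zhang et al. (2004).

```python
import numpy as np

def split_component(alpha, mu, sigma):
    """Split one component along its principal axis (formulas above)."""
    U, s, _ = np.linalg.svd(sigma)
    m_axis = U[:, 0]            # normalized major axis, as in the text
    r = s[-1]                   # smallest singular value of the covariance
    a1 = a2 = alpha / 2
    mu1 = mu + m_axis / 2
    mu2 = mu - m_axis / 2
    cov = r * np.eye(len(mu))   # both children start isotropic
    return (a1, mu1, cov.copy()), (a2, mu2, cov.copy())

def merge_components(a1, mu1, s1, a2, mu2, s2):
    """Moment-preserving merge of two components (Zhang et al., 2004)."""
    a = a1 + a2
    mu = (a1 * mu1 + a2 * mu2) / a
    d1 = (mu1 - mu)[:, None]
    d2 = (mu2 - mu)[:, None]
    sigma = (a1 * (s1 + d1 @ d1.T) + a2 * (s2 + d2 @ d2.T)) / a
    return a, mu, sigma
```

In DGMM, each candidate operation is applied tentatively and kept only if the MML objective L(\Theta; X) increases after re-running EM.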
3. Motion estimation by Kalman filtering

A simple method for updating the parameters is to use the model of the previous frame as the initial model for the next frame. When objects move or reshape rapidly, this has two disadvantages: the number of iterations for a single frame is often large, and the model sometimes gets stuck at a local maximum and fails to describe the objects well. Therefore, we use Kalman filtering to predict the next state of the objects, and the prediction serves as the initial parameters for the next frame. Because the Kalman-filter estimate is usually close to the true state, the computational load of tracking is reduced significantly.

The Kalman filter is a well-known tool for stochastic estimation from noisy measurements. It is a set of recursive equations that implement a predictor–corrector type estimator, optimal in the sense of minimizing the estimated error covariance. Owing to its simplicity, effectiveness, and optimality properties, the Kalman filter has been successfully applied in a wide range of areas such as aerospace, navigation, radar, and target tracking. Here we employ it to predict the initial parameters of the next frame from the previous one.
Assume the states satisfy a linear time-invariant model:

x(k+1) = \Phi x(k) + u(k),
y(k) = H x(k) + v(k),

where x(k) and y(k) denote the state vector and the measurement vector at time k, respectively; u(k) and v(k) are zero-mean Gaussian white noise processes with covariances Q and R, respectively; \Phi is the matrix relating x(k) to x(k+1); and H is the matrix relating states to measurements. The one-step-predictor form of the Kalman filter can then be described as follows (Kailath, 1981):

1. Input the initial error covariance matrix P(0|-1) = P_0, usually set to the identity.
2. Input the initial state value \hat{x}(0|-1) = 0.
3. Compute G(k) and the Kalman gain K(k):

G(k) = H P(k|k-1) H^T + R, \quad K(k) = P(k|k-1) H^T.

4. Compute the predicted state vector \hat{x}(k+1|k):

\hat{x}(k+1|k) = \Phi \hat{x}(k|k-1) + \Phi K(k) G(k)^{-1} \left( y(k) - H \hat{x}(k|k-1) \right).

5. Update the error covariance matrix P(k+1|k):

P(k+1|k) = \Phi P(k|k-1) \Phi^T + Q - \Phi K(k) G(k)^{-1} K(k)^T \Phi^T.

6. Return to Step 3.

For the elliptical objects, the parameters \theta_m = \{\mu_m, \Sigma_m\} encode all the information about an object: \mu_m gives its position and \Sigma_m its shape and orientation. These are the state variables to be estimated by the Kalman filter. We assume the objects do not interact with each other, so each component can be estimated independently. The vector \mu_m is directly suitable for Kalman filtering, but in the two-dimensional (2D) case the matrix \Sigma_m must first be parameterized as

\Sigma = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} r_x & 0 \\ 0 & r_y \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}^T,

where \theta gives the orientation and [r_x \; r_y]^T describes the shape. Combining the parameters from \mu_m and \Sigma_m, the states of a component to be estimated are [\mu_x \; \mu_y \; \theta \; r_x \; r_y]^T.

4. Dynamical Gaussian mixture model

We summarize the DGMM in Fig. 1.

State estimate: the initial state of each component for the next frame is estimated by Kalman filtering.

State update: the states renewed by the GMM are fed back to the Kalman filter to update its current values. If any splitting or merging occurs in the GMM, newly generated components notify the Kalman filter to initialize their states, whereas vanished components notify it to delete them.

Fig. 1. Flow chart of DGMM.

5. Experimental results

To evaluate the performance of our algorithm, we show results on both simulated and real-world datasets.

5.1. Simulated examples

In the simulated example, 100 frames of 2000 2D samples are generated with parameters:

\alpha_1 = 0.2, \quad \alpha_2 = 0.3, \quad \alpha_3 = 0.2, \quad \alpha_4 = 0.3,
\mu_1 = [0.2 \; 0.4]^T, \quad \mu_2 = [0.9 + v_2 t \; 0.6]^T,
\mu_3 = [0.8 \; 0.4]^T, \quad \mu_4 = [0.5 \; 0.1 + v_4 t]^T,
\Sigma_1 = \Sigma_2 = \Sigma_3 = \Sigma_4 = \mathrm{diag}[0.001 \; 0.001],
t = 1, 2, \ldots, 100, \quad v_2 = 0.01, \quad v_4 = 0.02.

Tracking results are shown in Table 1. The average mean square error (MSE), computed for the weight, center, and covariance, measures the difference between the tracking results and the known true parameters, i.e., the accuracy of tracking. The computational load is measured by the average number of iterations over all frames. The table shows that the Kalman-filtering-based DGMM saves considerable computation time while maintaining tracking accuracy.

Table 1
Comparison between tracking with and without Kalman filtering

Type                       | Average MSE                    | Average number of iterations
                           | Weight   Center   Covariance   |
With Kalman filtering      | 0.0021   0.0032   0.0011       | 4
Without Kalman filtering   | 0.0023   0.0047   0.0010       | 8
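The one-step predictor of Section 3 and the 2D covariance parameterization can be sketched as follows. This is an illustrative NumPy version; the function names are ours, and \Phi, H, Q, R must be supplied by the user, since the letter does not specify them.

```python
import numpy as np

def sigma_to_params(sigma):
    """Decompose a 2x2 covariance into (theta, rx, ry) as in Section 3."""
    vals, vecs = np.linalg.eigh(sigma)          # ascending eigenvalues
    rx, ry = vals[1], vals[0]                   # major, minor axis values
    theta = np.arctan2(vecs[1, 1], vecs[0, 1])  # orientation of major axis
    return theta, rx, ry

def params_to_sigma(theta, rx, ry):
    """Rebuild Sigma = R(theta) diag(rx, ry) R(theta)^T."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return R @ np.diag([rx, ry]) @ R.T

def kalman_predict_step(x_hat, P, y, Phi, H, Q, R):
    """One iteration of the one-step predictor form given above."""
    G = H @ P @ H.T + R                         # innovation covariance
    K = P @ H.T                                 # (unnormalized) gain
    innov = y - H @ x_hat
    x_next = Phi @ x_hat + Phi @ K @ np.linalg.solve(G, innov)
    P_next = Phi @ P @ Phi.T + Q - Phi @ K @ np.linalg.solve(G, K.T) @ Phi.T
    return x_next, P_next
```

In DGMM, each component runs such a predictor on its own state [\mu_x, \mu_y, \theta, r_x, r_y], and the predicted state is converted back to (\mu_m, \Sigma_m) to seed EM for the next frame.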
Fig. 2. Simulated results demonstrating the "merge" and "split" operations of DGMM (· represents samples on objects; circles indicate tracking results): (a) frame 31, objects merging; (b) frame 36, objects merged; (c) frame 41, object splitting; (d) frame 44, object split.
Fig. 3. Real-data tracking of an urchin embryo sequence.
Next, some parameters are adjusted to demonstrate the tracking of merging and splitting objects, namely \mu_4 = [0.5 \; 0.1 + v_4 t]^T. The tracking result is shown in Fig. 2.

5.2. Real data

We apply our algorithm to track the early development (motility and cleavage) of purple urchin embryos (Fig. 3). The embryos are mounted in filtered seawater and filmed under a microscope. The cells are elliptical and fit our model well. A simple global threshold is applied to every frame to extract sample points representing the objects. This operation is computationally very cheap, and the threshold level can be chosen from a wide range without altering the outcome of DGMM. DGMM is then employed to find each cell's current location, shape and orientation. Frames 65 and 81 are shown: Cell 2 of frame 65 splits to produce Cells 2-a and 2-b in frame 81, Cell 3 of frame 65 splits to produce Cells 3-a and 3-b in frame 81, and Cell 1 of frame 81 is in the process of splitting. These experiments show that the proposed DGMM algorithm is effective for tracking elliptical living objects. On an ordinary PC, the frame rate is about 15 frames/s without merging and splitting and 8–10 frames/s with them, with roughly 10,000 sample points per frame.

6. Conclusion

In this letter, we propose a novel dynamical Gaussian mixture model that tracks elliptical living objects as well as their splitting and merging behaviors. Kalman filtering is used to estimate the parameters for the next frame, which significantly decreases the number of iterations of the basic EM algorithm. In addition, "split", "merge" and "delete" operations are integrated into
DGMM to account for object splitting, merging and vanishing, and to suppress the chance of getting stuck at a local maximum caused by a mismatch between the number of objects in the model and the actual number of objects. Both simulated and real-world experiments show its effectiveness. In future work we will extend DGMM to the 3D case and analyze the theoretical aspects of the model.

Acknowledgements

The authors wish to thank Dr. George Dassow for providing the video microscopy sequences and for fruitful discussions. We also thank the two anonymous reviewers for their helpful comments. This work is supported by the Chinese Ministry of Science and Technology under contract 001CB510307.

References

Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39, 1–38.
Figueiredo, M.A.T., Jain, A.K., 2002. Unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Machine Intell. 24 (3), 381–396.
Kailath, T., 1981. Lectures on Wiener and Kalman Filtering. Springer, New York.
McKenna, S., Gong, S., Raja, Y., 1999. Tracking colour objects using adaptive mixture models. Image Vision Comput. 17, 225–231.
Oliver, N., Pentland, A., Bérard, F., 1997. LAFTER: lips and face real time tracker with facial expression recognition. In: CVPR '97.
Rares, A., Reinders, M.J.T., 2000. Object tracking by adaptive modeling. In: ICIP 2000.
Ueda, N., Nakano, R., Ghahramani, Z., Hinton, G., 2000. SMEM algorithm for mixture models. Neural Comput. 12, 2109–2128.
Vermaak, J., Doucet, A., Pérez, P., 2003. Maintaining multi-modality through mixture tracking. In: ICCV 2003.
Zhang, B.B., Zhang, C.S., Yi, X., 2004. Competitive EM algorithm for finite mixture models. Pattern Recognit. 37, 131–144.