Int. J. Electron. Commun. (AEÜ) 66 (2012) 235–238
A novel approach to motion segmentation based on gamma distribution

Cheolkon Jung*, L.C. Jiao, Maoguo Gong

Key Lab of Intelligent Perception and Image Understanding of Ministry of Education of China, Xidian University, Xi'an 710071, China
Article history: Received 8 February 2011; accepted 12 July 2011.
Keywords: Gamma distribution; Motion segmentation; Motion estimation; Optical flow

Abstract

We provide a new motion segmentation method for image sequences based on the gamma distribution. Motion segmentation is very important because it can be employed for video surveillance, object tracking, and action recognition. The Gaussian mixture model (GMM) has been widely used as a distribution model for motion segmentation. However, we found that the gamma distribution model is more suitable than the GMM for optical flow based motion segmentation. Experimental results show that the proposed method is very effective in producing accurate motion segmentation results in image sequences. © 2011 Elsevier GmbH. All rights reserved.
1. Introduction

The development of a powerful moving object segmentation method is one of the important requirements for many computer vision systems [1–5]. In video surveillance applications, motion segmentation has been used to detect the presence of people, cars, and other unexpected objects, and then to trigger more complex activity recognition steps. Moreover, the segmentation of moving objects in observed scenes is an important problem for traffic flow measurement and for behavior detection during sports activities. Up to the present, many significant achievements have been made in the field of moving object segmentation [5–10]. Chang et al. [6] proposed a Bayesian framework that combines motion estimation and segmentation based on the motion field. The motion field was represented as the sum of parametric and residual fields. Accordingly, velocity vectors were generated by an affine model, and the motion and segmentation fields were then obtained by energy minimization, using the iterated conditional modes (ICM) algorithm. Luthon et al. [7] proposed a motion segmentation method using a Markov random field (MRF) model, which achieved good segmentation results by using spatio-temporal cliques in the MRF. Aach and Kaup [8] proposed a statistical method for the motion segmentation of video objects. This method assumed that the pixel difference between two consecutive frames is described by a Gaussian distribution. For a given level of significance, the resulting threshold value was obtained theoretically, and a change detection mask (CDM) for motion segmentation was thus created. In general, these methods have used the GMM for motion segmentation.

However, we found by experiment that the GMM does not always produce satisfactory results in motion segmentation using optical flow. For example, Fig. 1 shows the actual distribution of the optical flow vectors' magnitudes in the Table tennis sequence. Here, z denotes the normalized magnitude of the optical flow vectors, and h(z) is the distribution of z. The blue dotted line represents the GMM fitted to the original data. As can be observed, the GMM does not accurately follow the distribution of the optical flow magnitudes, which leads to unacceptable motion segmentation results. Therefore, a more suitable model, such as the red line in Fig. 1, is required for accurate motion segmentation. In this letter, we propose a new motion segmentation method that accurately segments moving objects by using the gamma distribution.

This letter is organized as follows. In Section 2, we describe the proposed motion segmentation method based on gamma distribution in detail. In Section 3, experimental results and the corresponding analysis are provided. Finally, we conclude in Section 4.
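The mismatch illustrated by Fig. 1 can be reproduced with a small experiment. The sketch below is illustrative only (not the paper's data): it draws synthetic non-negative, right-skewed "flow magnitudes", then compares the log-likelihood of a Gaussian maximum-likelihood fit against a moment-based gamma fit.

```python
# Illustrative sketch: a gamma density fits skewed, non-negative flow
# magnitudes better than a Gaussian. Synthetic sample and the
# method-of-moments gamma fit are assumptions, not the paper's procedure.
import math
import numpy as np

rng = np.random.default_rng(0)
z = rng.gamma(shape=2.0, scale=0.15, size=5000)   # stand-in for |flow| values

# Gaussian maximum-likelihood fit
mu, sd = z.mean(), z.std()
ll_norm = np.sum(-0.5 * np.log(2 * np.pi * sd**2) - (z - mu)**2 / (2 * sd**2))

# Gamma fit by the method of moments: k = mean^2/var, theta = var/mean
var = z.var()
k, theta = mu**2 / var, var / mu
ll_gamma = np.sum((k - 1) * np.log(z) - z / theta
                  - k * math.log(theta) - math.lgamma(k))

print(ll_gamma > ll_norm)   # the gamma model wins on this skewed sample
```

The Gaussian wastes probability mass on negative values and cannot capture the skew, which is exactly the failure the blue dotted line in Fig. 1 shows.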
* Corresponding author. E-mail addresses: [email protected], [email protected] (C. Jung). doi:10.1016/j.aeue.2011.07.006

Fig. 1. Actual distribution of the optical flow vectors' magnitudes in the Table tennis sequence: the blue dotted line represents the GMM fitted to the original data and the red line indicates a suitable model for accurate motion segmentation. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

2. Methods

Optical flow is defined as the velocity field in the image plane induced by the motion of objects in an image sequence. Let I be the intensity of a pixel (x, y) of an image at time t. In traditional optical flow analysis, the optical flow constraint equation is expressed as [11]:

I_x u + I_y v + I_t = 0    (1)

where u and v are the two components of the velocity vector, and I_x, I_y, I_t are the partial derivatives of I with respect to x, y, and t, respectively. By Horn and Schunck's method [11], the components of the velocity vector are computed by (2) and (3):

u^{i+1} = u^i - \frac{I_x (I_x u^i + I_y v^i + I_t)}{\alpha^2 + I_x^2 + I_y^2}    (2)

v^{i+1} = v^i - \frac{I_y (I_x u^i + I_y v^i + I_t)}{\alpha^2 + I_x^2 + I_y^2}    (3)

where \alpha is a weighting constant and i is the iteration number. If \varepsilon is a threshold value, the iteration is stopped when the following condition is satisfied:

\sum_{(x,y)} \left[ (u^{i+1} - u^i)^2 + (v^{i+1} - v^i)^2 \right] < \varepsilon    (4)

If a random variable z(x, y) is the magnitude of the velocity vector of a pixel (x, y) in an image, z(x, y) is defined as:

z(x, y) = \sqrt{u(x, y)^2 + v(x, y)^2}    (5)

As mentioned earlier, z generally follows a gamma distribution, as shown in Fig. 2. In Fig. 2, k denotes the order of each probability density function (PDF), m_k is the kth mean, and T_k is the kth threshold. Thus, we make use of the gamma distribution for motion segmentation. The distribution of z, i.e. h(z), takes the following form [10,12]:

h(z) = \sum_{k=1}^{M} \delta_k \frac{\lambda_k^k}{(k-1)!} z^{k-1} e^{-\lambda_k z}    (6)

where M is the maximum order of the PDFs, \delta_k is the coefficient of each PDF, and \lambda_k is the decaying parameter of the gamma function. Here, \delta_k is 1 for k = 1, 5, 9, . . ., and 0 otherwise. To assign a label to each pixel, the optimal number of PDFs, i.e. M, should be determined in advance. Since M determines the number of clusters, we use the cluster validity measure proposed by Ray and Turi [13]. The principle of this method is to minimize the within-cluster scatter and maximize the between-cluster separation. The cluster validity measure, validity, is obtained by:

validity = w \cdot \frac{(1/N) \sum_{l=0}^{K-1} \sum_{z \in C_l} |z - m_l|}{\min_{l \neq m} |m_l - m_m|}    (7)

where w is a weight, N is the total number of pixels, C_l is the lth cluster (l = 0, 1, . . ., K − 1), and m_l is the lth mean. Then, we determine the optimal threshold value T_n to assign a label to each pixel. T_n is determined by analyzing the distribution in (6); it is the z-value at the intersection point of two adjacent PDFs. If we assume that \lambda_0 = . . . = \lambda_{K-1} = \lambda, T_n is computed by:

T_n = \frac{1}{\lambda} \sqrt[4]{\frac{(4n)!}{(4n-4)!}}    (8)

Therefore, T_1 = (1/\lambda)\sqrt[4]{24} and T_2 = (1/\lambda)\sqrt[4]{1680}. We can assign a label to each pixel using (8). The label field L(x, y) is expressed as follows:

L(x, y) = l,   if z(x, y) \in C_l    (9)
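The estimation and labeling steps above can be sketched in a few lines. This is a minimal point-wise variant of the Horn–Schunck iteration of eqs. (2)–(4) (without the usual neighborhood averaging), the magnitude of eq. (5), and the thresholds of eq. (8); the gradient scheme and the parameter values (alpha, eps, lam) are illustrative assumptions.

```python
# Minimal sketch of eqs. (2)-(5) and (8); not the authors' implementation.
import numpy as np

def flow_magnitude(I1, I2, alpha=1.0, eps=1e-4, max_iter=200):
    """Point-wise Horn-Schunck-style iteration returning z = sqrt(u^2 + v^2)."""
    I1 = I1.astype(float); I2 = I2.astype(float)
    Ix = np.gradient(I1, axis=1)        # partial derivative w.r.t. x
    Iy = np.gradient(I1, axis=0)        # partial derivative w.r.t. y
    It = I2 - I1                        # temporal derivative
    u = np.zeros_like(I1); v = np.zeros_like(I1)
    denom = alpha**2 + Ix**2 + Iy**2
    for _ in range(max_iter):
        r = Ix * u + Iy * v + It        # residual of the constraint, eq. (1)
        u_new = u - Ix * r / denom      # eq. (2)
        v_new = v - Iy * r / denom      # eq. (3)
        done = np.sum((u_new - u)**2 + (v_new - v)**2) < eps   # eq. (4)
        u, v = u_new, v_new
        if done:
            break
    return np.sqrt(u**2 + v**2)         # eq. (5)

def thresholds(lam, n_max):
    """T_n = (1/lam) * ((4n)! / (4n-4)!) ** (1/4), eq. (8)."""
    from math import factorial
    return [(factorial(4 * n) / factorial(4 * n - 4)) ** 0.25 / lam
            for n in range(1, n_max + 1)]

# Labels per eq. (9): L(x, y) = l for z falling between consecutive T_n.
T = thresholds(lam=2.0, n_max=2)  # T[0], T[1] match T_1, T_2 in the text
```

The thresholds for n = 1, 2 reduce to the fourth roots of 4!/0! = 24 and 8!/4! = 1680 stated after eq. (8).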
Fig. 2. Gamma distribution: k denotes the order of each probability density function (PDF); m_k is the kth mean; and T_k is the kth threshold.

However, although motion segmentation of each pixel is performed using the gamma distribution, noisy regions still occur. To remove the noisy regions, the MRF is used in the proposed method. The MRF allows motion segmentation to be performed by minimizing an energy function. If s is a center pixel and n is a neighborhood pixel at each point (x, y), the maximum a posteriori (MAP) estimate of a label l is given by:

\hat{l} = \arg\max_l P(l|z)    (10)

where P(l|z) denotes the conditional probability of l given z. Here, the total energy U_T(x, y) in the neighborhood of the MRF can be expressed as:

U_T(x, y) = \frac{1}{2\sigma^2} [z(x, y) - m_l]^2 + \alpha \sum_{n \in \Omega} V_c(l_s, l_n)    (11)

where \sigma^2, m_l, and \alpha are the observation variance, the mean of the lth PDF, and a weighting constant chosen by experiments, respectively; l_s is the label of the center pixel s, l_n is the label of a neighboring pixel n, and V_c(l_s, l_n) is a potential function associated with a binary clique c = (s, n) [7,9]. The energy of each pixel is computed using (11), and the final motion segmentation results are obtained by energy optimization techniques. In the proposed method, the termination of the iteration is automatically determined from the energy difference between iterations t and t − 1 as follows [14]:

\Delta(t) = \frac{1}{HW} \sum_{x=1}^{W} \sum_{y=1}^{H} \left| U_T^{(t)}(x, y) - U_T^{(t-1)}(x, y) \right|    (12)

where H and W represent the height and width of the image sequence, respectively, and U_T^{(t)}(x, y) denotes the total energy at iteration t. For each pixel s of the current image, the labels from 0 to K − 1 are tested and the label that induces the minimum local
Fig. 3. Histogram of z. (a) Table tennis, (b) Claire, (c) Street, (d) Smoke, (e) Foreman, (f) Akiyo, (g) Diffusion and (h) Cloud. The red line represents the fitted gamma distribution of each sequence. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
energy in the neighborhood is kept. The process iterates over the image until \Delta(t) falls below 0.05% of the first energy U_T^{(0)}. Accordingly, the final label l* becomes one of 0, 1, . . ., K − 1 and is arranged according to the magnitude of the velocity vectors. Finally, the pixels whose l* is greater than 0 are considered to have motion.

3. Results

The experiments were performed on an Intel Core 2 Duo 3.0 GHz PC with VC++ 6.0. To verify the effectiveness of the proposed method, eight test image sequences (Table tennis, Claire, Street, Smoke, Foreman, Akiyo, Diffusion, and Cloud) were used. All sequences were resized to 176 × 144 pixels. The optimal number of PDFs was selected by minimizing the validity in (7); the optimal number was 3 in the Table tennis sequence and 2 in the other sequences. Fig. 3 shows the distribution of z in the test sequences. As can be seen, the Table tennis sequence has approximately 3 PDFs while the other sequences have 2 PDFs. Also, it can be observed that h(z) of the test sequences is close to a gamma mixture distribution. Here, the red line represents the fitted gamma distribution of each sequence. The results show that the gamma distribution is more suitable than the GMM for optical flow based motion segmentation.

Fig. 4 shows motion segmentation results for the Foreman sequence. It contains intermediate results including the optical flow vectors of (4), the initial labeling result of (9), and the MRF result of (12). It can be observed that noisy regions are effectively removed and moving objects are well segmented by the MRF procedure. To provide a more reliable performance evaluation, the Recall, Precision, and P1 values are measured over the test sequences. The three measures are defined as follows [15]:

Recall = \frac{N_c}{N_c + N_m}, \quad Precision = \frac{N_c}{N_c + N_f}, \quad P1 = \frac{2 \times Recall \times Precision}{Recall + Precision}    (13)

where N_c represents the number of correctly segmented pixels, N_m is the number of missed pixels, and N_f is the number of falsely segmented pixels. Table 1 lists the performance evaluation results of the proposed method compared with the methods of [7–9].
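The three measures of eq. (13) are straightforward to compute from the pixel counts; the function and the example counts below are illustrative.

```python
# Sketch of eq. (13): Recall, Precision, and P1 (the harmonic mean of the
# first two, i.e. the F1 score) from segmentation pixel counts.
def segmentation_scores(n_correct, n_missed, n_false):
    recall = n_correct / (n_correct + n_missed)       # N_c / (N_c + N_m)
    precision = n_correct / (n_correct + n_false)     # N_c / (N_c + N_f)
    p1 = 2 * recall * precision / (recall + precision)
    return recall, precision, p1

r, p, p1 = segmentation_scores(90, 10, 10)   # illustrative counts
```

With 90 correct, 10 missed, and 10 false pixels, all three scores come out to 0.9, since Recall and Precision coincide.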
Fig. 4. Motion segmentation results for the Foreman sequence. (a) and (b) Two sequential frames. (c) Optical flow vectors. (d) Initial labeling result. (e) MRF result. (f) Final segmentation result.
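The MRF refinement illustrated in Fig. 4(e) follows eqs. (10)–(12). A hedged ICM-style sketch is given below; the Potts potential (V_c = 1 when labels differ, 0 otherwise), the 4-neighborhood, the sigma and alpha values, and the simplified total-energy stopping rule are all assumptions for illustration, not the authors' exact settings.

```python
# Illustrative ICM refinement of a label field, in the spirit of eqs. (10)-(12).
import numpy as np

def icm_refine(z, labels, means, sigma=0.1, alpha=1.0,
               max_iter=20, tol_frac=0.0005):
    H, W = z.shape
    labels = labels.copy()
    for t in range(max_iter):
        energy = np.zeros_like(z)
        for y in range(H):
            for x in range(W):
                best_l, best_u = labels[y, x], np.inf
                for l, m in enumerate(means):
                    u = (z[y, x] - m)**2 / (2 * sigma**2)  # data term, eq. (11)
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = y + dy, x + dx            # 4-neighborhood
                        if 0 <= ny < H and 0 <= nx < W and labels[ny, nx] != l:
                            u += alpha                     # Potts clique V_c
                    if u < best_u:
                        best_l, best_u = l, u
                labels[y, x] = best_l
                energy[y, x] = best_u
        total = energy.sum()
        if t == 0:
            first_total = total
        elif abs(total - prev_total) < tol_frac * first_total:
            break   # eq. (12)-style stop: energy change below 0.05% of start
        prev_total = total
    return labels
```

An isolated noisy label inside a homogeneous region pays the full clique penalty and is flipped to its neighbors' label, which is the "noisy region removal" effect visible between Fig. 4(d) and Fig. 4(e).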
Table 1. Performance evaluation results from test sequences using the proposed and conventional methods.^a

Methods            | Recall | Precision | P1
Luthon et al. [7]  | 0.647  | 0.902     | 0.775
Aach and Kaup [8]  | 0.236  | 0.892     | 0.564
Jung and Kim [9]   | 0.860  | 0.874     | 0.867
Our method         | 0.932  | 0.885     | 0.909

^a Recall, Precision, and P1 are measured on an Intel Core 2 Duo 3.0 GHz PC. The bold number represents the highest P1 value, which indicates the best performance.
In the table, the bold number represents the highest P1 value, which indicates the best performance. As can be observed, the proposed method provides the best evaluation results with respect to the accuracy of motion segmentation. This is because the gamma distribution is more suitable than the GMM for motion segmentation using optical flow. Consequently, the experimental results demonstrate that the proposed method achieves better performance than the other methods.

4. Conclusion

In this letter, we proposed a new motion segmentation method based on the gamma distribution. Most researchers have used the Gaussian distribution for motion segmentation. However, we found that the gamma distribution is more suitable for optical flow based motion segmentation. Experimental results show that the proposed method exhibits better motion segmentation performance than the GMM based methods. We believe that the proposed method can be effectively applied to various vision applications including video surveillance, traffic flow measurement, behavior detection, and object based video coding.

Acknowledgments

The authors would like to thank the editor and anonymous reviewers for their valuable comments. This work was supported by the National Natural Science Foundation of China (Nos. 61050110144, 61072106, 60803097, 60972148, 60971128, 60970066, 61003198, and 61001206), the National Research
Foundation for the Doctoral Program of Higher Education of China (No. 200807010003), the National Science and Technology Ministry of China (Nos. 9140A07011810DZ0107 and 9140A07021010DZ0131), the Fund for Foreign Scholars in University Research and Teaching Programs (the 111 Project) (No. B07048), and the Fundamental Research Funds for the Central Universities (Nos. JY10000902001, K50510020001, and JY10000902045).

References

[1] Kwak S, Bae G, Byun H. Moving-object segmentation using a foreground history map. J Opt Soc Am A 2010;27(2):180–7.
[2] Krutz A, Glantz A, Borgmann T, Frater M, Sikora T. Motion-based object segmentation using local background sprites. In: Proceedings of IEEE international conference on acoustics, speech, and signal processing. 2009. p. 1221–4.
[3] Spagnolo P, Orazio TD, Leo M, Distante A. Moving object segmentation by background subtraction and temporal analysis. Image Vis Comput 2006;24:411–23.
[4] Kuo M, Hsieh CH, Huang YR. Automatic extraction of moving objects for head-shoulder video sequences. J Vis Commun Image Rep 2005;16:68–92.
[5] Jung C, Kim JK. Motion segmentation using Markov random field model for accurate moving object segmentation. In: Proceedings of ACM international conference on ubiquitous information management and communication. 2008. p. 414–8.
[6] Chang MM, Tekalp AM, Sezan MI. Simultaneous motion estimation and segmentation. IEEE Trans Image Process 1997;6:1326–33.
[7] Luthon F, Caplier A, Lievin M. Spatiotemporal MRF approach to video segmentation: application to motion detection and lip segmentation. Signal Process 1999;76:61–80.
[8] Aach T, Kaup A. Bayesian algorithms for adaptive change detection in image sequences using Markov random fields. Signal Process Image Commun 1995;7:147–60.
[9] Jung C, Kim JK. Moving object segmentation using Markov random field. J Korea Inform Commun Soc 2002;27:221–30.
[10] Jung C, Jiao L, Gong M. New optical flow approach for motion segmentation based on gamma distribution. In: Proceedings of international conference on multimedia modeling. 2010. p. 444–53.
[11] Horn BKP, Schunck BG. Determining optical flow. Artif Intell 1981;17:185–203.
[12] Barkat M. Signal detection & estimation. Artech House; 1991.
[13] Ray S, Turi RH. Determination of number of clusters in K-means clustering and application in colour image segmentation. In: Proceedings of international conference on advanced pattern recognition and digital techniques. 1999. p. 137–43.
[14] Jung C, Jiao L. Novel Bayesian deringing method in image interpolation and compression using a SGLI prior. Opt Express 2010;18(7):7138–49.
[15] Jeong J. Play segmentation for the play-break based sports video using a local adaptive model. Multimed Tools Appl 2008;39:149–67.