Available online at www.sciencedirect.com
Procedia Earth and Planetary Science 2 (2011) 358 – 363
The Second International Conference on Mining Engineering and Metallurgical Technology
A MRF-based clustering algorithm for remote sensing images by using the latent Dirichlet allocation model
Hong Tang a,b,*, Li Shen b, Xin Yang c, Yinfeng Qi a,b, Weiguo Jiang a,b, Adu Gong a,b
a Key Laboratory of Environmental Change and Natural Disaster, Beijing Normal University, No. 19, Xinjiekouwai Street, Haidian District, Beijing 100875, China
b State Key Laboratory of Earth Surface Processes and Resource Ecology, Beijing Normal University, No. 19, Xinjiekouwai Street, Haidian District, Beijing 100875, China
c College of Global Change and Earth System Science, Beijing Normal University, No. 19, Xinjiekouwai Street, Haidian District, Beijing 100875, China
Abstract
MRF (Markov Random Field)-based analysis of remotely sensed imagery provides valuable spatial and structural information that is complementary to pixel-based spectral information in image clustering. In this paper, we present a novel method for semantic clustering of remote sensing images that considers two levels of spatial context information in two different ways. First, the proposed clustering approach uses a Modified Latent Dirichlet Allocation (MLDA) model to model an image collection, which is implicitly generated by partitioning a large satellite image into densely overlapped sub-images. Then, a folded Gibbs sampler is employed to estimate the model parameters. Finally, image clustering is achieved via an energy minimization technique in the framework of the MRF. Experimental results on a high-resolution satellite image show that (1) unlike traditional pixel-based clustering methods, the co-occurrence among pixels is embedded into the clustering algorithm through the LDA; consequently, two geo-objects in an image, i.e., shadow and water, can be well separated even though their gray-level histograms overlap heavily; and (2) the clustering results appear more object-oriented owing to the MRF model.
© 2011 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of Society for Resources, Environment and Engineering
Keywords: probabilistic topic model; latent Dirichlet allocation; Markov random field; Gibbs sampler; geographic object based image analysis.
* Corresponding author. Tel.: +86-10-58806401; fax: +86-10-58806401. E-mail address:
[email protected] .
1878–5220 © 2011 Published by Elsevier Ltd. doi:10.1016/j.proeps.2011.09.056
Hong Tang et al. / Procedia Earth and Planetary Science 2 (2011) 358 – 363
1. Introduction
Image clustering is a common means of extracting geographic thematic information from Remote Sensing (RS) images when no training samples are available. Statistical approaches, especially Finite Mixture (FM) models, are widely employed for clustering problems. Although these models can describe the statistical characteristics of pixel attributes well, they treat pixels as spatially independent. In other words, no spatial context information is utilized. Markov Random Field (MRF) models provide a systematic way to embed a smoothing prior over cluster labels to improve the spatial consistency among labels [1]. However, most methods use the MRF only as a general prior in an FM model-based approach to build the maximum a posteriori (MAP) estimate, so they suffer from the limitation of the FM model mentioned above [2].
In order to address this problem, we present a variant of Latent Dirichlet Allocation (LDA) [3], referred to as Modified LDA (MLDA), to cluster high-resolution RS images. Unlike other methods that apply the LDA model to image analysis [4], the MLDA model does not need to generate documents in advance, and thus it can greatly reduce time and space complexity. Furthermore, since documents are correlated in that they share the same label for a given pixel, the spatial structure among visual words is also considered. In this paper, parameter estimation for the MLDA model is achieved by a folded Gibbs sampler. The effect of the “folded” trick is to impose some smoothness on the estimated parameters. Once the model parameters are determined, the MRF model further enforces spatial coherence among the cluster labels of pixels. We show that by incorporating both the MLDA model and the MRF model into a mathematically sound MLDA-MRF framework, two levels of spatial context information are taken into account and thus a robust clustering approach can be obtained.
The remainder of this paper is organized as follows. Section 2 introduces the MRF model.
Section 3 describes the MLDA model and the folded Gibbs sampler. Experimental results and discussions are given in Section 4. Some conclusions are drawn in Section 5.

2. MRF theory
In an MRF-based image clustering method, the label random field $\vec{Z}$ is assumed to be a Markov random field, which satisfies

$$P(\vec{Z}) > 0,\ \forall \vec{Z} \in \mathcal{Z} \quad \text{and} \quad P(z_s \mid z_{\neg s}) = P(z_s \mid z_{N_s}) \qquad (1)$$

where $\mathcal{Z}$ is the state space of the label field; $z_{\neg s}$ denotes the labels on all sites except site $s$; and $z_{N_s}$ denotes the labels on the neighbors of site $s$.
In practice, under the MAP-MRF framework, pixel-labeling problems such as image clustering are typically posed as energy minimization. One seeks the labeling that minimizes the energy

$$\hat{\vec{Z}} = \arg\min_{\vec{Z}} \left( U_{data} + U_{smoothness} \right) \qquad (2)$$

where $U_{data}$ and $U_{smoothness}$ are called the data energy term and the smoothness energy term, respectively. The former measures how well a label $z_s$ fits the particular pixel (site) $s$, and the latter enforces spatial coherence.

3. The proposed model
In this paper, a variant of Latent Dirichlet Allocation (LDA) is employed to cluster RS images. In the model, each label coupled with its neighbors, i.e., a document in the LDA, is assumed to be drawn from
a multinomial distribution with a Dirichlet prior. Given a label, each pixel attribute is also drawn from a multinomial distribution with another Dirichlet prior. Unlike in the LDA, documents are correlated since they may share some of the same pixels. For simplicity, the model is called a Modified Latent Dirichlet Allocation (MLDA).

3.1. LDA
The LDA [3] is a generative hierarchical probabilistic clustering model, originally developed to model a collection of text documents. In this model, each document is represented as a finite mixture over latent topics, also called hidden aspects. Each topic in turn is characterized by a distribution over words. As shown in Fig. 1(a), the LDA assumes the following generative process for the $m$-th document $\vec{W}_m = \{w_{m1}, \ldots, w_{mN}\}$ in a corpus $D = \{\vec{W}_1, \ldots, \vec{W}_M\}$:
• Sample $\vec{\theta}_m \sim \mathrm{Dir}(\vec{\alpha})$.
• Sample $\phi \sim \mathrm{Dir}(\vec{\beta})$.
• For each of the $N$ words $\{w_{m1}, \ldots, w_{mN}\}$:
  (a) Sample a topic $z_{mn} \sim \mathrm{multinomial}(\vec{\theta}_m)$.
  (b) Sample a word $w_{mn} \sim p(w_{mn} \mid z_{mn}, \phi)$, a multinomial probability conditioned on the topic $z_{mn}$.
Both variational inference and Gibbs sampling have been used to infer and estimate the parameters of the LDA. For details of the derivation, please refer to [5].

3.2. MLDA model
The LDA and its variants have been applied to RS image annotation [6] and clustering [4]. Unlike the application of the LDA to a collection of pictures, some prior knowledge is needed to partition a large RS image into a set of sub-images, i.e., blocks with fixed size or segments. To embed the spatial correlation between pixels, a common choice is to generate a set of sub-images that overlap to some degree [4, 6]. Then, all sub-images, i.e., documents, are assumed to be independent during modeling. Consequently, the same pixel in the original RS image can be allocated different topic labels when it is grouped into different sub-images. To resolve this ambiguity, one needs a rule to combine all the allocated labels into a unique one, e.g., reweighted majority voting [4, 6].
We approach the problem in a different way. Any pixel together with its neighbors is regarded as a document, and the pixel is allocated one unique label, only once, for all documents in which it resides. In other words, documents do not need to be generated independently in advance, and they are correlated in that they share the same label for a given pixel. Given an RS image $W$ and a neighborhood system $N$ on the lattice $S$, a corpus is given by $D = \{\vec{W}_s = \{w_s, \vec{w}_{N_s}\},\ s \in S\}$, where the $s$-th document $\vec{W}_s$ consists of the $s$-th pixel (i.e., word) $w_s$ and its neighbor pixels $\vec{w}_{N_s}$. In the MLDA, the generative procedure for any $z_s$ is described as follows.
• Sample $\phi \sim \mathrm{Dir}(\vec{\beta})$.
• For each of the $M$ words $\{w_1, \ldots, w_M\}$:
  (a) Sample $\vec{\theta}_s \sim \mathrm{Dir}(\vec{\alpha})$.
  (b) Sample a topic $z_s \sim \mathrm{multinomial}(\theta \mid \vec{\theta}_s, \vec{z}_{N_s})$.
  (c) Sample a word $w_s \sim P(w_s \mid z_s, \phi)$, a multinomial probability conditioned on the topic $z_s$.
Unlike the LDA, the MLDA samples the topic $z_s$ from a multinomial distribution given both the multinomial parameter $\vec{\theta}_s$ of the document and the other topics $\vec{z}_{N_s}$ in the $s$-th document.
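For reference, the plain LDA generative process of Section 3.1 can be simulated in a few lines. The following is a minimal sketch, not the authors' implementation: it covers only standard LDA and does not model the MLDA coupling to neighboring topics. The function name `sample_lda_document`, the vocabulary size of 256 gray levels, and the prior values are assumptions made for illustration; the 7 topics and the 17×17 document size follow the experimental setting reported later in the paper.

```python
import numpy as np

def sample_lda_document(alpha, phi, n_words, rng):
    """Draw one document from the plain LDA generative process.

    alpha   : (K,) Dirichlet prior over topic proportions
    phi     : (K, V) topic-word distributions, each row sums to 1
    n_words : number of words N in the document
    """
    n_topics, vocab_size = phi.shape
    # theta_m ~ Dir(alpha): topic proportions of this document
    theta = rng.dirichlet(alpha)
    # z_mn ~ multinomial(theta_m): one topic per word position
    topics = rng.choice(n_topics, size=n_words, p=theta)
    # w_mn ~ p(w | z_mn, phi): one word per sampled topic
    words = np.array([rng.choice(vocab_size, p=phi[z]) for z in topics])
    return theta, topics, words

rng = np.random.default_rng(0)
K, V, N = 7, 256, 17 * 17          # V = 256 gray levels is an assumption
alpha = np.full(K, 0.5)            # hypothetical prior values
beta = np.full(V, 0.5)             # hypothetical prior values
phi = rng.dirichlet(beta, size=K)  # phi_k ~ Dir(beta), one row per topic
theta, topics, words = sample_lda_document(alpha, phi, N, rng)
```

In this sketch every document draws its topics independently, which is exactly the independence assumption that the MLDA relaxes.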
Fig. 1. (a) The graph model of LDA; (b) The graph model of MLDA
3.3. A folded Gibbs sampler
In order to obtain smoother parameter estimates and improve sampling efficiency, we present a folded Gibbs sampler. The sampler is unique in that it sets a fixed, odd-sized sampling block window at the center of each document. At each iteration, given the document, only the pixels in the block are sampled.

4. Experimental results and discussion
In this section, a panchromatic QUICKBIRD image, shown in Fig. 2(a), is used to evaluate the effectiveness of the proposed methodology. The image is 900×900 pixels with a resolution of 0.60 m. The clustering performance is compared with that of the K-means+MRF framework in terms of both qualitative and quantitative measurements.
Fig. 2. (a) QUICKBIRD image; (b) Ground-truth; (c) K-means+MRF result; (d) MLDA+MRF result
4.1. Parameter estimation for MLDA
In our experiments, the optimal number of clusters is 7 according to an MDL criterion [7]. The size of the documents is set to 17×17 pixels. An asymmetric Dirichlet prior $\vec{\alpha}$ and a symmetric Dirichlet prior $\vec{\beta}$ are adopted. After all the required input parameters are given, the resulting parameters, including the topic mixture proportion $\vec{\theta}_m$ and the mixture component $\phi$, are estimated via the folded Gibbs sampler.
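To illustrate why documents need not be generated in advance, the sketch below exposes every 17×17 window of an image as a document through a strided view, so no pixel is copied. It is only an illustration under stated assumptions: the helper name `implicit_documents` is hypothetical, the image is synthetic, and only windows that fit entirely inside the image are considered, since the paper does not state how borders are handled.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def implicit_documents(image, doc_size=17):
    """Return a read-only view in which entry [r, c] is the
    doc_size x doc_size window whose top-left corner is pixel (r, c).
    No pixel data is copied."""
    return sliding_window_view(image, (doc_size, doc_size))

rng = np.random.default_rng(0)
# Synthetic stand-in for the 900x900 panchromatic image (assumption).
image = rng.integers(0, 256, size=(900, 900), dtype=np.uint8)
docs = implicit_documents(image)

# Number of windows that fit entirely inside the image.
n_docs = docs.shape[0] * docs.shape[1]
# Storage needed if every document were materialized explicitly.
explicit_bytes = n_docs * 17 * 17 * image.itemsize
# The document centered on pixel (108, 208): its top-left corner is (100, 200).
center_doc = docs[100, 200]
```

Under these assumptions, storing all documents explicitly would take a few hundred times the memory of the image itself, whereas the view adds essentially nothing.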
4.2. Image clustering in the framework MLDA+MRF
We define the data energy term and the smoothness energy term as

$$U_{data} = \sum_{m=1}^{M} \sum_{n=1}^{N} p(z \mid w_{mn}, \theta_m, \phi), \qquad U_{smoothness} = \sum_{\{p,q\} \in N} u_{\{p,q\}} \left( 1 - \delta(z_p - z_q) \right) \qquad (3)$$

For the smoothness energy term, the Potts model is used, where $\delta(\cdot)$ is the Kronecker delta function and $u_{\{p,q\}}$ is the penalty against unequal labels on two-site cliques. Image clustering is achieved via a graph-cut energy minimization technique [1].
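The Potts smoothness term can be evaluated directly for a given label map. The sketch below is illustrative only: it assumes a 4-connected neighborhood and a constant penalty for all cliques, and the per-pixel cost table `unary_cost` is a hypothetical stand-in for the data term. It evaluates the energy of a labeling and does not perform the graph-cut minimization used in the paper.

```python
import numpy as np

def potts_smoothness(labels, penalty=1.0):
    """Sum of penalty * (1 - delta(z_p - z_q)) over 4-connected pixel pairs."""
    horizontal = labels[:, 1:] != labels[:, :-1]   # left-right neighbor pairs
    vertical = labels[1:, :] != labels[:-1, :]     # top-bottom neighbor pairs
    return penalty * float(horizontal.sum() + vertical.sum())

def total_energy(labels, unary_cost, penalty=1.0):
    """U_data + U_smoothness, with U_data read from a (H, W, K) cost table."""
    rows, cols = np.indices(labels.shape)
    data = float(unary_cost[rows, cols, labels].sum())
    return data + potts_smoothness(labels, penalty)

# A small 3x3 label map with three clusters.
labels = np.array([[0, 0, 1],
                   [0, 1, 1],
                   [2, 2, 1]])
unary_cost = np.zeros((3, 3, 3))   # hypothetical data costs, all zero here
smooth = potts_smoothness(labels)
energy = total_energy(labels, unary_cost)
```

A labeling with fewer label changes between neighbors yields a lower smoothness energy, which is what drives the result toward spatially coherent regions.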
4.3. Results analysis and evaluation
• Qualitative evaluation
As can be seen in Fig. 2(b)-(d), both the K-means+MRF and the MLDA+MRF clustering results appear object-oriented: the MRF model provides an effective way to model spatial information in pixel neighborhoods. Furthermore, in the K-means+MRF clustering result, most shadows are incorrectly clustered into the same class as water. However, they are correctly separated into two different clusters by MLDA+MRF. The reason for this difference is the high similarity between the gray values of shadow and water. In the K-means+MRF framework, cluster partitioning is effectively conducted through gray-level segmentation, which cannot distinguish pixels without tonal differences. In contrast, in the proposed framework a set of neighboring pixels is regarded as a document and modeled by the MLDA before the MRF smoothing. Therefore, two levels of spatial context information are taken into account through different mechanisms.
• Quantitative evaluation
We adopt a generalized overall entropy to evaluate the performance of the proposed clustering approach [8]. For a specific class $c$, it is defined as a linear combination of the class entropy $E_c$ and the cluster entropy $E_k$ (for the derivation of $E_c$ and $E_k$, please refer to [8]):

$$E_{generalized} = \beta E_c + (1 - \beta) E_k \qquad (4)$$

where $\beta \in [0,1]$ is a weight that balances the two measures and is set to 0.5 in our experiments, and the cluster $k$ is the one that covers the most pixels of class $c$ among all clusters. Generally speaking, a smaller generalized overall entropy indicates higher homogeneity. The generalized overall entropies for the different classes in the two clustering results are shown in Fig. 3.
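Equation (4) can be evaluated from a ground-truth map and a cluster map along the following lines. This is a rough sketch under a loudly stated assumption: the exact definitions of $E_c$ and $E_k$ are those of [8] and are not reproduced in this paper, so the sketch takes $E_c$ to be the Shannon entropy of how the pixels of class $c$ spread over clusters, and $E_k$ to be the Shannon entropy of how the pixels of cluster $k$ spread over classes. The function names are hypothetical.

```python
import numpy as np

def shannon_entropy(counts):
    """Entropy (in bits) of the distribution obtained by normalizing counts."""
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

def generalized_entropy(truth, clusters, c, beta=0.5):
    """beta * E_c + (1 - beta) * E_k for class c, following Eq. (4).

    ASSUMPTION: E_c is the entropy of the cluster labels inside class c and
    E_k the entropy of the class labels inside cluster k; the definitions
    in [8] may differ.
    """
    in_class = truth == c
    cluster_ids, counts = np.unique(clusters[in_class], return_counts=True)
    e_c = shannon_entropy(counts)
    # Cluster k covers the most pixels of class c among all clusters.
    k = cluster_ids[np.argmax(counts)]
    _, class_counts = np.unique(truth[clusters == k], return_counts=True)
    e_k = shannon_entropy(class_counts)
    return beta * e_c + (1.0 - beta) * e_k

# Toy example: class 0 is split evenly over clusters 5 and 9.
truth = np.array([0, 0, 0, 0, 1, 1, 1, 1])
clusters = np.array([5, 5, 9, 9, 9, 9, 9, 9])
score_class0 = generalized_entropy(truth, clusters, c=0)
score_class1 = generalized_entropy(truth, clusters, c=1)
```

With this reading, a class that maps one-to-one onto a single pure cluster scores zero, and the score grows as the class is split across clusters or as its dominant cluster absorbs pixels of other classes.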
Fig. 3. The generalized overall entropy for each class in two clustering results (vertical axis: generalized overall entropy; legend: MLDA+MRF, K-means+MRF; classes: Water, Building, Field, Road, Shadow, Tree)
As shown in Fig. 3, the generalized overall entropies for shadow, field and tree in the MLDA+MRF clustering result are all far lower than those in the K-means+MRF result. In other words, the MLDA+MRF framework distinguishes these three classes more accurately. The generalized overall entropy of water in the MLDA+MRF clustering result is close to the best value, which can also be regarded as a good result. According to the ground-truth data and the comparison of generalized overall entropies, the MLDA+MRF approach obtains a clustering result broadly similar to that of K-means+MRF. Generally speaking, the MLDA+MRF result is good overall, and it distinguishes some classes better than K-means+MRF. In particular, the separation of the “shadow” and “water” classes is achieved.

5. Conclusion
A semantic clustering method for remotely sensed imagery is developed by using an MLDA model in the framework of MRF. The MLDA model combines semantic information and spatial information by modeling the conditional probability relations among “word”, “document” and “topic” in a graph model. The MRF model further enforces spatial coherence among the cluster labels of pixels. Therefore, two levels of spatial context information are considered in two different ways. Our results show that the MLDA+MRF approach compares favorably with the K-means+MRF method, particularly in distinguishing “shadow” from “water”.
Acknowledgements
This work is supported in part by the National Basic Research Program of China (No. 2011CB707102), the National Natural Science Foundation of China (No. 40901217), the Doctoral Program Foundation of Institutions of Higher Education of China (No. 20090003120017) and the Fundamental Research Funds for the Central Universities (No. 105563GK).
References
[1] Szeliski, R., et al., A comparative study of energy minimization methods for Markov random fields with smoothness-based priors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008. 30(6): p. 1068-1080.
[2] Zhang, Y., et al., Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Transactions on Medical Imaging, 2001. 20(1): p. 45-57.
[3] Blei, D.M., A.Y. Ng and M.I. Jordan, Latent Dirichlet Allocation. Journal of Machine Learning Research, 2003. 3: p. 993-1022.
[4] Yi, W., Y. Chen and H. Tang, An object-oriented semantic clustering algorithm for high resolution remote sensing images using the aspect model. IEEE Geoscience and Remote Sensing Letters, 2011. 26(5): p. 40-44.
[5] Heinrich, G., Parameter estimation for text analysis. Technical Report, 2005.
[6] Lienou, M., H. Maitre and M. Datcu, Semantic annotation of satellite images using latent Dirichlet allocation. IEEE Geoscience and Remote Sensing Letters, 2010. 7(1): p. 28-32.
[7] Kyrgyzov, I.O., et al., Kernel MDL to Determine the Number of Clusters. MLDM '07. Berlin: Springer-Verlag. 2007.
[8] H. G. Akcay, S.A., Automatic detection of geospatial objects using multiple hierarchical segmentations. IEEE Transactions on Geoscience and Remote Sensing, 2008. 46(7): p. 2097-2111.