Pattern Recognition Letters 29 (2008) 905–917
Unsupervised texture segmentation/classification using 2-D autoregressive modeling and the stochastic expectation-maximization algorithm

Claude Cariou, Kacem Chehdi

Université de Rennes 1, Ecole Nationale Supérieure des Sciences Appliquées et de Technologie, Laboratoire de Traitement des Signaux et Images Multi-composantes et Multi-modales (TSI2M), BP 80518, 6, rue de Kerampont, 22305 Lannion Cedex, France

Received 17 October 2006; received in revised form 21 December 2007; available online 26 January 2008
Communicated by Y.J. Zhang
Abstract

The problem of textured image segmentation upon an unsupervised scheme is addressed. In the past two decades, there has been much interest in segmenting images involving complex random or structural texture patterns. However, most unsupervised segmentation techniques generally suffer from the lack of information about the correct number of texture classes. Therefore, this number is often assumed known or given a priori. On the basis of the stochastic expectation-maximization (SEM) algorithm, we try to perform a reliable segmentation without such prior information, starting from an upper bound of the number of texture classes. At a low resolution level, the image model assumes an autoregressive (AR) structure for the class-conditional random field. The SEM procedure is then applied to the set of AR features, yielding an estimate of the true number of texture classes, as well as estimates of the class-conditional AR parameters, and a coarse pre-segmentation. In a final stage, a regularization process is introduced for region formation by way of a simple pairwise interaction model, and a finer segmentation is obtained through the maximization of posterior marginals. Some experimental results obtained by applying this method to synthetic textured and remote sensing images are presented. We also provide a comparison of our approach with some previously published methods using the same textured image database.
© 2008 Elsevier B.V. All rights reserved.

Keywords: Image segmentation; Classification; Texture; Stochastic modeling; Parameter estimation; Remote sensing
1. Introduction

Image segmentation is one of the most important tasks in computer vision, and many fields of application are concerned with it, including robotics, remote sensing and medical imaging. The prime objective of segmentation is to produce a partition of an image into homogeneous regions that one hopes correspond to objects or parts of objects in the real scene under study. The quality of the segmentation stage is essential for further high-level processing such as classification, pattern recognition and representation,
and coding; therefore, it has a prominent effect on the global quality of the whole image interpretation system. Nevertheless, it is well known that image segmentation, as a particular inverse problem, is ill-posed in the sense of Hadamard (Hadamard, 1923; Poggio et al., 1985), especially when the complexity of the scene under study increases.

In this paper, we focus our attention on the segmentation/classification of textured images. Many approaches to textured image segmentation, including feature-based, model-based, and structural methods, are available in the literature (Reed and du Buf, 1993; Petrou and Sevilla, 2006), and research in this field is still very active (Arivazhagan and Ganesan, 2003; Li and Peng, 2004; Alata and Ramananjarasoa, 2005; Reyes-Aldasoro and Bhalerao,
2006; Montiel et al., 2005; Muneeswaran et al., 2005; Muneeswaran et al., 2006; Xia et al., 2006; Allili and Ziou, 2007; Xia et al., 2007). Herein, we restrict the notion of texture to correlated random fields, for which a stochastic model-based approach is frequently used. Our objective is therefore to separate an image into disjoint regions, within which the presence of statistically homogeneous random fields is assumed. These regions may possibly have the same average gray level and variance, but not the same generalized covariances.

Formally, the problem of stochastic model-based image segmentation is the following: given a realization $y$ of a random field (RF) $Y$ defined on a 2-D lattice $S \subset \mathbb{Z}^2$, called the observation process and governed by an unknown set of parameters $\theta$, find the best (with respect to some criterion) realization $x$ of a RF $X$, called the region process, that reflects the membership of each site (i.e. pixel location) to one of several possible texture classes. This doubly stochastic nature of an image has found a proper description framework through the use of, for example, Markov random field (MRF) modeling (Bouman and Liu, 1991; Krishnamachari and Chellappa, 1997; Melas and Wilson, 2002; Li and Peng, 2004).

During the past two decades, there has been much interest in image segmentation using MRFs, leading either to supervised or unsupervised techniques. However, several methods (Besag, 1986; Lakshmanan and Derin, 1989; Comer and Delp, 1994) have assumed region-conditional random fields made of independent and identically distributed (i.i.d.) random variables, which is obviously insufficient to account for the correlation between adjacent pixels that occurs in most practical situations. Since the beginning of the 1990s, much attention has been paid to the segmentation of correlated (colored) random fields, and many different approaches have been proposed. Among these are approaches involving a modeling of the texture process by a wide-sense MRF (Zhang et al., 1994; Krishnamachari and Chellappa, 1997) or a subclass of it, say a causal autoregressive RF (Bouman and Liu, 1991; Chehdi et al., 1994; Comer and Delp, 1999; Li and Peng, 2004; Alata and Ramananjarasoa, 2005). However, one major drawback of many of these techniques is that they are not totally unsupervised, in the sense that they require the knowledge of the true number of texture classes. This general problem is referred to as the cluster validation problem in (Zhang and Modestino, 1990). In (Kervrann and Heitz, 1995), a dynamic evaluation of the number of regions or textures using MRF modeling is presented. However, the length of the texture feature description vectors and the relative complexity of the potential functionals seem to prevent its use on large-sized images. These techniques belong to the class of Markov chain Monte Carlo (MCMC) methods, which intend to minimize a heavily nonlinear misclassification cost functional, typically approaching the maximum a posteriori (MAP) or the maximizer of posterior marginals (MPM) solution by means of Gibbs sampling and/or simulated annealing.
Reversible jump samplers (Richardson and Green, 1997) were also introduced for statistical problems such as the analysis of mixtures with an unknown number of components. This approach offers the possibility of dealing with weak prior information. While quite general, the hierarchical mixture model developed there involves a set of three hyperparameters controlling the multiple priors (the number of components, the prior probabilities and the parameters of the conditional distributions), which requires much attention as to the strategy to adopt if all hyperparameters are allowed to vary simultaneously. Also, an MCMC method has recently been introduced in (Alata and Ramananjarasoa, 2005) which is able to estimate, together with the conditional texture model orders and parameters, the number of texture classes, with the use of a novel information criterion.

A stochastic model-based technique is presented herein for the segmentation of textured images. Our method offers the advantage of estimating the number of texture classes, which is otherwise the prior information required before any further processing. Although the modeling of the texture is quite similar and the classification procedure also belongs to the MCMC class, this technique differs from the ones presented in (Alata and Ramananjarasoa, 2005; Wilson and Zerubia, 2001) in what concerns the estimation of the number of texture classes.

The organization of the paper is the following. In Section 2, we give an overview of the proposed method. In Section 3 we describe the 2-D autoregressive modeling of the texture. In Section 4 we present the SEM algorithm and its application to the set of autoregressive features. Section 5 describes the final pixel-based segmentation procedure. In Section 6 we present some experimental results obtained on synthetic and real images, and compare our results with a few other approaches using the same reference image database. We conclude and present possible extensions of this work in Section 7.

2. Overview of the proposed method

In the proposed method, an image is processed at two levels of resolution, following the approach first developed by Manjunath and Chellappa (1991), Cohen and Fan (1992) and more recently by Alata and Ramananjarasoa (2005). At a first resolution level, the image is divided into regularly spaced small-sized windows from which can be computed (i) an estimate of the number of texture classes, (ii) features that reveal the textural information and (iii) a coarse block-like segmentation. At a second resolution level, a pixel-based processing scheme is performed so as to refine the segmentation iteratively, using the information obtained at the end of the previous stage.

The proposed texture segmentation method uses three stages. First, a 2-D causal non-symmetric half-plane autoregressive (NSHP-AR) modeling of the textured image is performed within possibly overlapping windows covering the entire image. The second stage uses the basic version of
the stochastic expectation-maximization (SEM) algorithm (Celeux and Diebolt, 1987), the purpose of which is firstly (classically) to estimate the parameters of identifiable mixed distributions, i.e. the prior probabilities and the parameters of the conditional distributions of the texture classes, and secondly (less classically) to estimate the correct number of classes. This estimation scheme is performed on the basis of the previously computed AR features, using the fact that their distribution is approximately multivariate Gaussian. This stage ends with the computation of a coarse, block-like image pre-segmentation. The third stage starts from both the original image and the previous analysis, with the aim of refining the segmentation. Here we assume a hierarchical modeling of the image, i.e. a low-level region process and a high-level region-conditional texture process. The latter will be modeled as a 2-D causal autoregressive process with initial parameters derived in the second stage, while for the region process, which is to be found, we will assume a simple Markov–Gibbs structure governed by a single parameter. Using this hierarchical model in a Bayesian framework, we then try to obtain a reliable segmentation by means of Gibbs sampling.

3. Computation of autoregressive features

The objective of this stage is to produce a set of features which can reliably describe the textured nature of the image at a local scale. Various ways exist for texture feature extraction (Reed and du Buf, 1993; Ojala et al., 1996; Petrou and Sevilla, 2006), including non-parametric (e.g. Fourier transform, wavelet transform (Arivazhagan and Ganesan, 2003)), semi-parametric (e.g. co-occurrence matrices (Haralick et al., 1973), Gabor wavelets (Fogel and Sagi, 1989; Hofmann et al., 1996)), and parametric (Gauss–Markov or autoregressive random field modeling) techniques (Zhang et al., 1994; Krishnamachari and Chellappa, 1997; Li and Peng, 2004; Alata and Ramananjarasoa, 2005; Deng and Clausi, 2005). The latter provide an interesting option due to the reduced number of parameters used to describe the texture, and also for their potential for texture synthesis. In this work we use a two-dimensional causal non-symmetric half-plane autoregressive (2-D NSHP-AR) modeling of the image data (see for example Kay, 1988, Chapter 15), mainly because of the linearity of the maximum likelihood (ML) parameter estimation. Thus the stochastic model used to represent an image within a homogeneous textured region $S$ is, $\forall s \in S$:
$$Y_s = \sum_{r \in \mathcal{R}} h_r\,(Y_{s-r} - \mu) + E_s + \mu, \qquad (1)$$
where $\{h_r, r \in \mathcal{R}\}$ are the autoregressive coefficients, $\mathcal{R}$ is the 2-D NSHP support of linear prediction (see Fig. 1), $\mu$ is the mean of the textured region, $E = \{E_s, s \in S\}$ is a random field called the prediction error, made of i.i.d. random variables with zero mean and variance $\sigma^2$, and $Y = \{Y_s, s \in S\}$ is the resulting 2-D autoregressive process.
Fig. 1. The 2-D non-symmetric half-plane AR model for texture analysis. The prediction support R of Eq. (1) is shown as the light-shaded region.
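To make the model of Eq. (1) concrete, the following minimal sketch synthesizes a texture patch by scanning the lattice in raster order with the four-neighbor NSHP support of Fig. 1. The coefficient values, patch size and function name are illustrative choices of ours, not taken from the paper.

```python
import numpy as np

def synthesize_nshp_ar(shape, coeffs, mu=0.0, sigma=1.0, seed=0):
    """Draw a texture from the causal NSHP-AR model of Eq. (1).

    coeffs maps each support offset r = (dy, dx) to its coefficient h_r;
    the site s - r is the pixel at (row - dy, col - dx).
    """
    rng = np.random.default_rng(seed)
    h, w = shape
    pad = max(max(abs(dy), abs(dx)) for dy, dx in coeffs)
    y = np.full((h + pad, w + 2 * pad), mu)          # boundary pixels fixed to the mean
    for i in range(pad, h + pad):
        for j in range(pad, w + pad):
            pred = sum(c * (y[i - dy, j - dx] - mu) for (dy, dx), c in coeffs.items())
            y[i, j] = mu + pred + rng.normal(0.0, sigma)
    return y[pad:, pad:w + pad]

# Illustrative four-neighbor support as in Fig. 1 (coefficient values are arbitrary).
coeffs = {(0, 1): 0.55, (1, 1): -0.20, (1, 0): 0.55, (1, -1): -0.20}
texture = synthesize_nshp_ar((64, 64), coeffs, mu=128.0, sigma=5.0)
```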
Finding ML estimates of the AR parameters from an observation $\{Y_s, s \in S' \subset S\}$ of the 2-D RF $Y$ within a finite window is straightforward and, in its simplest implementation, requires inverting a sample covariance matrix. The reader is referred to Kay (1988) for a detailed analysis of various algorithms for finding optimal 2-D autoregressive parameters. The first stage of our method consists of computing the autoregressive parameters from regularly spaced (and possibly overlapping) windows $S' = \{s = (m_1, m_2),\ 0 \le m_1 \le M_1 - 1,\ 0 \le m_2 \le M_2 - 1\} \subset S$ taken from the original image. Letting $v_s = y_s - \hat\mu,\ s \in S'$, be the observation process after computation and removal of the sample mean $\hat\mu$, the autoregressive parameters (in the case of a four-neighbor NSHP prediction support as in Fig. 1) are computed by
$$[\hat h_{(0,1)}\ \hat h_{(1,1)}\ \hat h_{(1,0)}\ \hat h_{(1,-1)}]^T = (BB^T)^{-1} B A, \qquad (2)$$
where
$$A = [v_{s_1}\ \cdots\ v_{s_M}]^T, \qquad (3)$$
$$B = \begin{bmatrix} v_{s_1-(0,1)} & \cdots & v_{s_M-(0,1)} \\ v_{s_1-(1,1)} & \cdots & v_{s_M-(1,1)} \\ v_{s_1-(1,0)} & \cdots & v_{s_M-(1,0)} \\ v_{s_1-(1,-1)} & \cdots & v_{s_M-(1,-1)} \end{bmatrix}, \qquad (4)$$
and $M = M_1 M_2$. The sample variance of the prediction error is given by
$$\hat\sigma^2 = \frac{1}{M} \sum_{s \in S'} \Big( v_s - \sum_{r \in \mathcal{R}} \hat h_r\, v_{s-r} \Big)^2. \qquad (5)$$
At the end of this stage, a set of feature vectors $\{\theta(j) = [\{h_r(j), r \in \mathcal{R}\},\ \mu(j),\ \sigma^2(j)]^T,\ 1 \le j \le W\}$ is therefore available, where $W$ is the number of windows.
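As an illustration of Eqs. (2)-(5), the sketch below computes the four NSHP-AR coefficients, the sample mean and the prediction error variance for one analysis window by solving the normal equations in the least-squares sense. Function and variable names are ours, and sites whose full prediction support falls outside the window are simply skipped, which is one possible boundary handling among others.

```python
import numpy as np

SUPPORT = [(0, 1), (1, 1), (1, 0), (1, -1)]     # offsets r = (dy, dx) of Fig. 1

def ar_features(window):
    """Return the feature vector [h_r ..., mu, sigma^2] of Eqs. (2)-(5) for one window."""
    w = np.asarray(window, dtype=float)
    mu = w.mean()
    v = w - mu                                   # centered observations v_s
    m1, m2 = v.shape
    obs, shifted = [], []
    for i in range(1, m1):                       # keep only sites with a complete support
        for j in range(1, m2 - 1):
            obs.append(v[i, j])                                      # entries of A
            shifted.append([v[i - dy, j - dx] for dy, dx in SUPPORT]) # columns of B
    A = np.array(obs)                            # vector of centered observations
    B = np.array(shifted).T                      # 4 x M matrix of shifted observations
    h = np.linalg.solve(B @ B.T, B @ A)          # (B B^T)^{-1} B A, Eq. (2)
    sigma2 = np.mean((A - B.T @ h) ** 2)         # prediction error variance, Eq. (5)
    return np.concatenate([h, [mu, sigma2]])

# One 8 x 8 analysis window (random data here, purely for illustration).
features = ar_features(np.random.default_rng(1).normal(size=(8, 8)))
```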
An important issue of this modeling of texture lies in the fact that, under the hypothesis of a Gaussian random field $Y$, the AR parameter vector estimate $\theta(j)$ asymptotically follows a multivariate Gaussian distribution (i.e. as the size of the window used for estimation goes to infinity) (Box and Jenkins, 1970). Of course, the Gaussian hypothesis is very strong in our context because, to operate a local segmentation, one needs to compute the AR features within small-sized windows for which asymptotic properties do not hold. However, the Gaussianity of the parameter vector estimates may be assumed to some extent in most practical situations, as will be seen in Section 6. This is particularly true when the texture under study does not contain strongly marked periodicities, as pointed out in (Mendel, 1995).

4. Application of the SEM algorithm

The stochastic expectation-maximization (SEM) algorithm (Celeux and Diebolt, 1987), just like the well-known EM algorithm introduced in (Dempster et al., 1977), is aimed at providing the global maximum of the likelihood function of a parametric model in an iterative way. More precisely, considering the problem of maximum likelihood (ML) estimation of a parameter $\phi$ given the observed data $z$, the ML estimate $\hat\phi_{ML}$ is defined as
$$\hat\phi_{ML} = \arg\max_\phi\; p(z; \phi). \qquad (6)$$
The EM and SEM algorithms are interesting techniques when applied to problems of mixture identification or missing data. In mixture identification, we wish to find the ML estimate of $\phi$ in the form of the parameters of identifiable distributions, i.e. the prior probabilities and the parameters of each conditional distribution. Once the number of components $K$ is given, the EM algorithm generates a sequence $(\phi_l)$ of parameter estimates. Each iteration proceeds in two steps: (i) E-step (Expectation): estimate the posterior distribution, conditionally on $Z_i = z_i$, that $z_i$ is issued from any of the $K$ components, given the current estimate $\phi_l$; (ii) M-step (Maximization): solve the likelihood equations, given the current posterior distribution, to obtain a new estimate $\phi_{l+1}$. One important property of the EM algorithm lies in the monotonic increase of the likelihood function at each iteration until a stable solution is found. Also, under some regularity conditions, $(\phi_l)$ has been proved to converge to a stationary point of the likelihood function. However, this stationary point can be either a local maximum or a saddle point. Moreover, its convergence rate appears to be relatively slow in some situations, and the choice of the initial estimate is crucial. The main idea of the SEM algorithm is to provide at each iteration a partitioning of the data by means of a random sampling based upon the previously computed posterior distribution. More precisely, a stochastic (S) step is introduced between the E-step and the M-step, which consists in simulating a pseudosample from the current posterior distribution; the M-step then updates $\phi_l$ using this pseudosample. In comparison with the EM algorithm, the SEM
algorithm is found to provide non-negligible advantages (Celeux and Diebolt, 1992): its convergence rate is faster, the stochastic step of the SEM prevents the procedure from being confined to a stationary point of the likelihood function, and the number of clusters need not be known a priori, which is of great use in the framework of image segmentation. This latter point is a consequence of the generation of intermediate pseudosamples: if a partitioning of the sample yields a class with a low population, then it is likely that this class should disappear. In spite of the intractability of theoretical convergence results in the general case of $K > 2$ identifiable mixed distributions (Diebolt and Celeux, 1993), the SEM algorithm has been shown to yield interesting results when applied, for example, to the unsupervised blind and contextual segmentation of satellite images (Masson and Pieczynski, 1993), to mixtures (Celeux et al., 1996) or to the restoration of sparse spike trains (Champagnat et al., 1996). However, in (Masson and Pieczynski, 1993), the textural information is taken into account via the observation of n-tuplets of neighboring pixels and their associated joint probability density functions, which is very costly from a computational point of view.

In our work, the data entries $z$ of the SEM procedure are neither pixel values nor neighboring pixel values, but the AR parameter vectors $\{\theta(j),\ 1 \le j \le W\}$ obtained previously over small-sized windows. Note that this approach is similar to the one developed in (Manjunath and Chellappa, 1991) and then (Cohen and Fan, 1992), except that the clustering algorithm presented by those authors needs either the true number of clusters or some prior knowledge about it. To apply the SEM algorithm for mixture identification, all we need to provide is an upper bound on the actual number of clusters. Convergence to the true number of clusters has been conjectured and observed in many experiments (see for example Celeux and Diebolt, 1992), provided that the identifiability of the mixture is ensured. In the following, the unknown cluster label of vector $\theta(j)$ will be referred to as $C(j)$. When applied to our set of data, the SEM algorithm is the following:

Initialization:
  – Define $L$, the number of iterations; set $l = 0$;
  – Define an upper bound $K$ for the number of clusters;
  – Give the same initial posterior probability $p^{(l)}(C(j) = k \mid \theta(j)) = 1/K$ for each vector $\theta(j)$ to belong to each cluster $k$;
  – Set $\hat K = K$.
Iterations:
(S-step):
  – Draw a new cluster label for each vector $\theta(j)$ according to the posterior distribution $p^{(l)}(C(j) \mid \theta(j))$;
  – Estimate the prior distribution of each cluster $p^{(l)}(C)$ by empirical frequencies;
  – If $p^{(l)}(C = k)$ is below a threshold $\delta$ for some $k$ (see comments below):
    * Suppress cluster $k$;
    * Randomly distribute the corresponding vectors into the other clusters;
    * Let $\hat K \leftarrow \hat K - 1$.
(M-step):
  – Compute the sample mean vector $\mu_k^{(l)}$ and covariance matrix $\Sigma_k^{(l)}$ of each cluster from the partition obtained in the S-step;
(E-step):
  – For each vector $\theta(j)$, estimate the posterior probabilities as
$$p^{(l+1)}(C(j) = k \mid \theta(j)) = \frac{p^{(l)}(C = k)\, f_k^{(l)}(\theta(j))}{\sum_{i=1}^{\hat K} p^{(l)}(C = i)\, f_i^{(l)}(\theta(j))}, \qquad (7)$$
where $f_k^{(l)}(\cdot)$ denotes the distribution conditioned upon model $k$, i.e. in our case, assuming a multivariate Gaussian distribution with parameters $(\mu_k^{(l)}, \Sigma_k^{(l)})$:
$$f_k^{(l)}(\theta(j)) = \big|2\pi\Sigma_k^{(l)}\big|^{-1/2} \exp\!\Big(-\tfrac{1}{2}\,\big(\theta(j)-\mu_k^{(l)}\big)^T \big(\Sigma_k^{(l)}\big)^{-1} \big(\theta(j)-\mu_k^{(l)}\big)\Big). \qquad (8)$$
Let $l \leftarrow l + 1$ and return to the S-step until $l = L - 1$.

Note that, jointly, at each iteration $l$ of the SEM procedure, it is possible to produce a pre-segmentation map in the form of a 'block-like' labelled image. This pre-segmentation is obtained through the maximization of the posterior marginal distribution, i.e. one chooses for each $\theta(j)$ the label $\hat k$ which verifies
$$\hat k = \arg\max_k\; p^{(l)}(C(j) = k \mid \theta(j)). \qquad (9)$$

The SEM procedure is iterated until convergence of the parameter estimates, which also characterizes a stable partitioning of the texture feature data into clusters. An important feature of this algorithm is its ability to remove under-represented clusters, i.e. partitions whose population falls below a defined threshold $\delta$. In practice, this threshold is chosen as some percentage of the number of vectors to process, though the theoretical minimum threshold is $N + 1$, with $N$ the dimension of the feature space. At the end of this stage, much information is available about the textured image: an estimate of the number of classes, an estimate of the texture class prior distribution, estimates of the texture parameters, and a coarse pre-segmentation map.
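The sketch below summarizes one possible implementation of the SEM loop just described, applied to the $W$ autoregressive feature vectors under a Gaussian mixture model. It follows the S/M/E steps and the cluster-removal rule of the listing, but the covariance regularization, default thresholds and all names are illustrative choices of ours rather than the authors' code.

```python
import numpy as np
from scipy.stats import multivariate_normal

def sem(theta, K, L=200, delta=0.02, seed=0):
    """SEM mixture identification on the W x N array of AR feature vectors theta.

    Returns the estimated number of clusters, the final posteriors, the
    per-cluster Gaussian parameters and the MPM pre-segmentation labels (Eq. (9)).
    """
    rng = np.random.default_rng(seed)
    W, N = theta.shape
    labels = np.arange(K)
    post = np.full((W, K), 1.0 / K)                       # uniform initial posteriors
    for _ in range(L):
        # S-step: draw a label for each feature vector from its current posterior.
        c = np.array([rng.choice(labels, p=p / p.sum()) for p in post])
        prior = np.array([(c == k).mean() for k in labels])
        # Drop under-represented clusters and reassign their vectors at random.
        if (prior < delta).any():
            kept = labels[prior >= delta]
            for idx in np.where(~np.isin(c, kept))[0]:
                c[idx] = rng.choice(kept)
            labels = kept
            prior = np.array([(c == k).mean() for k in labels])
        # M-step: sample mean and covariance of each remaining cluster.
        params = [(theta[c == k].mean(axis=0),
                   np.cov(theta[c == k], rowvar=False) + 1e-6 * np.eye(N))
                  for k in labels]
        # E-step: Gaussian class-conditional densities and posteriors, Eqs. (7)-(8).
        dens = np.column_stack([prior[i] * multivariate_normal.pdf(theta, mean=m, cov=S)
                                for i, (m, S) in enumerate(params)])
        post = dens / dens.sum(axis=1, keepdims=True)
    return len(labels), post, params, labels[np.argmax(post, axis=1)]
```

With the 1024 six-dimensional feature vectors used later in Section 6, a call such as sem(theta, K=12) would return the retained clusters and the block-like pre-segmentation labels.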
5. Final segmentation

In order to refine the segmentation from the available information, many approaches may be used, such as the maximum a posteriori (MAP) or an approximation to it (Geman and Geman, 1984), the maximizer of posterior marginals (MPM) (Marroquin et al., 1987), the iterated conditional modes (ICM) (Besag, 1986), or contextual methods (Masson and Pieczynski, 1993). To ensure a regularization of the segmentation result, we assume a simple Markovian structure for the underlying region process in the form of a second-order pairwise interaction model, first proposed by Besag (1986) and used for instance in (Krishnamachari and Chellappa, 1997). Let $X$ be the region process (texture labelling) defined on $S$ and $X_s$ its restriction to site $s \in S$. Our model assumes a local Gibbs distribution of the following form:
$$p(X_s = k \mid X_t, t \neq s) = p(X_s = k \mid X_t, t \in \mathcal{V}_s) = \frac{\exp(\beta\, u_{s,k})}{\sum_{i=1}^{\hat K} \exp(\beta\, u_{s,i})}, \qquad (10)$$
where $\mathcal{V}_s$ is called the neighborhood of site $s$, i.e. a subset of $S$ such that $\forall s \in S,\ (s \notin \mathcal{V}_s)$ and $(s \in \mathcal{V}_t \iff t \in \mathcal{V}_s)$; $u_{s,k}$ is the number of neighbors of site $s$ having label $k$; and $\beta$ is a parameter which governs the bonding of pixels with similar labels within the neighborhood $\mathcal{V}_s$: a high (positive) value of $\beta$ increases the probability of the most frequently occurring label in $\mathcal{V}_s$. One can notice that this model does not account for the zeroth-order clique, so that each label is given equal importance unconditionally on the considered site.

The chosen classification/segmentation strategy is based on the MPM approach. The MPM estimate of the class label field can be shown to be the Bayes estimator for the mean error rate, and as such is of particular relevance in decision theory for classification purposes (Winkler, 1995). In order to approach the MPM solution to the segmentation problem, which requires the maximization of $p(X_s \mid Y, \theta)$ with respect to $X_s$ for every $s$, we have used a Gibbs sampler with a basic annealing schedule. More precisely, we assume that the random field $Y$ is a Gaussian autoregressive model conditioned upon the region process $X$, and that $Y_s$ depends on $X_s$ and $\{Y_{s-r}, r \in \mathcal{R}\}$. Then, using Bayes' rule, the marginal posterior distribution is given by
$$p(X_s \mid \{X_t, t \in \mathcal{V}_s\}, Y, \theta) = \frac{f(Y_s \mid \{Y_{s-r}, r \in \mathcal{R}\}, X_s, \theta)\; p(X_s \mid X_t, t \in \mathcal{V}_s)}{f(Y_s \mid \{Y_{s-r}, r \in \mathcal{R}\}, \theta)}. \qquad (11)$$
Since the denominator in the RHS of Eq. (11) does not depend on $X$, it is not considered in the maximization. Moreover, it is known that for Markov random fields (which are a generalization of causal autoregressive random fields), we have $f(Y_s \mid \{Y_{s-r}, r \in \mathcal{R}\}, \theta) = f(E_s \mid \{Y_{s-r}, r \in \mathcal{R}\}, \theta)$, where the random variable $E_s$ corresponds to the prediction error given the neighboring pixels and a given set of autoregressive coefficients, and is assumed to follow a Gaussian distribution. Therefore, the marginal posterior probabilities are estimated by, $\forall k \in \{1, \ldots, \hat K\}$:
$$p(X_s = k \mid \{X_t, t \in \mathcal{V}_s\}, Y, \theta) = \frac{1}{z\sqrt{2\pi\sigma_k^2}}\, \exp\!\left(-\frac{e_{s,k}^2}{2\sigma_k^2} + \beta\, u_{s,k}\right), \qquad (12)$$
where $e_{s,k} = (y_s - \mu_k) - \sum_{r \in \mathcal{R}} h_{r,k}\,(y_{s-r} - \mu_k)$ is the difference between the observation at site $s$ and its linear prediction by the $k$th autoregressive model with parameter $\theta_k = [\{h_{r,k}, r \in \mathcal{R}\},\ \mu_k,\ \sigma_k^2]^T$, and $z$ is a normalizing constant. Note that Eq. (12) is similar to Eq. (27) in (Krishnamachari and Chellappa, 1997), except that the region-conditional texture process was chosen as a Gauss–Markov random field by those authors. Thanks to the knowledge of the local labelling process conditioned upon the estimate of the class-conditional parameters, we can derive the following algorithm, a sketch of which is given after the listing:

Step 0: Initialization: start from (i) the original image, (ii) the coarse pre-segmentation into $\hat K$ classes obtained previously by the SEM procedure and (iii) a given $(\beta, \gamma)$.
Step 1: Estimation of AR parameters: compute the AR parameter estimates from the current segmentation for each texture class $k$, $1 \le k \le \hat K$.
Step 2: Iterations: repeat until convergence of the number of label changes: for $s \in S$ and $k = 1, \ldots, \hat K$, estimate the marginal conditional distribution using (12), then draw a new label for site $s$ from (12).
Step 3: Estimate the prior distribution of each label $p(k)$, $k = 1, \ldots, \hat K$, by empirical frequencies. If $p(k)$ is below a threshold $\delta'$ for some $k$:
  – Suppress class label $k$;
  – Randomly distribute a valid label to the corresponding sites among the remaining labels;
  – Let $\hat K \leftarrow \hat K - 1$.
Step 4: Let $\beta \leftarrow \beta + \gamma$.
Step 5: Repeat from Step 1 until convergence of the segmentation, i.e. no significant site label changes.
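The following compact sketch illustrates the inner loop of this procedure: each site label is drawn from the posterior of Eq. (12) with an eight-neighbor system and a linearly increasing $\beta$. The AR re-estimation of Step 1 and the class removal of Step 3 are omitted for brevity, and all function and variable names are ours.

```python
import numpy as np

SUPPORT = [(0, 1), (1, 1), (1, 0), (1, -1)]                  # NSHP offsets of Fig. 1
NEIGHBORS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]

def prediction_error(y, i, j, h, mu):
    """e_{s,k} of Eq. (12): observation minus its AR prediction under one class model."""
    return (y[i, j] - mu) - sum(h[r] * (y[i - r[0], j - r[1]] - mu) for r in SUPPORT)

def refine(y, labels, params, beta=0.3, gamma=0.05, n_sweeps=100, seed=0):
    """Gibbs sampling of the label field X from the posterior of Eq. (12).

    params[k] = (h_k, mu_k, sigma2_k), with h_k a dict over SUPPORT, is assumed
    to come from the SEM stage; only interior sites are visited in this sketch.
    """
    rng = np.random.default_rng(seed)
    H, W = labels.shape
    classes = sorted(params)
    for _ in range(n_sweeps):
        for i in range(1, H - 1):
            for j in range(1, W - 1):
                log_p = np.array([
                    -0.5 * np.log(2 * np.pi * params[k][2])
                    - prediction_error(y, i, j, params[k][0], params[k][1]) ** 2
                      / (2 * params[k][2])
                    + beta * sum(labels[i + dy, j + dx] == k for dy, dx in NEIGHBORS)
                    for k in classes])
                p = np.exp(log_p - log_p.max())              # unnormalized posterior, Eq. (12)
                labels[i, j] = classes[rng.choice(len(classes), p=p / p.sum())]
        beta += gamma                                        # Step 4: raise beta (cooling)
    return labels
```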
This algorithm is similar in spirit to the MPM algorithm, due to the Gibbs sampling used to estimate the region process. However, our method differs from the MPM algorithm on four points. Firstly, our strategy does not include a search for the most frequently occurring labels, and is based only upon the current estimate. This considerably reduces the computational load as well as the memory storage, which are the main limitations of the MPM technique. Secondly, the outer loop of the algorithm allows an updating of the autoregressive parameter estimates computed from hopefully finer and finer segmentation results, so that these estimates should finally converge to their true values. Such an approach, called EM/MPM, has been proposed in (Comer and Delp, 1994) and further studied in (Comer and Delp, 2000) from the convergence viewpoint. However, the algorithm presented herein differs from EM/MPM, since the updating of the autoregressive parameters does not require estimates of the posterior marginals but is instead based upon the current realization of the random field $X$. The third difference concerns the use of a cooling scheme, induced by increasing $\beta$ during the processing, which is not a feature of the original MPM procedure. Actually, starting with a low $\beta$ (i.e. a low regularization constraint) mainly compensates for possible window misclassifications inherited from the SEM algorithm, while using rather correct estimates of the conditional model parameters. Also, the scheme for increasing the $\beta$ parameter is equivalent to a decrease in temperature as encountered in simulated annealing methods (Geman and Geman, 1984). However, optimizing this scheme is not considered as an issue in the present work; in practice, $\beta$ is chosen to increase linearly in the range [0.3, 2.5], depending on the size of the neighborhood system used for the label process $X$. The last difference is the fact that a class with very few individuals is allowed to disappear, in the same way as in the first stage of our approach. From a general, as well as practical, point of view, one can easily see that there is much resemblance between the algorithms proposed in the first and in the second stage of our approach. Indeed, both rely on the sampling of marginal posterior distributions, either in the S-step of the SEM procedure or in the Gibbs sampling of the final segmentation procedure.

6. Experimental results

In this section, we present some experimental results obtained with the proposed method. At first, we justify experimentally the approximation of autoregressive parameter distributions by multivariate Gaussian distributions. Then we present some segmentation/classification results obtained by processing synthetic texture mosaics taken from the Brodatz album (Brodatz, 1956; Xia et al., 2007) and also one aerial remote sensing image.

6.1. Distribution of autoregressive parameters

We first tried to verify the Gaussian hypothesis on the AR parameter estimates obtained from the analysis of a given texture. For this, a $\chi^2$ test statistic was computed in order to validate or reject the Gaussian hypothesis. More precisely, five 256 × 256 pixels texture images were analyzed (namely D84 (raffia), D9 (grass), D24 (calf leather), D68 (wood grain) and D92 (pig skin)). Each image was divided into 1024 non-overlapping 8 × 8 pixels windows. Within each window, an NSHP-AR model with four coefficients was used. Adding to those coefficients the estimates of the corresponding image mean and prediction error variance gave a set of 1024 6-dimensional vectors corresponding to the
analyzed image. Table 1 gives the probabilities of accepting the Gaussian hypothesis for each parameter independently, and for each of the five textures. This $\chi^2$ test was performed with eight degrees of freedom in all cases. One should point out that the Gaussian hypothesis cannot be rejected in most cases for the set of $\{\{h_r, r \in \mathcal{R}\}, \mu\}$ parameters, even for medium values of the $\chi^2$ statistic. However, one can doubt the Gaussianity of the prediction error variance $\sigma^2$, which appears improbable in four cases out of five. This is explained by the fact that the variance is estimated from a small data sample for which asymptotic properties do not hold. Fig. 2 shows the histograms of the six features obtained for two different textures, namely D68 (wood grain) and D24 (calf leather). For the first texture it is clearly seen that the Gaussian hypothesis cannot fit the data, while for the second one it cannot be rejected, even for the sample variance estimate.

6.2. Segmentation/classification results

6.2.1. Application to synthetic images

In order to assess the effectiveness of the proposed approach, we first applied the method to a synthetic patchwork of the above textures. This 256 × 256 pixels image is shown in Fig. 3a. During the first stage, texture feature extraction within non-overlapping 8 × 8 pixels windows was performed using a 2-D NSHP-AR model with four coefficients. In the second stage we ran the SEM algorithm with data consisting of the 1024 6-D vectors (four AR spatial parameters plus the mean and the variance), starting with an upper bound of K = 12 texture classes. After L = 1000 iterations of the SEM procedure, a stable partitioning of the data was obtained and $\hat K = 6$ texture classes were found, instead of the true K = 5. The pre-segmentation map in Fig. 3b was then obtained by choosing the label according to Eq. (9). It should be noted that the overestimation of K is mainly due to an inadequate modeling of the texture within windows that simultaneously overlap two different textures. This problem was already pointed out in (Cohen and Fan, 1992). However, over 93% correct classification is reached at the end of this stage. In the final stage, we start from the previous result in order to refine the segmentation through the use of a second-order (eight neighbors) pairwise interaction region process governed by the single parameter $\beta$.
Table 1
Probability of not rejecting the Gaussian hypothesis for AR parameter, mean, and variance estimates for 1024 feature vectors computed from several textures

Texture          h(0,1)    h(1,1)    h(1,0)    h(1,-1)   mu        sigma^2
D24 (leather)    0.2652    0.1558    0.2461    0.7549    0.0158    0.9408
D68 (wood)       0         0.1185    0.1613    0.7968    0         0
D9 (grass)       0         0.0381    0.8762    0.6566    0.1921    0
D92 (pigskin)    0.8714    0.3164    0.0352    0.1970    0.8432    0
D84 (raffia)     0         0.3227    0         0.1275    0.0525    0
Due to the high quality of the pre-segmentation by SEM, we assume the class-conditional parameters to be correctly estimated. This allows the final segmentation to be performed in 100 iterations by linearly increasing $\beta$ from 0.5 to 2.0. By doing this, the first iterations give more importance to the data fitting conditioned by the estimated parameters, while the last ones give more importance to the regularization process. The final segmentation stage correctly estimated the true number of classes, i.e. $\hat K = 5$. The result is shown in Fig. 3c, giving an overall correct classification rate of 99%. Note that in all the experiments presented in this section, the class significance thresholds $\delta$ and $\delta'$ used in the first and second stages of the method are the same and are chosen as 0.01.

Another experiment was performed using a 256 × 256 textured mosaic taken from the image database of the French GdR-PRC ISIS research group of the CNRS (http://gdr-isis.org). In this image, shown in Fig. 4a, each region is made of a different homogeneous texture also taken from the Brodatz album. Due to the variety of the K = 7 textures used and to the fact that some textured regions have rather small sizes, this kind of image is known for its resistance to segmentation. We performed the first and second stages exactly as mentioned above for the first experiment. This gave us the pre-segmentation shown in Fig. 4b, with $\hat K = 6$ texture classes. Then, running the final stage for 1000 iterations with a constant $\beta = 0.8$ made one more class disappear, and the segmentation in Fig. 4c was obtained. Note that in this experiment, the classification objective failed, as three of the textures (namely D29 (sand), D112 (plastic bubbles) and D9 (grass)) were detected under the same label. However, in terms of segmentation, the corresponding regions were disconnected, so that the use of a larger autoregressive model should finally be able to discriminate between those three textures when applied within the different regions. In fact, the choice of the size of the AR texture feature vectors is a key problem. More precisely, increasing the order of the AR model while keeping the independence of the parameter estimates would result in two undesirable effects: firstly, the size of the analysis window in the first stage would also have to be increased to give reliable estimates; secondly, this would make the SEM algorithm much more difficult to use in practice due to the amount of computation needed to reach a stable clustering of the data. Hence, a short AR model is to be preferred, in spite of its relative inability to distinguish between different isotropic textures as seen above. Nevertheless, the segmentation result in Fig. 4c shows that two different textures (namely D38 (water) and D68 (wood)) with highly pronounced and similar spatial frequencies and orientations can be resolved with a four-neighbor AR model. Also, one can notice that suppressing the mean or the variance in the feature set can make the pre-segmentation insensitive either to illumination (i.e. gray-level shift) or to contrast (i.e. gray-level stretching), which may be useful in some specific applications.
Fig. 2. Histograms of computed AR parameters for two different textures: (a) top: D68 (wood grain); (b) bottom: D24 (calf leather). Solid lines represent the Gaussian distributions obtained with sample mean and variance estimates and used to compute the $\chi^2$ test.
6.2.2. Application to a large textured image database

To validate and compare our approach with other segmentation methods on a large image set, we applied it to the MIV image database described recently in (Xia et al., 2007). This database is generated from 12 textures chosen from the Brodatz album and a reference 256 × 256 image onto which four out of the 12 textures are patched in turn. Therefore, 495 textured mosaics are generated by this procedure. We processed the whole image set under the following conditions: for the SEM procedure, K = 5, L = 200, $\delta$ = 0.02, and for the classification refinement, $\beta \in [0.25, 1.70]$, $\gamma$ = 0.05 and $\delta'$ = 0.01. The algorithm proposed in Section 5 was slightly modified with respect to the stopping criterion: the $\beta$ parameter was forced to follow, in 30 iterations, a sawtooth pattern from 0.25 to 1.70 during ten cycles, thus requiring 300 iterations of the refinement procedure.
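For reference, the $\beta$ schedule described above can be generated as in the short sketch below (values are those given in the text; the variable name is ours).

```python
import numpy as np

# Ten cycles of 30 iterations, each ramping beta linearly from 0.25 to 1.70,
# giving the 300-iteration sawtooth schedule used for the refinement stage.
beta_schedule = np.tile(np.linspace(0.25, 1.70, 30), 10)
assert beta_schedule.size == 300
```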
Fig. 4. (a) Top left: original image; (b) top right: pre-segmentation by SEM; (c) bottom left: final segmentation; (d) bottom right: boundary detection.
Fig. 5. Reference image for texture segmentation assessment (from Xia et al., 2007).
Fig. 3. (a) Top: original image; (b) middle: pre-segmentation by SEM; (c) bottom: final segmentation.
Fig. 5 shows the reference mosaic image used to build the MIV textured image database. Fig. 6 gives some partial and final segmentation results on the four examples (MIV1-4) also shown in (Xia et al., 2007). Table 2 extends the misclassification percentages published in the same paper with the results of our approach. The comparison includes the Spatial Fuzzy Clustering (SFC) of Liew et al. (2003), the MRF-model-based method of Deng and Clausi (2005), and the Fuzzy Clustering of Spatial Patterns (FCSP) of Xia et al. (2007), which was shown to provide better results than the previous two methods. Out of the four image samples, only one gave a misclassification rate greater than the result given by the FCSP method. The other misclassification rates show a clear improvement in the classification objective by using our approach, with a mean error rate of 9.30% compared to 10.26% by the FCSP method, computed over the whole image dataset. Fig. 7 depicts the frequency distribution of the correct classification rate. One can observe a concentration of the distribution above 90% of correct classification, with a maximum at 99.26%. Nevertheless, some rates are far below the mean, and such situations occurred when the SEM procedure had merged the autoregressive features into a single cluster.
Fig. 6. Texture segmentation results on the four test cases (MIV1–MIV4) available in (Xia et al., 2007). Left column: images to be segmented; middle column: pre-segmentation by the SEM algorithm; right column: final segmentation.
Table 2
Percentage of misclassification on image set MIV (from Xia et al., 2007)

Image     Texture components    SFC (Liew et al., 2003)   MRF (Deng and Clausi, 2005)   FCSP    Proposed
MIV1      D4-D9-D16-D19         4.95                      9.94                          4.23    2.35
MIV2      D4-D16-D87-D110       6.98                      12.18                         5.30    1.69
MIV3      D9-D53-D77-D84        4.95                      9.94                          4.23    5.18
MIV4      D9-D35-D55-D103       6.98                      12.18                         5.30    0.95
Average   495 samples           12.10                     17.38                         10.26   9.30
The question of how often our algorithm reaches the correct class number K = 4 was also examined in this experiment. It appears that, after the SEM procedure, 33% of the classification results reached the correct class number K = 4. After the refinement procedure, this rate increased to 48%, while 48% of the results found $\hat K = 5$ and only 4% found $\hat K = 3$. However, a close analysis of the classification results shows that our approach was able to detect some defects in the texture homogeneity. For instance, several classification results independently and coherently exhibited one elongated region with a separate label on the very left part of the D53 (oriental straw cloth) texture patch, as shown in Fig. 8. Actually, this result is due to the decrease of the local variance at the left border of the original texture image.

Fig. 7. Histogram of correct classification percentage on the MIV image database.

Fig. 8. Some examples of classifications of MIV textured mosaics giving $\hat K = 5$ classes. First column: images to be segmented; second column: final segmentation. Texture D53 is the upper left on the top mosaic and the lower left on the bottom mosaic.

Fig. 9. (a) Top: original image; (b) middle: pre-segmentation by SEM; (c) bottom: final segmentation.

6.2.3. Application to a remote sensing image

We have also tested our approach on a real-world image, namely an aerial remote sensing image obtained with the CASI (Compact Airborne Spectrographic Imager) sensor available in our laboratory. This sensor is able to image a ground scene, in a so-called 'spatial mode', in up to 12 non-overlapping wavebands (at 1 m ground resolution), ranging from the visible to the near-infrared spectrum (403–947 nm). Fig. 9a shows a 256 × 256 CASI mono-band
image (at spectral band 587.5 ± 5.1 nm) showing different oriented patterns and textures. We applied the proposed method to this image, taking K = 20 as the upper bound of the number of classes. The SEM procedure was run for 1000 iterations, producing a pre-segmentation result featuring 11 classes (see Fig. 9b). Then, the final segmentation converged to nine classes, which can be seen in Fig. 9c. The obtained result is satisfactory in the sense that most of the prominent textures were correctly identified, particularly the high-frequency textures at the center of the image.
7. Conclusion and perspectives

The unsupervised segmentation technique for textured images presented here works at two levels to retrieve statistically homogeneous textured regions. The first level extracts texture features through a window-based 2-D AR modeling of the random field. The texture features are made of the within-window 2-D AR parameters, plus the mean and the variance of the prediction error. These features are then taken as data entries of the SEM algorithm, which produces estimates of the conditional distributions and a coarse pre-segmentation map. As such, this procedure may be seen as a kind of pre-attentive vision procedure. The second level allows one to classify each pixel of the image on the basis of the previous parameter estimates, using a simple Gibbsian structure for the underlying unobserved region process.

This method offers some advantages in comparison with others available in the literature. Firstly, it allows the estimation of the correct number of texture classes thanks to the use of the SEM algorithm. Secondly, the textured nature of homogeneous regions is taken into account via a modeling which is simple in essence and in the small number of parameters needed to describe them.

We would also point out the potential of the autoregressive modeling of region-conditional textures. More precisely, it is possible to use the prediction error as a boundary detector between regions. Since the prediction error is readily available once the class-conditional autoregressive parameters are estimated, it is quite simple to derive a boundary map based upon a thresholding scheme, for instance by computing for each site $s$ the quantity
$$T(s) = \sum_{k=1}^{\hat K} p(X_s = k \mid \{X_t, t \in \mathcal{V}_s\}, Y, \theta)\, \frac{e_{s,k}^2}{2\sigma_k^2} \qquad (13)$$
and comparing it with a defined threshold $\tau$. For example, Fig. 4d shows the detected boundary map obtained for $\tau = 3.0$. We are currently working on the integration of such information in conjunction with the region-based segmentation so as to refine the whole segmentation process. Finally, it is important to note that other stochastic models for region-conditional texture may be used within this framework, such as Gauss–Markov random fields (GMRFs) (Krishnamachari and Chellappa, 1997), more general ones such as non-causal autoregressive-moving average (NC-ARMA) models, or rotation and scale invariant stochastic models.

Acknowledgements

This work was supported by the Regional Council of Brittany, France, and the European Regional Development Fund through the Interreg III-B (Atlantic Area) project PIMHAI No. 190.

References
Alata, O., Ramananjarasoa, C., 2005. Unsupervised textured image segmentation using 2-D quarter-plane autoregressive model with four prediction supports. Pattern Recognition Lett. 26, 1069–1081. Allili, M.S., Ziou, D., 2007. Globally adaptive region information for automatic color texture image segmentation. Pattern Recognition Lett. 28, 1946–1956. Arivazhagan, S., Ganesan, L., 2003. Texture classification using wavelet transform. Pattern Recognition Lett. 24, 1513–1521. Besag, J., 1986. On the statistical analysis of dirty pictures. J.R. Statist. Soc. B 48 (3), 259–302. Bouman, C., Liu, B., 1991. Multiple resolution segmentation of textured images. IEEE Trans. Pattern Anal. Machine Intell. 13 (2), 99–113. Box, G., Jenkins, G., 1970. Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco. Brodatz, P., 1956. Textures: A Photographic Album for Artists and Designers. Dover, New-York. Celeux, G., Diebolt, J., 1987. A probabilistic teacher algorithm for iterative maximum likelihood estimation. In: Classification and Related Methods of Data Analysis. Amsterdam, Elsevier, NorthHolland, pp. 617–623. Celeux, G., Diebolt, J., 1992. A stochastic approximation type EM algorithm for the mixture problem. Stochastics Stochastics Rep. 41, 119–134. Celeux, G., Chauveau, D., Diebolt, J., 1996. On stochastic versions of the EM algorithm. J. Statist. Comput. Simulat. 55, 287–314. Champagnat, F., Goussard, Y., Idier, J., 1996. Unsupervised deconvolution of sparse spike trains using stochastic approximation. IEEE Trans. on Signal Process. 44 (12), 2988–2998. Chehdi, K., Cariou, C., Kermad, C., 1994. Image segmentation and texture classification using local thresholds and 2-D AR modeling. In: Proceedings EUSIPCO-94, Edinburgh, UK, September, pp. 30–33. Cohen, F.S., Fan, Z., 1992. Maximum likelihood unsupervised textured image segmentation. CVGIP: Graphical Models and Image Processing 54 (3), 239–251. Comer, M.L., Delp, E.J., 1994. Parameter estimation and segmentation of noisy or textured images using the EM algorithm and MPM estimation. In: Proceedings International Conference on Image Processing, Austin, Texas, vol. 3, pp. 650–654. Comer, M.L., Delp, E.J., 1999. Segmentation of textured images using a multiresolution Gaussian autoregressive model. IEEE Trans. on Image Process. 8 (3), 408–420. Comer, M.L., Delp, E.J., 2000. The EM/MPM algorithm for segmentation of textured images: Analysis and further experimental results. IEEE Trans. on Image Process. 9 (10), 1731–1744. Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39, 1–38. Deng, H., Clausi, D.A., 2005. Unsupervised segmentation of synthetic aperture radar ice imagery using a novel Markov random field model. IEEE Trans. Geosci. Remote Sensing 43 (3), 528–538.
Diebolt, J., Celeux, G., 1993. Asymptotic properties of a Stochastic EM algorithm for estimating mixture proportions. Stochastic Mod. 9, 599–613. Fogel, I., Sagi, D., 1989. Gabor filters as texture discriminator. Biol. Cyber. 61, 102–113. Geman, S., Geman, D., 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Machine Intell. 6 (6), 721–741. Hadamard, J., 1923. Lectures on Cauchy's Problem in Linear Partial Differential Equations. Yale University Press, New Haven, CT, USA. Haralick, R.M., Shanmugam, K., Dinstein, I., 1973. Textural features for image classification. IEEE Trans. Systems Man Cybernet. 3 (6), 610–621. Hofmann, T., Puzicha, J., Buhmann, J.M., 1996. Unsupervised segmentation of textured images by pairwise data clustering. In: Proc. International Conference on Image Processing (ICIP'96), vol. 3. Lausanne, Switzerland. Kay, S.M., 1988. Modern Spectral Analysis – Theory and Application. Prentice-Hall, Englewood Cliffs, NJ. Kervrann, C., Heitz, F., 1995. A Markov random field model-based approach to unsupervised texture segmentation using local and global spatial statistics. IEEE Trans. Image Process. 4 (6), 856–862. Krishnamachari, S., Chellappa, R., 1997. Multiresolution Gauss–Markov random field models for texture segmentation. IEEE Trans. Image Process. 6 (2), 251–267. Lakshmanan, S., Derin, H., 1989. Simultaneous parameter estimation and segmentation of Gibbs random fields using simulated annealing. IEEE Trans. Pattern Anal. Machine Intell. 11 (8), 799–813. Li, F., Peng, J., 2004. Double random field models for remote sensing image segmentation. Pattern Recognition Lett. 25 (1), 129–139. Liew, A.W.-C., Leung, S.H., Lau, W.H., 2003. Segmentation of color lip images by spatial fuzzy clustering. IEEE Trans. Fuzzy Syst. 11 (4), 542–549. Manjunath, B.S., Chellappa, R., 1991. Unsupervised texture segmentation using Markov random field models. IEEE Trans. Pattern Anal. Machine Intell. 13 (5), 478–482. Marroquin, J., Mitter, S., Poggio, T., 1987. Probabilistic solution of ill-posed problems in computational vision. J. Amer. Statist. Assoc. 82 (397), 76–89. Masson, P., Pieczynski, W., 1993. SEM algorithm and unsupervised statistical segmentation of satellite images. IEEE Trans. Geosci. Remote Sensing 31 (3), 618–633. Melas, D.E., Wilson, S.P., 2002. Double Markov random fields and Bayesian image segmentation. IEEE Trans. Acoustics, Speech Signal Process. 50 (2), 357–365.
Mendel, J.M., 1995. Lessons in Estimation Theory for Signal Processing Communication and Control. Prentice-Hall, Englewood Cliffs, NJ. Montiel, E., Aguado, A.S., Nixon, M.S., 2005. Texture classification via conditional histograms. Pattern Recognition Lett. 26, 1740–1751. Muneeswaran, K., Ganesan, L., Arumugam, S., Ruba Soundar, K., 2005. Texture classification with combined rotation and scale invariant wavelet features. Pattern Recognition 38, 1495–1506. Muneeswaran, K., Ganesan, L., Arumugam, S., Ruba Soundar, K., 2006. Texture image segmentation using combined features from spatial and spectral distribution. Pattern Recognition Lett. 27, 755–764. Ojala, T., Pietikäinen, M., Harwood, D., 1996. A comparative study of texture measures with classification based on featured distributions. Pattern Recognition 29 (1), 51–59. Petrou, M., Sevilla, P.G., 2006. Image Processing: Dealing with Texture. Wiley. Poggio, T., Koch, C., Torre, V., 1985. Computational vision and regularization theory. Nature 317, 314–319. Reed, T.R., du Buf, H.J.M., 1993. A review of recent texture segmentation and feature extraction techniques. Graphical Models Image Process. 57 (3), 359–372. Reyes-Aldasoro, C.C., Bhalerao, A., 2006. The Bhattacharyya space for feature selection and its application to texture segmentation. Pattern Recognition 39, 812–826. Richardson, S., Green, P.J., 1997. On Bayesian analysis of mixtures with an unknown number of components. J. Roy. Statist. Soc. Ser. B 59, 731–792. Wilson, S.P., Zerubia, J., 2001. Segmentation of textured satellite and aerial images by Bayesian inference and Markov random fields, INRIA research report, RR-4336, December 2001. Winkler, G., 1995. Image analysis, random fields and dynamic Monte Carlo methods. In: Karatzas, I., Yor, M. (Eds.), Applications of Mathematics. Springer-Verlag. Xia, Y., Feng, D., Zhao, R., 2006. Morphology-based multifractal estimation for texture segmentation. IEEE Trans. Image Process. 15 (3), 614–623. Xia, Y., Feng, D., Wang, T., Zhao, R., Zhang, Y., 2007. Image segmentation by clustering of spatial patterns. Pattern Recognition Lett. 28 (12), 1548–1555. Zhang, J., Modestino, J.W., 1990. A model-fitting approach to cluster validation with application to stochastic model-based image segmentation. IEEE Trans. Pattern Anal. Machine Intell. 12 (10), 1009–1017. Zhang, J., Modestino, J.W., Langan, D.A., 1994. Maximum-likelihood parameter estimation for unsupervised stochastic model-based image segmentation. IEEE Trans. Image Process. 3 (4), 404–420.