Neurocomputing 71 (2008) 2180–2193 www.elsevier.com/locate/neucom
Improvements on ICA mixture models for image pre-processing and segmentation

Patrícia R. Oliveira, Roseli A.F. Romero
University of São Paulo, Brazil

Available online 14 February 2008

This work was supported by the Brazilian agencies Capes, CNPq, and Fapesp. Corresponding author e-mail address: [email protected] (P.R. Oliveira).
Abstract

Today, several different unsupervised classification algorithms are commonly used to cluster similar patterns in a data set based only on its statistical properties. Especially in image data applications, self-organizing methods for unsupervised classification have been successfully applied for clustering pixels or groups of pixels in order to perform segmentation tasks. The first important contribution of this paper is the development of a self-organizing method for data classification, named Enhanced Independent Component Analysis Mixture Model (EICAMM), which was built by proposing some modifications to the Independent Component Analysis Mixture Model (ICAMM). Such improvements were proposed by considering some of the model limitations as well as by analyzing how it should be improved in order to become more efficient. Moreover, a pre-processing methodology was also proposed, which is based on combining Sparse Code Shrinkage (SCS) for image denoising with the Sobel edge detector. In the experiments of this work, the EICAMM and other self-organizing models were applied for segmenting images in their original and pre-processed versions. A comparative analysis showed satisfactory and competitive image segmentation results obtained by the proposals presented herein.
© 2008 Published by Elsevier B.V.

Keywords: Clustering; Self-organization; Computer vision; Image segmentation; Image smoothing; Pixel classification; Artificial neural networks
1. Introduction

In machine learning theory, the concept of self-organization may be commonly defined as the process by which a system is spontaneously organized, without an external control imposed by the environment. Therefore, self-organizing methods have been frequently used for unsupervised classification purposes, where the goal is to discover natural groupings in a collection of data. Neural networks with unsupervised learning schemes and the k-means clustering algorithm are some examples of self-organizing methods used for unsupervised classification tasks [1,27]. Another self-organizing approach is a recent technique for unsupervised classification which consists of Independent Component Analysis (ICA) mixture models [20,25], where classes are described by linear combinations of
independent sources having non-Gaussian densities. In contrast to Gaussian mixture models [14], which exploit only the second-order statistics (mean and covariance) of the observed data to estimate the posterior densities, methods based on ICA have been widely applied in many research fields due to the fact that they can exploit higher-order statistics [17,11,3]. Lee and coworkers originally proposed the ICA Mixture Model (ICAMM) to overcome an ICA limitation, namely the assumption that the signal sources are independent [20]. In that approach, this assumption was relaxed by using the concept of mixture models. The ICAMM algorithm finds the independent components and the mixing matrix for each class, as well as computes the class membership probability for each pattern in the data set. The learning rules for the ICAMM were derived by using a gradient ascent method to maximize the log-likelihood of the data. This work introduces the Enhanced ICA Mixture Model (EICAMM), which implements some modifications in the original ICAMM learning rules. As in the
original ICAMM, some modifications are based on an information-maximization approach to blind separation and blind deconvolution that maximizes the information transferred through a network of non-linear units [3]. The non-linearity in the transfer function is able to pick up higher-order moments of the input distributions and to perform redundancy reduction between units in the output representation. Additional improvements in the ICAMM were made by incorporating some features of non-linear optimization methods based on the second derivatives of objective functions. For that, the Levenberg–Marquardt method (see, for example, [22]) was incorporated into the learning algorithm to guarantee and to improve the convergence of the model. To show the efficiency of the proposed modifications, a preliminary study presents the classification results obtained by the EICAMM, the original ICAMM, and the k-means method. From that, it is possible to see that the proposed modifications can significantly improve the ICAMM classification performance, considering randomly generated data and the well-known Iris flower data set. As an image segmentation process can be seen as a direct application of unsupervised classification techniques, the EICAMM has also been tested for performing such a task. The complexity of an image segmentation process can vary depending on the kind of application. However, in general, the goal is to divide an image into regions having similar properties. Approaches that present features such as learning and generalization abilities, incorporation of previous knowledge, and handling of uncertainties have been successfully applied in image segmentation tasks. Among the techniques included in this category, great interest has been addressed to Bayesian methods [9], artificial neural network models [5,15], systems based on the fuzzy set theory [10,32], and techniques that use some statistical image properties [13]. The main concern of this work is to evaluate the performance of the EICAMM in comparison to other self-organizing methods and to propose improvements to get better segmentation results by incorporating an image pre-processing methodology also proposed herein. Therefore, an important focus is given to methods capable of handling important issues related to noise presence, image smoothing, and edge enhancement, in order to make the images more suited to the segmentation processes. This paper is organized as follows: In Section 2, considerations on the ICAMM method are pointed out and supported by relevant work found in the literature. Special attention is given to some important limitations of the ICAMM which motivated the ideas implemented herein. Some fundamentals of ICA are exposed in Section 3. The original ICAMM is described in Section 4. In Section 5, the EICAMM is introduced, and a preliminary comparative analysis for numerical data classification is presented in Section 6. The image pre-processing methodology proposed in this work is presented in Section 7. In Section 8,
experimental results for image segmentation are discussed in a comparative study involving the EICAMM and other self-organizing methods for clustering. Finally, conclusions and future work are presented in Section 9.

2. Related work

Since the ICAMM learning algorithm is based on a gradient optimization technique, its performance can be affected by some limitations associated with this kind of approach. The gradient ascent (descent) algorithm has become well known in the literature as a standard method for training neural networks [22]. The widespread use of such a technique is mainly related to its most powerful property: it can be mathematically proven that the algorithm will always converge to a local optimum of the objective function, although an immense number of iterations is often necessary in learning tasks. Moreover, another important problem is that there is no guarantee that the local optimum reached is the global one. Although some promising features of the ICAMM have been reported [20], in experiments with randomly generated data and the Iris data set, a very slow convergence and poor classification results were observed [25]. In two other papers found in the literature [12,28], the ICAMM was also applied without great success. The ICAMM was shown to work well on 2D artificial data, but the advantages of using such a model in image segmentation were not proven conclusively [12]. Some feature extraction techniques were considered as pre-processing steps for reducing the data dimensionality and for increasing the efficiency of the ICAMM [28]. Although the mean overall classification accuracies obtained by that approach were higher than those obtained by the k-means method, several limitations and assumptions that compromise the use of the ICAMM in remote sensing classification were pointed out.

3. Independent Component Analysis (ICA)

The noise-free ICA model can be defined as the following linear model:

x = As,  (1)
where the n observed random variables x_1, x_2, ..., x_n are modelled as linear combinations of n random variables s_1, s_2, ..., s_n, which are assumed to be statistically mutually independent. In this sense, the ICA model is a generative model, that is, it describes how the observed data are generated by a process of mixing the components s_j. In such a model, the independent components s_j cannot be directly observed and the mixing coefficients a_ij are also assumed to be unknown. Only the random variables x_i are observed, and both the components s_j and the coefficients a_ij must be estimated using x.
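For a concrete view of Eq. (1), the following minimal Python sketch (not part of the original paper) generates two super-Gaussian sources, mixes them with a known matrix A, and recovers source estimates with scikit-learn's FastICA, used here only as a convenient stand-in estimator rather than the method developed in this paper; all variable names and values are illustrative.

```python
# Minimal illustration of the noise-free ICA model x = A s (Eq. (1)).
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Two statistically independent, super-Gaussian (Laplacian) sources.
T = 5000
s = rng.laplace(size=(T, 2))

# "Unknown" square mixing matrix A (assumed invertible).
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])

# Observed mixtures: each row of x is one observation x_t = A s_t.
x = s @ A.T

# Estimate the unmixing matrix W and the sources s_hat = W x.
ica = FastICA(n_components=2, random_state=0)
s_hat = ica.fit_transform(x)   # estimated independent components
W_hat = ica.components_        # estimated unmixing matrix W

print("estimated W:\n", W_hat)
```

Up to the usual ICA ambiguities of permutation and scaling, s_hat recovers the original sources.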
To make sure that the ICA model can be estimated, the following assumptions and restrictions have to be satisfied [17]:

(1) The independent components s_j are assumed statistically independent.
(2) The independent components s_j must have non-Gaussian distributions. Kurtosis, or the normalized cumulant, is commonly used as a measure of non-Gaussianity in the ICA estimation of the original sources. Such a measure for a random variable y is given by

kurt(y) = E{y^4} - 3.  (2)

A Gaussian variable has a zero kurtosis value. Random variables with positive kurtosis have a super-Gaussian distribution and those with negative kurtosis have a sub-Gaussian distribution, as illustrated in Fig. 1.
(3) For simplicity, it is assumed that the unknown mixing matrix A is square. This simplifies the estimation process, since after estimating matrix A, it is possible to compute its inverse W and obtain the independent components simply by

s = Wx.  (3)

Fig. 1. Examples of super-Gaussian, Gaussian, and sub-Gaussian probability density functions. Dash-dotted line: Laplacian density. Dashed line: Gaussian density. Solid line: uniform density.

4. The ICA Mixture Model (ICAMM)

There are several methods for adapting the mixing matrices in the ICA model [11,6,17,19]. The extended infomax ICA learning rule is used to blindly separate unknown sources with sub-Gaussian and super-Gaussian distributions [19]. Such an approach uses, for sub-Gaussian sources, a strictly symmetric bimodal univariate distribution, obtained by a weighted sum of two Gaussian distributions, given as

p(u) = \frac{1}{2}\left( N(\mu, \sigma^2) + N(-\mu, \sigma^2) \right),  (4)

where u = Wx = WAs, which leads to the learning rule for strictly sub-Gaussian sources:

\Delta W \propto [I + \tanh(u)u^T - uu^T] W.  (5)

For unimodal super-Gaussian sources, the following density model is adopted:

p(u) \propto N(0, 1)\,\mathrm{sech}^2(u),  (6)

which leads to the following learning rule for strictly super-Gaussian sources:

\Delta W \propto [I - \tanh(u)u^T - uu^T] W.  (7)

Therefore, using Eqs. (5) and (7), one can obtain a generalized learning rule, using as switching criterion for distinguishing between sub-Gaussian and super-Gaussian sources the sign before the hyperbolic tangent function, which is given as

\Delta W \propto [I - K \tanh(u)u^T - uu^T] W,  (8)

where K is an N-dimensional diagonal matrix whose elements k_i are calculated as

k_i = \mathrm{sign}(\mathrm{kurt}(u_i)).  (9)

In the ICAMM, proposed by Lee and coworkers [20], it is assumed that the data X = {x_1, ..., x_T} are drawn independently and generated by a mixture density model. The data likelihood is given by the joint density

p(X \mid \Theta) = \prod_{t=1}^{T} p(x_t \mid \Theta).  (10)

The mixture density is

p(x_t \mid \Theta) = \sum_{k=1}^{K} p(x_t \mid C_k, \theta_k)\, p(C_k),  (11)

where \Theta = (\theta_1, ..., \theta_K) are the unknown parameters for each p(x \mid C_k, \theta_k). In this case, C_k denotes class k and it is assumed that the number of classes K is known in advance. In the ICAMM, the data in each class are described by

x_t = A_k s_k + b_k,  (12)

where A_k is an N x M scalar matrix and b_k is the bias vector for class k. The vector s_k is called the source vector. In this case, the number of sources (M) is equal to the number of linear combinations (N). The task is to classify the unlabelled data points and to determine the parameters for each class (A_k, b_k), as well as the probability of each class for each data point. The iterative learning algorithm, which performs gradient ascent on the total data likelihood, is presented in the following steps:

(1) Compute the data log-likelihood for each class:

\log p(x_t \mid C_k, \theta_k) = \log p(s_k) - \log(|\det(A_k)|), \quad \text{where, in the ICAMM, } s_k = A_k^{-1}(x_t - b_k).  (13)
(2) Compute the probability of each class given the data vector x_t:

p(C_k \mid x_t, \Theta) = \frac{p(x_t \mid C_k, \theta_k)\, p(C_k)}{\sum_k p(x_t \mid C_k, \theta_k)\, p(C_k)}.  (14)

(3) Adapt the matrices A_k and the bias terms b_k by using the following updating rules:

\Delta A_k \propto p(C_k \mid x_t, \Theta)\, A_k [I - K \tanh(s_k) s_k^T - s_k s_k^T],  (15)

b_k = \frac{\sum_t x_t \, p(C_k \mid x_t, \Theta)}{\sum_t p(C_k \mid x_t, \Theta)}.  (16)
For the log-likelihood estimation in Eq. (13), the term log p(s_k) is modelled by

\log p(s_k) \propto -\sum_{i=1}^{N} \left( k_i \log(\cosh s_{k,i}) + \frac{s_{k,i}^2}{2} \right).  (17)

The learning algorithm converges when the change in the log-likelihood function between two successive iterations is below a pre-determined small constant. After convergence, the classification is carried out by processing each instance with the learned parameters A_k and b_k. The class-conditional probabilities p(C_k | x_t, \Theta) are computed for all classes and each instance is assigned to the class with the highest conditional probability.

5. Enhanced ICA Mixture Model (EICAMM)

The EICAMM proposed in this paper was derived from some important modifications on the ICAMM, considering both modelling and implementation aspects, which are discussed in this section.

5.1. Reformulating the class model

Instead of considering the bias term added to the data after they are generated by an ICA model (Eq. (12)), in the EICAMM the bias vectors are considered to have been mixed with the signal sources. This modification originates another equation to describe the data in each class, given as

x_t = A_k (s_k + b_k).  (18)
The idea that motivated this new formulation is supported by the 'infomax' criterion, which assumes that there is no noise, or rather, that there is no noise model in the system. Therefore, aiming to guarantee the absence of noise added to the sensor inputs x, such noise is considered to be added to the sources s, prior to the mixing process formulated in Eq. (1).

5.2. Reformulating the bias learning rule

For a more informative bias learning rule in the EICAMM, a new bias updating rule was derived on the basis of an information maximization approach [3] used to maximize the mutual information that the output Y of a neural network processor contains about its input X. According to this idea, when a single input x passes through a non-linear transformation function g(x), it yields an output variable y in a way that the mutual information between these two variables is maximized. Through that transformation, the high-density parts of the probability density function of x are aligned with the steeply sloping parts of the g(x) function. Therefore, the learning rules for weights and biases in such a neural network are formulated by using one of the non-linear transfer functions. In the EICAMM formulation, the learning rule for the bias terms is proposed by considering the hyperbolic tangent transfer function, given as

\Delta b_k \propto -2 \tanh(s_k).  (19)
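As a side-by-side illustration (not from the original paper), the sketch below contrasts the ICAMM bias estimate of Eq. (16), a posterior-weighted mean of the data, with the EICAMM rule of Eq. (19); the learning rate, the array shapes, and the sign of the tanh term (which follows the infomax bias rule of [3]) are stated assumptions.

```python
import numpy as np

def icamm_bias(X, posteriors_k):
    """ICAMM bias (Eq. (16)): posterior-weighted mean of the data.

    X            : (T, N) data matrix, one pattern per row.
    posteriors_k : (T,) values of p(C_k | x_t, Theta) for class k.
    """
    return (posteriors_k[:, None] * X).sum(axis=0) / posteriors_k.sum()

def eicamm_bias_update(b_k, s_k, beta=1e-6):
    """EICAMM bias rule (Eq. (19)): gradient-style step Delta b_k ~ -2 tanh(s_k).

    b_k  : (N,) current bias for class k.
    s_k  : (N,) current source estimate for one pattern.
    beta : learning rate (the experiments in this paper use very small values).
    """
    return b_k + beta * (-2.0 * np.tanh(s_k))
```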
5.3. Orthogonalizing the basis matrices

Since the derived formulation of the ICAMM states that W_k = A_k^{-1}, Eq. (8) can be written as

\Delta W \propto [I - K \tanh(u)u^T - uu^T] A^{-1}.  (20)
Supposing that A is an orthogonal matrix, then, by the orthogonality property, one has that A^{-1} = A^T. Therefore, as one more modification incorporated into the EICAMM, the updating rule for A_k is now given as

\Delta A_k \propto p(C_k \mid x_t, \Theta)\, A_k [I - K \tanh(s_k) s_k^T - s_k s_k^T]^T,  (21)

where s_k = A_k^T x_t - b_k, considering the model for the data in each class used in the EICAMM and formulated in Eq. (18). Note that the transpose operator is used instead of the inverse matrix A_k^{-1}, which implies a computational advantage of the EICAMM in comparison to the ICAMM. Consequently, in the EICAMM, the mixing matrices A_k should be orthogonalized in each iteration by using the following equation:

A_k = A_k (A_k^T A_k)^{-1/2}.  (22)
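A small sketch (not from the paper) of the symmetric orthogonalization of Eq. (22), computing A_k (A_k^T A_k)^{-1/2} via an eigendecomposition; numpy is assumed and the matrix size is arbitrary.

```python
import numpy as np

def orthogonalize(A):
    """Symmetric orthogonalization A <- A (A^T A)^(-1/2), as in Eq. (22)."""
    # Eigendecomposition of the symmetric positive-definite matrix A^T A.
    eigval, eigvec = np.linalg.eigh(A.T @ A)
    # Inverse matrix square root (A^T A)^(-1/2).
    inv_sqrt = eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T
    return A @ inv_sqrt

A = np.random.default_rng(1).normal(size=(4, 4))
A_orth = orthogonalize(A)
# After orthogonalization, A_orth^T A_orth is (numerically) the identity,
# so the transpose can be used in place of the inverse, as argued above.
print(np.allclose(A_orth.T @ A_orth, np.eye(4), atol=1e-8))
```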
5.4. Incorporating second derivative information

When the second derivative of the objective function is relatively easy to compute, it is an excellent idea to incorporate this information for speeding up the convergence and for bounding the function optimum. Following this motivation, a modification in the updating rule for A_k that incorporates the second derivative of the log-likelihood function is proposed here. This section presents how Newton's method and the Levenberg–Marquardt method have been incorporated into the EICAMM.

Newton's method: For adapting Newton's method to the EICAMM, the second derivatives of the log-
likelihood, which are given by the Hessian matrix H, should be incorporated into the learning rule in Eq. (15). Consequently, the new updating rule is given as

\Delta A_k \propto p(C_k \mid x_t, \Theta)\, H^{-1} A_k [I - K \tanh(s_k) s_k^T - s_k s_k^T]^T.  (23)
The derivation of the matrix H for the EICAMM is presented in Appendix A.

Levenberg–Marquardt method: When incorporating the Levenberg–Marquardt method into the updating rule for A_k, another modification has also been made to guarantee that the matrix to be inverted is positive definite and, therefore, invertible [2]. Such modification is given as

\Delta A_k \propto p(C_k \mid x_t, \Theta)\, (H + \mu I)^{-1} A_k [I - K \tanh(s_k) s_k^T - s_k s_k^T]^T,  (24)
where \mu is a small constant in (0, 1). In the experiments of this work, the EICAMM uses the Levenberg–Marquardt method in its learning rule for A_k, as in Eq. (24).

6. Unsupervised classification experiments

Initially, the EICAMM performance is evaluated in unsupervised classification tasks, considering simulated data and the well-known Iris flower data set (see, for example, [14]). After this analysis, results and discussions involving image data applications are presented in Section 8.
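To make the preceding update rules concrete, the sketch below assembles one EICAMM adaptation sweep in Python, following Eqs. (14), (17), (19), (21), (22), and (24). It is a simplified reading of the algorithm, not the authors' implementation: the Hessian of Appendix A is replaced by an identity placeholder (so the damped factor (H + mu I)^{-1} only rescales the orthogonal update of Eq. (21)), the switching signs k_i are supplied by the caller instead of being re-estimated from the kurtosis of the current sources, and all learning rates, priors, and shapes are illustrative assumptions.

```python
import numpy as np

def orthogonalize(A):
    """Symmetric orthogonalization A (A^T A)^(-1/2), Eq. (22)."""
    w, V = np.linalg.eigh(A.T @ A)
    return A @ (V @ np.diag(1.0 / np.sqrt(w)) @ V.T)

def class_log_likelihood(s, A, k_signs):
    """log p(x_t | C_k) = log p(s_k) - log|det A_k|  (Eqs. (13) and (17))."""
    log_p_s = -np.sum(k_signs * np.log(np.cosh(s)) + 0.5 * s ** 2)
    return log_p_s - np.log(np.abs(np.linalg.det(A)))

def eicamm_sweep(X, A, b, priors, k_signs, alpha=1e-6, beta=1e-6, mu=0.1):
    """One adaptation sweep over the data set X (one pattern per row).

    A       : K basis matrices A_k (N x N), kept orthogonal.
    b       : K bias vectors b_k (N,).
    priors  : K class priors p(C_k).
    k_signs : (N,) array of +1/-1 entries (the diagonal of K in Eq. (8)).
    """
    K_classes = len(A)
    N = X.shape[1]
    I = np.eye(N)
    damp = np.linalg.inv(I + mu * I)      # (H + mu I)^(-1) with placeholder H = I

    for x_t in X:
        # Per-class sources s_k = A_k^T x_t - b_k (class model of Eq. (18)).
        S = np.array([A[k].T @ x_t - b[k] for k in range(K_classes)])

        # Class posteriors p(C_k | x_t) of Eq. (14), computed via log-sum-exp.
        log_lik = np.array([class_log_likelihood(S[k], A[k], k_signs)
                            for k in range(K_classes)]) + np.log(priors)
        post = np.exp(log_lik - log_lik.max())
        post /= post.sum()

        for k in range(K_classes):
            s_k = S[k]
            grad = (I - np.diag(k_signs) @ np.outer(np.tanh(s_k), s_k)
                    - np.outer(s_k, s_k))
            # Damped, transposed update of Eqs. (21)/(24).
            A[k] = A[k] + alpha * post[k] * damp @ A[k] @ grad.T
            A[k] = orthogonalize(A[k])    # keep A_k orthogonal, Eq. (22)
            # Bias rule of Eq. (19).
            b[k] = b[k] + beta * (-2.0 * np.tanh(s_k))
    return A, b
```

Calling eicamm_sweep repeatedly until the total log-likelihood stabilizes mirrors the convergence criterion described at the end of Section 4.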
6.1. Simulated data classification

In preliminary experiments, six simulated data sets have been generated, with dimensions N = 2, N = 3, and N = 5, as well as K = 2 Laplacian (super-Gaussian) classes. In order to evaluate the case where K = 3, one class with a uniform (sub-Gaussian) distribution has been added to the previously generated classes, as can be seen in Fig. 2 for two dimensions. Since the true labels of the patterns in these simulated sets are available, they could be compared with those estimated by the self-organizing models considered in this work. This analysis has been carried out in order to obtain an estimate of the errors committed by each model. The 10-fold cross-validation method (see, for example, [31]) has been applied to estimate the errors obtained by the EICAMM, the ICAMM, and the k-means methods in simulated data classification. Such results are given in Table 1. When analyzing the performance of both ICA mixture models, it can be noted that the results obtained by the EICAMM have been significantly superior to those obtained by the ICAMM, which has not shown good results in any of the experiments on simulated data. Indeed, it was observed that such inefficiency could be related to the fact that the ICAMM tends to associate all patterns in the sets with only one class, even though several schemes for defining the learning parameters have been tested. When comparing the EICAMM to the k-means, the EICAMM has obtained better results in all cases where the number of classes has been equal to 3. However, the EICAMM performance has been inferior to the k-means
Fig. 2. Example of randomly generated data with three classes.
Table 1
Simulated data classification results (estimated error (%), mean ± standard deviation)

Number of dimensions | Number of classes | EICAMM        | ICAMM         | k-means
2                    | 2                 | 8.85 ± 2.64   | 48.20 ± 1.34  | 4.75 ± 2.50
2                    | 3                 | 15.52 ± 3.68  | 66.47 ± 1.84  | 28.10 ± 2.21
3                    | 2                 | 7.92 ± 2.52   | 48.00 ± 1.00  | 5.75 ± 1.82
3                    | 3                 | 20.40 ± 6.56  | 65.80 ± 1.89  | 33.26 ± 9.21
5                    | 2                 | 8.75 ± 3.40   | 46.60 ± 2.12  | 2.55 ± 1.21
5                    | 3                 | 32.95 ± 3.07  | 65.99 ± 1.04  | 49.80 ± 2.31
Table 2
Iris data classification results (estimated error (%), mean ± standard deviation)

EICAMM       | ICAMM         | k-means
30.42 ± 3.58 | 61.99 ± 10.90 | 36.66 ± 7.97
one in the experiments with two classes. This fact does not necessarily mean (depending, for example, on the data and the application complexity) that the EICAMM cannot be considered a good classifier, since the estimated error was less than 10% in those cases. Furthermore, considering the EICAMM performance separately, the error for two classes has been less than that for three classes, as given in Table 1. Also, for two classes, there are well-known good classifiers in the literature. The most concerning problems, in terms of obtaining good classification performance, occur when the data set has more than two classes. Therefore, in the image experiments of this work, only results for a number of classes greater than 2 have been considered.

6.2. Iris data classification

The Iris flower data set (see, for example, [14]) contains three classes of 50 instances each, described by four numeric attributes, where each class refers to a type of Iris flower. One class is linearly separable from the other two, but the other two are not linearly separable from each other. Classification results provided by the EICAMM, the ICAMM, and the k-means methods are given in Table 2. It can be noted that for this case, where the number of classes is K = 3, the models have presented a performance similar to that observed in the simulated data classification. Once again, the results obtained by the EICAMM have been significantly superior to those provided by the ICAMM, and also better than those achieved by the k-means.

7. The image pre-processing methodology

As the problem of image segmentation can be directly addressed by self-organizing methods for clustering, the
EICAMM has also been tested for performing such a task. However, among the main aspects that can impair the performance of an autonomous segmentation process is the presence of some kinds of noise, added to the images during the acquisition or transmission processes. Attempting to reduce such inconvenient effects, a number of linear [23] and non-linear [24,30] denoising techniques have been proposed. In recent years, a non-linear denoising method, named Sparse Code Shrinkage (SCS) [18], has been successfully used in many image processing applications [21,29]. The main advantage of this method, in comparison to other non-linear denoising techniques, is the fact that the SCS is closely related to the ICA estimation method, which makes such a technique well suited for detecting important structures in image data [4]. Sparse coding has been used for finding a representation of multidimensional data in which few components are simultaneously activated. Let z = (z_1, z_2, ..., z_n)^T denote the observed n-dimensional random vector in the original representation, and s = (s_1, s_2, ..., s_n)^T denote the vector of the transformed component variables. Considering that A = (a_1, a_2, ..., a_n)^T is the transformation matrix whose rows are the basis vectors of the sparse components, the linear relationship between z and s is given as

s = Az.  (25)
A simple choice for the sparse code basis is to consider the data to be generated by an ICA model [18]. Therefore, the sparse code matrix A can be estimated by first estimating the ICA transformation matrix and then orthogonalizing it. The noise in the sparse components can be reduced by applying a non-linear shrinkage function to each sparse component. This operation eliminates small-amplitude components, which are thought to be dominated by noise. In its original formulation, the transformation matrix needs to be learned from noise-free data. In this paper, it is assumed that all considered images present random Gaussian noise, which could be generated, for example, during the acquisition or digitalization processes. Under this assumption, the shrinkage function, to be applied to each sparse component, may be defined as

f(s) = \mathrm{sign}(s) \max(0, |s| - \sqrt{2}\,\sigma^2),  (26)
where f(·) is the shrinkage function and \sigma^2 is the noise variance. If the noise variance is unknown, it should be estimated or treated as a user-defined parameter. Even when the images are not assumed to be noisy, it was noted that the SCS could be applied to smooth images, making the regions of interest more homogeneous. In this case, the larger the noise variance value used in the shrinkage function, the stronger the smoothing effect in the resulting image. In the experiments of a previous work [26], the SCS was applied as a smoothing filter, removing small details from the image, as occurs with other image denoising techniques. As a consequence, regions of interest that are relatively large can become more homogeneous, making the segmentation process easier. However, in spite of the benefits of denoising techniques, certain problems may arise when such a method is applied to suppress noise in images. A common example is that the denoising algorithm can blur important edges, also affecting the segmentation result. Here, an image pre-processing methodology is proposed, aiming to smooth the image regions and to attenuate the noise problem without compromising important elements, such as the contours of the objects to be found in the segmentation process. More precisely, such a scheme involves a combination of image processing techniques, which are primarily taken as two individual and parallel stages. In the first pre-processing step, the SCS is applied to the original image; simultaneously, the Sobel operator is also
Table 3
Steps for applying the proposed pre-processing methodology and the EICAMM for image segmentation

Step | Description                                      | Parameters
1    | Apply the SCS to the original image              | \sigma^2: noise variance
2    | Apply the Sobel operator to the original image   | None
3    | Add the images obtained in steps 1 and 2         | None
4    | Run the EICAMM for the image obtained in step 3  | \alpha: learning parameter for the A_k terms; \beta: learning parameter for the b_k terms
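The following sketch (not the authors' code) mirrors steps 1-3 of Table 3: sparse code shrinkage applied patch-wise with Eq. (26), a Sobel edge map, and the sum of the two results. The orthogonal sparse-code basis W is assumed to have been estimated beforehand from image patches (for example, by ICA followed by orthogonalization, as described above); the patch size, the use of non-overlapping patches, and the parameter values are simplifying assumptions.

```python
import numpy as np
from scipy import ndimage

def shrink(s, sigma2):
    """Shrinkage non-linearity of Eq. (26)."""
    return np.sign(s) * np.maximum(0.0, np.abs(s) - np.sqrt(2.0) * sigma2)

def scs_smooth(image, W, patch=8, sigma2=0.3):
    """Sparse Code Shrinkage over non-overlapping patches (step 1 of Table 3).

    W is an orthogonal (patch*patch x patch*patch) sparse-code basis assumed
    to have been estimated in advance; s = W z and z_hat = W^T f(s).
    """
    out = image.astype(float).copy()
    H, Wd = image.shape
    for i in range(0, H - patch + 1, patch):
        for j in range(0, Wd - patch + 1, patch):
            z = image[i:i + patch, j:j + patch].astype(float).ravel()
            s = W @ z                        # transform to the sparse domain, Eq. (25)
            z_hat = W.T @ shrink(s, sigma2)  # shrink and transform back
            out[i:i + patch, j:j + patch] = z_hat.reshape(patch, patch)
    return out

def preprocess(image, W, sigma2=0.3):
    """Steps 1-3 of Table 3: SCS smoothing plus Sobel edges, then add."""
    smoothed = scs_smooth(image, W, sigma2=sigma2)       # step 1
    gx = ndimage.sobel(image.astype(float), axis=0)      # step 2
    gy = ndimage.sobel(image.astype(float), axis=1)
    edges = np.hypot(gx, gy)
    return smoothed + edges                               # step 3
```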
Table 4
Details about images used in segmentation experiments

Image | Number of lines | Number of columns | Source
I1    | 256             | 512               | http://www.cis.hut.fi/projects/ica/data/image
I2    | 512             | 768               | ftp://ipl.rpi.edu/pub/image/still/Kodakimages/
I3    | 512             | 512               | http://vision.ece.ucsb.edu/data_hiding/ETlena.html
I4    | 512             | 640               | Department of Fitossanity, UNESP, Brazil
Fig. 3. The proposed methodology for image segmentation. (a) Original image. (b) Image pre-processed by the SCS. (c) Image pre-processed by the Sobel edge detector. The images (b) and (c) inside the dashed box were processed in a parallel step. (d) Image obtained by adding the images (b) and (c). (e) Resulting segmented image.
used to enhance image edges in the same original version. Finally, both results are combined, simply by adding the two images, in order to produce the resulting pre-processed image. The proposed methodology produces images that can be processed by any segmentation approach. A complete scheme of how the pre-processing and segmentation tasks can be accomplished is illustrated by an example in Fig. 3. The steps that should be followed in order to perform image segmentation tasks by applying the image pre-processing methodology and the EICAMM are summarized in Table 3. It is important to note that steps 1-3, which are associated with the pre-processing methodology, do not depend on the method chosen for the image segmentation process. As will be seen in the experiments of this work, these steps can also be executed for other
clustering methods, such as the k-means and unsupervised neural networks.

8. Image segmentation experiments

Four images from different domains have been used in the image segmentation experiments of this work. Details and descriptions of these images are given in Table 4. The image denoted here by I1 is part of an image set of natural scenes; I2 is part of a demo image set of the

Table 5
Convergence measurements for both EICAMM and ICAMM

       | Execution time (s) | Number of iterations
EICAMM | 302                | 7
ICAMM  | 5025               | 40
Fig. 4. Original images used in the segmentation experiments: (a) image I1; (b) image I2; (c) image I3; (d) image I4.
PhotoCD Kodak system; I3 is the well-known Lena image, widely used in the image processing literature; and the image named I4 was provided for the experiments of this work by the Department of Fitossanity, São Paulo State University (UNESP), Brazil. The original versions of these four images are presented in Fig. 4. To prepare the data to be processed, each image has been scanned and its corresponding input data set has been formed by input patterns, where each pattern consists of a vector composed of a center pixel and its eight nearest neighbors in a square neighborhood, as illustrated in the sketch given further below. The images representing the segmentation results have been built by labelling the center pixels with the class labels found by the unsupervised classification method.

(1) EICAMM results: In order to evaluate the combination of the original contributions proposed in this paper, the EICAMM model has been applied for segmenting the original and pre-processed versions of the images presented in Fig. 4.
These experiments have been carried out according to the steps described in Table 3, where the parameter values have been set as \sigma^2 = 0.3 and \alpha = \beta = 0.000001. The results provided by the EICAMM for a number K = 3 of groups are presented in Figs. 5 and 6, for the original and pre-processed images, respectively. First, one can observe that the EICAMM can be considered a good approach to cluster image data for detecting homogeneous image regions which represent important structures (objects). Moreover, by comparing the results obtained by the EICAMM for the original images with those obtained for the pre-processed versions, it can be noted that the pre-processing methodology proposed in this work actually leads to a real improvement in the segmentation results. However, the pre-processing methodology has not influenced the results obtained by the original ICAMM, which has continued to group most of the image pixels into only one class.
Fig. 5. Segmentation results obtained by EICAMM model for original images: (a) result for I1; (b) result for I2; (c) result for I3; (d) result for I4.
In order to briefly analyze the convergence of both ICA mixture models, their execution times have been compared for performing the same task on a Pentium IV PC. For that experiment, the original version of image I1 has been given as input to the algorithms, which stopped when the change in the log-likelihood function was less than 0.01 or when the maximum number of iterations was reached. In this case, the maximum number has been set as 40 iterations. Such measurements, which are given in Table 5, reinforce the fact that incorporating second-derivative-based methods can guarantee and improve the convergence of models based on the gradient optimization technique. One can observe that, although the ICAMM has taken the maximum number of iterations, it has not obtained a reasonable image segmentation result. In contrast, the EICAMM has run the whole experiment in only seven iterations while presenting a satisfactory segmentation performance, as can be seen in Fig. 5a.
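As a concrete illustration of the data preparation described at the beginning of this section, the sketch below builds the 3x3-neighborhood pattern vectors and turns the resulting cluster labels back into a segmented image. Any of the clustering methods compared here could supply the labels; scikit-learn's k-means is used only as a runnable stand-in, and the image size and number of clusters are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from sklearn.cluster import KMeans

def image_to_patterns(image):
    """Each pattern is a center pixel plus its eight nearest neighbors."""
    windows = sliding_window_view(image.astype(float), (3, 3))  # (H-2, W-2, 3, 3)
    h, w = windows.shape[:2]
    return windows.reshape(h * w, 9), (h, w)

def labels_to_image(labels, shape):
    """Rebuild the segmented image by labelling the center pixels."""
    return labels.reshape(shape)

# Stand-in usage on a random "image"; in the paper the patterns feed the
# EICAMM, the Fuzzy ART network, or the k-means algorithm.
image = np.random.default_rng(2).integers(0, 256, size=(64, 64))
patterns, shape = image_to_patterns(image)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(patterns)
segmented = labels_to_image(labels, shape)
print(segmented.shape)   # (62, 62): one label per valid center pixel
```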
(2) Fuzzy ART neural network results: The Fuzzy ART neural network [8,16] is another example of a self-organizing method that has been applied for finding clusters in a data set. This model refers to a neural network based on Adaptive Resonance Theory (ART) [7] and fundamentals of fuzzy set theory [1]. The segmentation results obtained by the Fuzzy ART neural network, for the original versions and for the images pre-processed by the methodology proposed in this work, are presented in Figs. 7 and 8, respectively. In these cases, the maximum number of clusters has been set to K = 2. Here again, the clustering method has achieved better results when the images were previously pre-processed by the proposed methodology. However, the segmentation process implemented by this network detected, basically, only the contours of the objects in the images, not showing itself capable of defining regions of interest. As the number of clusters K = 2 seems to be too small to detect basic structures in relatively complex images, like those used in
Fig. 6. Segmentation results obtained by EICAMM model for pre-processed images: (a) result for I1; (b) result for I2; (c) result for I3; (d) result for I4.
Fig. 7. Segmentation results obtained by Fuzzy ART neural network for original images: (a) result for I1; (b) result for I2; (c) result for I3; (d) result for I4.
the experiments, additional tests have been performed considering a greater number of clusters. In spite of this effort, the number of groups detected by the neural model has not been greater than 2, and segmentation results better than those presented in Fig. 8 have not been achieved.
(3) k-means results: Since the pre-processing methodology has had its efficiency demonstrated in the Fuzzy ART and EICAMM experiments, the pre-processed image versions have been used as input to the k-means algorithm for a number of classes K = 3. Such results can be seen in Fig. 9. When analyzing the k-means performance in these experiments, it can be observed that, except for the I2 image, the clustering method could not obtain satisfactory segmentation results. Actually, the algorithm has detected image regions that are clearly homogeneous as distinct
areas in its segmented version. Such analysis leads to the conclusion that the k-means behavior in image data clustering has not been as reasonable as in the numerical data set experiments. One can attribute this k-means limitation to the fact that the algorithm is not able to detect important statistical structures in image data, which, in contrast, have been captured when the ICA-based method has been applied.

9. Conclusion and future work

The EICAMM model, proposed in this work, was derived from some modifications to the ICAMM, aiming to improve its performance in unsupervised classification and in image segmentation tasks. Such improvements were primarily based on modelling and implementation aspects.
Fig. 8. Segmentation results obtained by Fuzzy ART neural network for pre-processed images: (a) result for I1; (b) result for I2; (c) result for I3; (d) result for I4.
One of the most important contributions is the incorporation of the Levenberg–Marquardt optimization method into the learning algorithm in order to guarantee and improve its convergence. Furthermore, an image pre-processing methodology, which combines the SCS method and the Sobel edge detector, has also been proposed in this work. In the image segmentation experiments it could be observed that, in fact, the proposed image pre-processing methodology led to a great improvement in the segmentation results obtained by both the Fuzzy ART neural network and the EICAMM models. Unifying the two main original contributions of this paper, the EICAMM method was successfully applied to segment the images pre-processed by the proposed methodology, providing results which were significantly
superior to those obtained by the Fuzzy ART network and by the k-means method. As future work, a neural network which implements the EICAMM model is intended to be developed. Moreover, other non-linear activation functions can also be evaluated for use in the learning algorithm, instead of the hyperbolic tangent considered in this paper. Another interesting future contribution consists of applying specific clustering validation techniques to evaluate the EICAMM performance in experiments with simulated data and numerical data sets which could explore a greater variation in the number of classes and higher dimensions. Specific methods for evaluating image segmentation results (edge boundaries or regions) will also be investigated.
Fig. 9. Segmentation results obtained by k-means for pre-processed images: (a) result for I1; (b) result for I2; (c) result for I3; (d) result for I4.
Appendix A. Derivation of the second derivative of the log-likelihood function

The learning rule whose derivative is required is

\Delta W \propto [I - K \tanh(u) u^T - u u^T] W.

Using the result u_t = W x_t = W A s_t and substituting it in this expression, we have

I - K \tanh(u) u^T - u u^T - K \tanh(u) u^T - K \,\mathrm{sech}^2(u)\, u u^T - 3 u u^T = I - 2 K \tanh(u) u^T - K \,\mathrm{sech}^2(u)\, u u^T - 3 u u^T.  (27)

Computing the second derivative with respect to W of Eq. (27), we have

\frac{\partial [(I - K \tanh(u) u^T - u u^T) W]}{\partial W} = (I - K \tanh(u) u^T - u u^T) + \left[ - K \tanh(u) s^T A^T \frac{\partial u^T}{\partial W} - K \,\mathrm{sech}^2(u)\, A s u^T \frac{\partial u}{\partial W} - u^T \frac{\partial u}{\partial W} \right] W
= (I - K \tanh(u) u^T - u u^T) - K \tanh(u) u^T - K \,\mathrm{sech}^2(u)\, u u^T - u A s_t W - A s_t u^T W.
References

[1] A. Baraldi, P. Blonda, A survey of fuzzy clustering algorithms for pattern recognition: II, IEEE Trans. Syst. Man Cybern. 29 (1999) 786–801.
[2] M.S. Bazaraa, Non-linear Programming, Wiley, New York, 1979.
[3] A.J. Bell, T.J. Sejnowski, An information-maximization approach to blind separation and blind deconvolution, Neural Comput. 7 (1995) 1129–1159.
[4] A.J. Bell, T.J. Sejnowski, The 'Independent Components' of natural scenes are edge filters, Vision Res. 37 (1997) 3327–3338.
[5] S.M. Bhandarkar, J. Koh, M. Suk, Multi-scale image segmentation using a hierarchical self-organizing feature map, Neurocomputing 14 (1997) 241–272.
[6] J.-F. Cardoso, B. Laheld, Equivariant adaptive source separation, IEEE Trans. Signal Process. 45 (1997) 434–444.
[7] G.A. Carpenter, S. Grossberg, A massively parallel architecture for a self-organizing neural pattern recognition machine, Comput. Vision Graphics Image Process. 37 (1987) 54–115.
[8] G.A. Carpenter, S. Grossberg, D.B. Rosen, Fuzzy ART: fast stable learning and categorization of analog patterns by an adaptive resonance system, Neural Networks 4 (1991) 759–771.
[9] H. Cheng, C.A. Bouman, Multiscale Bayesian segmentation using a trainable context model, IEEE Trans. Image Process. 10 (2001) 511–525.
[10] H.D. Cheng, X.H. Jiang, Y. Sun, J. Wang, Color image segmentation: advances and prospects, Pattern Recognition 34 (2001) 2259–2281.
[11] P. Comon, Independent component analysis—a new concept?, Signal Process. 36 (1994) 287–314.
[12] D. de Ridder, J. Kittler, R.P.W. Duin, Probabilistic PCA and ICA subspace mixture models for image segmentation, in: Proceedings of BMVC '2000, Bristol, 2000, pp. 112–121.
[13] S. Derrode, G. Mercier, W. Pieczynski, Unsupervised multicomponent image segmentation combining a vectorial HMC model and ICA, in: Proceedings of ICIP '2003, 2003, pp. 14–17.
[14] R. Duda, P. Hart, Pattern Classification and Scene Analysis, Wiley, New York, 1973.
[15] M. Egmont-Petersen, D. de Ridder, H. Handels, Image processing with neural networks—a review, Pattern Recognition 35 (2002) 2279–2301.
[16] T. Frank, K.-F. Kraiss, T. Kuhlen, Comparative analysis of fuzzy ART and ART-2A network clustering performance, IEEE Trans. Neural Networks 9 (1998) 544–559.
[17] A. Hyvärinen, J. Karhunen, E. Oja, Independent Component Analysis, Wiley, New York, 2001.
[18] A. Hyvärinen, P. Hoyer, E. Oja, Image denoising by sparse code shrinkage, in: S. Haykin, B. Kosko (Eds.), Intelligent Signal Processing, IEEE Press, 2001.
[19] T. Lee, M. Girolami, T.J. Sejnowski, Independent component analysis using an extended infomax algorithm for mixed sub-Gaussian and super-Gaussian sources, Neural Comput. 11 (1999) 409–433.
[20] T. Lee, M.S. Lewicki, T.J. Sejnowski, ICA mixture models for unsupervised classification of non-Gaussian classes and automatic context switching in blind signal separation, IEEE Trans. Pattern Anal. Mach. Intell. 22 (2000) 1078–1089.
[21] C. Lu, H. Liao, Oblivious cocktail watermarking by sparse code shrinkage: a regional- and global-based scheme, in: Proceedings of IEEE International Conference on Image Processing, Vancouver, 2000, pp. 13–16.
[22] T. Masters, Advanced Algorithms for Neural Networks, Wiley, New York, 1995.
[23] R. Nathan, Spatial frequency filtering, in: B.S. Lipkin, A. Rosenfeld (Eds.), Picture Processing and Psychopictorics, Academic Press, New York, 1970, pp. 151–164.
[24] T.A. Nodes, N.C. Gallagher, Median filters: some manipulations and their properties, IEEE Trans. Acoust. Speech Signal Process. 30 (1982) 739–746.
[25] P.R. Oliveira, R.A.F. Romero, Enhanced ICA mixture model for unsupervised classification, in: Proceedings of IBERAMIA '2004, Advances on Artificial Intelligence, Springer, Mexico, 2004, pp. 205–214.
[26] P.R. Oliveira, J.F. Vicentin, R.A.F. Romero, Combining fuzzy ART network clustering and sparse code shrinkage for image segmentation, in: Proceedings of ICONIP '2002, vol. 5, Singapore, 2002, pp. 2435–2439.
[27] B.D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, 1996.
[28] C.A. Shah, M.K. Arora, S.A. Robila, P.K. Varshney, ICA mixture model based unsupervised classification of hyperspectral imagery, in: Proceedings of the 31st IEEE Applied Imagery and Pattern Recognition Workshop, Washington, 2002, pp. 29–35.
[29] B. Szatmáry, G. Szirtes, A. Lörincz, Robust hierarchical image representation using non-negative matrix factorisation with sparse code shrinkage preprocessing, Pattern Anal. Appl. 6 (2003) 194–200.
[30] J.S. Walker, A Primer on Wavelets and their Scientific Applications, Chapman & Hall, London, 1999.
[31] S.M. Weiss, C.A. Kulikowski, Computer Systems that Learn, Morgan Kaufmann, Los Altos, CA, 1991.
[32] M. Zhang, L.O. Hall, D.B. Goldgof, A generic knowledge-guided image segmentation and labeling system using fuzzy clustering algorithms, IEEE Trans. Syst. Man Cybern. 2 (2002) 571–582.

Patricia Rufino Oliveira obtained her Master's degree in Computer Science, in the area of Artificial Intelligence, from the Institute of Mathematical and Computer Sciences of the University of Sao Paulo in 1997. She received her Ph.D. degree from the same institute in 2004. She is currently an Assistant Professor in Information Systems at the School of Arts, Sciences and Humanities of the University of Sao Paulo, and a member of the Artificial Intelligence Research Group at that institute. Her research interests are in computational vision and machine learning techniques.

Roseli Aparecida Francelin Romero received her Ph.D. degree in electrical engineering from the University of Campinas, Brazil, in 1993. She has been an Associate Professor in the Department of Computer Science at the University of Sao Paulo since 1988. From 1996 to 1998, she was a Visiting Scientist at Carnegie Mellon's Robot Learning Lab, USA. Her research interests include artificial neural networks, machine learning techniques, fuzzy logic, robot learning, and computational vision. She has been a reviewer for important journals and for several international and national conferences in her area. She has organized special sessions in important conferences and organized important events in Brazil. Dr. Romero is a member of the INNS (International Neural Networks Society) and the Brazilian Computer Society (SBC).