Segmentation of color images using a two-stage self-organizing network


Image and Vision Computing 20 (2002) 279–289

www.elsevier.com/locate/imavis

S.H. Ong*, N.C. Yeo, K.H. Lee, Y.V. Venkatesh, D.M. Cao
Department of Electrical and Computer Engineering, National University of Singapore, 10 Kent Ridge Crescent, Kent Ridge, Singapore 119260
Received 20 January 2000; received in revised form 5 August 2001; accepted 10 January 2002

Abstract

We propose a two-stage hierarchical artificial neural network for the segmentation of color images based on the Kohonen self-organizing map (SOM). The first stage of the network employs a fixed-size two-dimensional feature map that captures the dominant colors of an image in an unsupervised mode. The second stage combines a variable-sized one-dimensional feature map and color merging to control the number of color clusters that are used for segmentation. A post-processing noise-filtering stage is applied to improve segmentation quality. Experiments confirm that the self-learning ability, fault tolerance and adaptability of the two-stage SOM lead to good segmentation results. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Color image segmentation; Self-organizing map; Color clustering; Artificial neural network

* Corresponding author. Tel.: +65-874-2245; fax: +65-779-1103. E-mail address: [email protected] (S.H. Ong).
0262-8856/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0262-8856(02)00021-5

1. Introduction

In this paper, we propose a method for color image segmentation by a two-stage hierarchical artificial neural network (ANN) based on the Kohonen self-organizing map (SOM). The motivation for this architecture arose from the need to develop a strategy for color image segmentation for which the amount of training data may be limited. The first layer, containing an area array, deals with unsupervised grouping of input color image data, and the second, which is a linear array, is used mainly for labeling the clusters using the limited a priori knowledge of scenes.

The usual methods for image segmentation include edge detection, region growing, region splitting and merging, histogram thresholding and clustering. Edge detection and histogram thresholding [1] methods work well with gray level images, but may not be suitable for color images because the color components should not be processed separately in view of their interdependence. Neighborhood-based methods such as region growing use only local information, while global methods such as feature space clustering [1] do not take advantage of local spatial knowledge.

Some hybrid approaches have been proposed in the literature. They combine global and local information but are sensitive to noise or are computationally expensive. Efforts to combine region growing and clustering [3] have not been entirely successful because of the heavy computational requirements of the clustering algorithm. For example, Liu and Yang [4] use a scale space filter (SSF) and a Markov random field (MRF) model in their segmentation scheme. However, the SSF is histogram-based and cannot provide reliable a priori knowledge for the MRF model. Taylor and Lewis [5] show how the initial split-and-merge and region growing processes are improved after several iterations of the relaxation procedure; unfortunately, the algorithm yields either coherent regions with holes or less coherent regions.

Compared with these classical methods, the ANN approach has the advantages of parallel processing (with appropriate hardware), robustness, noise tolerance, and adaptability. Networks for three types of classification have been employed: supervised, unsupervised and a combination of the two. Unsupervised learning is often preferable to supervised learning because the latter requires a set of training samples, which may not be practical in many applications, for instance, in the segmentation of color images. Although the Hopfield [6] and back-propagation [7] neural networks have been used for the classification of gray level images, self-organizing algorithms provide a framework for unsupervised classification, the output of which can be used for a controlled training of the next layer network.

Lampinen and Oja [8] have proposed a multi-layer SOM, HSOM, as an unsupervised clustering method. Analogous to multi-layer feed-forward networks, the HSOM (i) forms arbitrarily complex clusters, (ii) provides a natural measure for the distance of a point from a cluster by giving appropriate weights to all the points belonging to the cluster, and (iii) produces clusters that better match the desired classes than the direct SOM or the classical k-means or ISODATA algorithms. Traven [9] has investigated the application of a competitive learning algorithm to statistical pattern classification using both local spectral and contextual features, but it is a supervised learning procedure in which the image must first be manually segmented.

An ANN based on the idea of preserving the topology of the original input data set (mimicking the performance of the human brain) was first proposed by Kohonen [10], and is usually called the SOM network. Unlike simple competitive learning methods in which only the winning neuron is allowed to learn, the neurons in the neighborhood of the winning neuron also participate in the learning process, leading to an ordered feature mapping that can be exploited in many applications (e.g. color clustering in color images).

Ghosal and Mehrotra [11] describe a Kohonen self-organizing feature map for segmenting range images using local information provided by the orthogonal Zernike moments. However, the application of their algorithm is limited to planar and one-dimensional (1D) quadratic surface patches of gray level images. Papamarkos [12] proposed an approach for color reduction that uses both the image color components and local image characteristics to feed a 1D Kohonen self-organizing feature map. The limitation of this method is that the number of final colors has to be specified a priori. Uchiyama and Arbib [13] employ competitive learning as a tool for color image segmentation. After demonstrating the equivalence of vector quantization and cluster-based techniques, they apply their algorithm to gray scale and color images. The final results are essentially no different from those obtained by clustering.
In the conventional single-stage network, the number of clusters in the final segmented image is very much dependent on the number of neural units in the competitive layer, but it is usually not possible to determine a priori the appropriate number of clusters to be used in the segmented image. This significant shortcoming is overcome in our approach, in which a hierarchical two-stage self-organizing network is employed as a pattern classifier to group the output neurons into subsets, each of which corresponds to a discrete class of color. The number of clusters formed can thus be controlled, and this, in turn, limits the number of colors that will appear in the segmented image. Note that the first stage provides a preliminary clustering of the input color image data in an unsupervised mode, thereby obviating the need for a large number of training samples that may not be available in practice. Thus, an image segmented from the clusters in the 1D map (of the second-stage self-organizing network) is no longer dependent on the number of neural units in the two-dimensional (2D) feature map. The second-stage network, which is much smaller in size, is subjected to training on the basis that only the prominent colors are selected using global information, and the segmentation is performed according to these colors. This is followed by the post-processing stage, which incorporates local information for a complete segmentation.

The organization of the rest of the paper is as follows. Section 2 explains the choice of color space. Section 3 briefly describes the structure of the basic SOM. Section 4 presents a detailed description of network training with respect to initialization, learning rule and convergence criterion. In Section 5, we describe the process of feature clustering in the main processing stage and the color discrimination and segmentation techniques used in post-processing. Simulation results are presented in Section 6, and the conclusion in Section 7.

2. Choice of color space

Colors are perceived as combinations of the three primary colors, red (R), green (G), and blue (B). The attributes generally used to distinguish one color from another are brightness, hue, and saturation. There are several standard color spaces that are widely used in image processing, e.g. RGB, CMY, HSI and YIQ. These and many others can be calculated from the tristimuli R, G, B by appropriate transformations. However, these models are not uniform color spaces [14]. In the L*u*v* color space, u* and v* represent color chromaticity and L* the intensity. The use of a uniform color space such as L*u*v* or L*a*b* is recommended for good performance in color clustering because the difference between two colors can be simply measured by their Euclidean distance.

In our application to color image segmentation, the L*u*v* space has the property of an 'approximately uniform perceptual space' [15] and is preferred because it is associated with a chromaticity diagram in which an additive mixture of two arbitrary colors lies on the straight line joining the two colors [14]. According to Tominaga [15], who uses the L*a*b* space to detect clusters in color images (in a non-neural-based framework), a perceptually uniform color space should be used if we try to mimic human performance. In the proposed algorithm, this property of the L*u*v* space is used to compute the mean value of color in cluster merging operations.

In pre-processing, the raw data of the original image are converted into L*u*v* values by the CIE standard formula. First, the RGB data are transformed into the CIE XYZ tristimulus values:

$$
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} =
\begin{bmatrix}
0.607 & 0.174 & 0.200 \\
0.299 & 0.587 & 0.114 \\
0.000 & 0.066 & 1.116
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\tag{1}
$$

The L*u*v* values are then given by:

$$
L^* =
\begin{cases}
903.3\,\dfrac{Y}{Y_0} & \text{if } \dfrac{Y}{Y_0} < 0.008856 \\[1ex]
25\left(100\,\dfrac{Y}{Y_0}\right)^{1/3} - 16 & \text{if } \dfrac{Y}{Y_0} \ge 0.008856
\end{cases}
\tag{2a}
$$

$$
u^* = 13 L^* (u' - u'_0) \tag{2b}
$$

$$
v^* = 13 L^* (v' - v'_0) \tag{2c}
$$

where

$$
u' = \frac{4X}{X + 15Y + 3Z} \tag{2d}
$$

$$
v' = \frac{9Y}{X + 15Y + 3Z} \tag{2e}
$$
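As an illustration of Eqs. (1) and (2a)-(2e), the conversion can be sketched in a few lines of NumPy. This is a minimal sketch rather than the authors' implementation: the function name `rgb_to_luv`, the `white_rgb` parameter and the handling of pure black pixels (where the denominator of Eqs. (2d) and (2e) vanishes) are assumptions made for the example. The reference white is taken as R = G = B, and its chromaticity is computed from Eq. (1) instead of being hard-coded.

```python
import numpy as np

# Eq. (1): RGB to CIE XYZ matrix as given in the paper.
RGB_TO_XYZ = np.array([
    [0.607, 0.174, 0.200],
    [0.299, 0.587, 0.114],
    [0.000, 0.066, 1.116],
])


def rgb_to_luv(rgb, white_rgb=(1.0, 1.0, 1.0)):
    """Convert an (..., 3) array of RGB values to L*u*v* via Eqs. (1)-(2e).

    white_rgb is an assumed reference white on the same scale as rgb.
    """
    rgb = np.asarray(rgb, dtype=float)
    xyz = rgb @ RGB_TO_XYZ.T
    x, y, z = xyz[..., 0], xyz[..., 1], xyz[..., 2]

    # Reference-white quantities Y0, u'0, v'0 from Eqs. (1), (2d), (2e).
    x0, y0, z0 = RGB_TO_XYZ @ np.asarray(white_rgb, dtype=float)
    denom0 = x0 + 15.0 * y0 + 3.0 * z0
    u0, v0 = 4.0 * x0 / denom0, 9.0 * y0 / denom0

    # Eqs. (2d), (2e); black pixels fall back to the white chromaticity
    # (an assumption, it only avoids a 0/0 and yields u* = v* = 0).
    denom = x + 15.0 * y + 3.0 * z
    safe = np.where(denom > 0, denom, 1.0)
    u_prime = np.where(denom > 0, 4.0 * x / safe, u0)
    v_prime = np.where(denom > 0, 9.0 * y / safe, v0)

    # Eq. (2a): lightness with the linear branch for very dark colors.
    t = y / y0
    lightness = np.where(t < 0.008856, 903.3 * t,
                         25.0 * np.cbrt(100.0 * t) - 16.0)

    # Eqs. (2b), (2c).
    return np.stack([lightness,
                     13.0 * lightness * (u_prime - u0),
                     13.0 * lightness * (v_prime - v0)], axis=-1)
```

For an 8-bit image one would pass `white_rgb=(255, 255, 255)` so that the white point is on the same scale as the pixel data.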

The quantities $Y_0 = 225$, $u'_0 = 0.200953$ and $v'_0 = 0.460900$ are obtained by substituting the tristimulus values $X_0$, $Y_0$, $Z_0$ for the reference white.

3. Structure of the self-organizing network

The proposed model has two stages, each of which is a self-organizing network, abbreviated as an SOM network (Fig. 1). The input vector of dimension $n$, $x = [x_1, x_2, \ldots, x_n]^T \in R^n$, is connected in parallel to all the neurons in the output layer of the single-layer neural network. The weight vectors associated with the neurons in the output layer are denoted by $w_i = [w_{i1}, w_{i2}, \ldots, w_{in}]^T \in R^n$. The weight vector for a cluster unit serves as an exemplar of the input patterns associated with that cluster. This forms the feature map that will be organized during training. The neuron whose weight $w_c$ is closest to $x$ is declared the 'winner'. The winner and all the neurons within a defined neighborhood set $N_c$ will be updated with respect to the input pattern via a learning rule.

Fig. 1. Structure of the two-stage SOM neural network.

Among the few geometrical structures that could be used to construct the 2D array, the square and the hexagonal are the practical possibilities. Note that in a square-array neuronal structure, each neuron has eight neighbors and the distance to the diagonal elements is (mathematically) different from the distance to the other neighbors, leading to a lack of uniformity in the computation of the influence of the neighborhood on the neuron. In contrast, in the hexagonal array, each neuron has six equidistant neighbors, making the array mathematically attractive. It is further known that regular hexagonal sampling of images produces optimum data for a given information content [16]. Staunton and Storey [17] show that the standard square array (image) processing operators can be recast for the hexagonal array such that they are 'computationally more efficient, and as accurate, as their square counterparts'. It has been found, however, that the improvements in performance obtained with the hexagonal array are marginal; therefore, for simplicity, we have used the square array in our simulation.

There are 256 output neurons arranged in a 16 × 16 grid to form the 2D topological feature map. This size is a compromise choice in that it makes it possible for some distinctive colors occurring infrequently to be sampled in the initialization phase and also allows for ease of convergence. The second stage is 1D and contains 20 neurons, which implies that the 256 outputs of the first stage are now reduced to 20.

In the first stage, pixels of the original image, which have components L*u*v*, are sequentially fed into the network via the input neurons to train the output neurons iteratively via competitive learning. The algorithm employs a 2D map in an attempt to preserve the attributes of the 3D color features. After training, the weight vectors appear in a spatially organized map that contains groups of typical colors. The dominant colors are obtained by using the colors of the trained neurons of the first stage to train the second stage. This leads to a 1D map in which the number of colors has been reduced from 256 to 20 in our model. It is possible to automatically determine an appropriate size of the 1D map by using a suitable criterion function such as winning frequency.

Fig. 2. Example of the initial state of the 2-D SOM neural network.

4. SOM network training

Both stages of the SOM network share the same training principle. Essentially, the neuron weights are initialized before training, the winning neuron and its neighbors within a radius of $N_c$ are updated via a learning rule, and learning continues until a convergence criterion is satisfied. After the original RGB data are transformed into L*u*v* color space, the SOM network is applied to the first stage in order to detect the dominant colors in the image.

4.1. Initialization

Fig. 3. 2-D colour feature map after training.

In any clustering algorithm whose initialization depends on the set of initial cluster centers, convergence may lead to a local minimum that is non-global. Thus, a condition imposed on the SOM is that the neuron weights should be different during initialization, which should also expedite learning. Random initialization is commonly employed in SOM networks for the reason that the initially unordered vectors will become ordered eventually, typically in a few hundred steps. However, this does not mean that random initialization is necessarily the best policy. In order to facilitate convergence to (color) vectors that are close to the vectors typical of the colors in the images, the neuron weights are initialized by colors that are randomly sampled from the input image itself (Fig. 2). This essentially implies that the probability density of the initialization vectors approximates that of the input samples. The color components of the images are then fed to the first stage of the network for training the neurons, so that their weights are organized according to the color characteristics of the image.
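The sampling-based initialization described above might look as follows. This is a sketch under stated assumptions: the function name `init_weights_from_image` and its arguments are hypothetical, and pixels are drawn uniformly at random so that frequent colors are more likely to be picked, which is one way to make the initial weights follow the density of the input samples.

```python
import numpy as np


def init_weights_from_image(pixels, n_neurons, rng=None):
    """Pick n_neurons pixel colors at random as initial SOM weights.

    pixels: (num_pixels, 3) array of L*u*v* values.
    Returns an (n_neurons, 3) array; reshape it to the grid layout needed.
    """
    rng = np.random.default_rng() if rng is None else rng
    pixels = np.asarray(pixels, dtype=float)
    # Sample pixel positions without replacement when the image is large
    # enough. Two distinct positions can still carry the same color, so
    # distinct weights are likely but not guaranteed by this sketch.
    idx = rng.choice(len(pixels), size=n_neurons,
                     replace=len(pixels) < n_neurons)
    return pixels[idx].copy()
```

For the 16 × 16 first stage one could call `init_weights_from_image(pixels, 256).reshape(16, 16, 3)`.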

Fig. 4. Typical plot of change in neuron weights $w_{ij}$ against the iteration number.

Fig. 5. Coarse segmentation examples: (a) original "Fruits" image with 58930 colours; (b) segmented "Fruits" image using 20 × 20 map; (c) original "Objects" image; (d) segmented "Objects" image using 16 × 16 map; (e) original "Plants" image; (f) segmented "Plants" image using 16 × 16 map.

4.2. Learning rule

The SOM is based on competitive learning. In color image segmentation, the output neurons with weights $w_i = [w_{i1}, w_{i2}, w_{i3}]^T$ compete with each other to find a best match with the input pattern $x = [x_1, x_2, x_3]^T$. A widely used measure for the match of $x$ with $w_i$ is the Euclidean distance between the two, given by:

$$
\| x - w_i \| = \{ (x_1 - w_{i1})^2 + (x_2 - w_{i2})^2 + (x_3 - w_{i3})^2 \}^{1/2} \tag{3}
$$

The neuron $c$ whose weight vector $w_c$ is closest to $x$ is declared the winner, i.e.

$$
\| x - w_c \| = \min_i \{ \| x - w_i \| \} \tag{4}
$$

The winner and all the neurons within $N_c$ are updated by the learning rule:

$$
w_i(k+1) =
\begin{cases}
w_i(k) + \alpha(k)\,[x(k) - w_i(k)] & \text{if } i \in N_c \text{ and } i = c \\
w_i(k) + \beta(k)\,[x(k) - w_i(k)] & \text{if } i \in N_c \text{ and } i \ne c \\
w_i(k) & \text{if } i \notin N_c
\end{cases}
\tag{5}
$$

where $\alpha(k)$, $\beta(k)$ are the respective learning rates of the winning neuron and its neighbors at the $k$th iteration, and $0 < \alpha(k), \beta(k) < 1$. It is found that 'an accurate learning rate is not important' [12]. It can be linear, exponential or inversely proportional to $k$. Here, we define a learning rate that decreases monotonically with time:

$$
\begin{array}{lll}
\text{1st stage:} & \alpha(k) = \alpha_0 \times 1.5^{-k}, & \beta(k) = \beta_0 \times 1.5^{-k}; \\
\text{2nd stage:} & \alpha(k) = \alpha_0 / k^{3}, & \beta(k) = \beta_0 / k^{3}
\end{array}
\tag{6}
$$
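To make Eqs. (3)-(6) concrete, one competitive-learning step can be sketched as below. The sketch is illustrative only: the names `learning_rates` and `som_update`, the square neighborhood of half-width `radius`, and the convention that iterations are counted from k = 1 are assumptions, since the paper does not fix the shape of $N_c$ or the starting index.

```python
import numpy as np


def learning_rates(k, alpha0, beta0, stage):
    """Eq. (6): learning rates at iteration k (k >= 1) for stage 1 or 2."""
    decay = 1.5 ** (-k) if stage == 1 else 1.0 / k ** 3
    return alpha0 * decay, beta0 * decay


def som_update(weights, x, alpha, beta, radius):
    """Apply Eqs. (3)-(5) once, updating the float weights array in place.

    weights: (rows, cols, 3) grid of neuron weights; x: (3,) input color.
    radius: half-width of an assumed square neighborhood N_c.
    Returns the grid position (row, col) of the winning neuron.
    """
    dist = np.linalg.norm(weights - x, axis=-1)            # Eq. (3)
    r, c = np.unravel_index(np.argmin(dist), dist.shape)   # Eq. (4)

    # Clip the neighborhood to the borders of the map.
    r0, r1 = max(r - radius, 0), min(r + radius + 1, weights.shape[0])
    c0, c1 = max(c - radius, 0), min(c + radius + 1, weights.shape[1])

    # beta for the neighbors, alpha for the winner itself.
    rate = np.full((r1 - r0, c1 - c0, 1), float(beta))
    rate[r - r0, c - c0, 0] = alpha

    block = weights[r0:r1, c0:c1]      # a view, so the update is in place
    block += rate * (x - block)        # Eq. (5); neurons outside N_c unchanged
    return r, c
```

The 1D second stage can reuse the same function by storing its weights with shape `(1, N2, 3)`, and a shrinking neighborhood is obtained by passing a smaller `radius` as training proceeds.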

In Eq. (6), $\alpha_0$ and $\beta_0$ are the initial values, and $\beta_0 < \alpha_0$. $N_c$ is chosen such that it is fairly large in the beginning and reduces monotonically to a small value or to the winning neuron by itself. Based on the chosen initialization and learning rule, the map of Fig. 3 is obtained.

In the basic SOM learning rule, the winning frequency of each neuron has no effect on the learning rate. However, Springub et al. [18] have implemented an individual learning rate $\alpha$ for each neuron that depends on the winning frequency, so that the feature vectors with low frequencies will remain in the map as well as those with high frequencies. It can be deduced that with this scheme, the map will contain some insignificant colors that do not need to be retained in segmentation. Moreover, the distribution of colors on the map, which approximates the probability density of the input patterns, will be destroyed.

4.3. Convergence criterion

Learning is a stochastic process [10], and the final accuracy of the mapping depends on the number of steps, which must be reasonably large, e.g. 100,000. With a fixed number of steps, the neurons may not be well trained or may need an excessively long time for training. In our scheme, all the pixels of the given color image are utilized to train the network, each cycle is an iteration of N × M steps for an N × M image, and the learning rate decreases with the number of iterations. After a sufficiently large number of iterations, the learning rate becomes so small that the neuron weights change by a very small amount each time. Convergence of the network is checked with respect to every neuron; if the change in output weights between two iterations is less than a small positive constant $\varepsilon$, the network is considered to have converged, and the training is stopped. Experiments have shown that this kind of convergence is effective for the SOM network. It is found that neuron weights usually change during the early stages and sometimes oscillations occur, but they tend to converge after several iterations of training (Fig. 4).

5. Segmentation based on clustering

The output at the end of the unsupervised learning of the first stage of the network is a coarsely segmented image that is further processed to refine the segmentation. The weights of the neurons in the first-stage network tend to approximate the density function of the vector inputs obtained from the color image in an orderly manner, i.e. the weight vectors that are spatially closer in the organized map correspond to higher density input regions in the input image. We are able to identify the dominant color clusters that are present in the image from the map. For example, different shades of red are grouped together, changing gradually to another group of colors (Fig. 5). However, the number of colors in the output of the first stage of the network is still large, e.g. 256 in a 16 × 16 network. The output of the first stage is fed to a 1D array of neurons (with weights initialized randomly) for training and, as a consequence, the dominant colors in the 2D map will be represented by neurons in the 1D neuron array. In this way, the most representative colors are extracted from the original image, resulting in its coarse segmentation after labeling each pixel by these colors. Three illustrative examples are shown in Fig. 5. It is found that segmentation is generally well done even though the segmented images contain some (small) noisy patches.

Cluster merging can reduce the final number of colors represented by the neuron weights of the second stage (on completion of training). Pairs of colors may be merged if they are sufficiently similar. It should be mentioned that the discarding/merging procedure is optional, depending on the image and the application. The Euclidean distance between colors is used to determine if two colors are sufficiently close so that the clusters of pixels they represent can be merged. The Euclidean distance between every pair of neurons is calculated by

$$
D_{ij} = \{ (w_{i1} - w_{j1})^2 + (w_{i2} - w_{j2})^2 + (w_{i3} - w_{j3})^2 \}^{1/2}, \quad 0 < i, j < N_2 \tag{7}
$$

where $N_2$ is the number of neurons in the second stage. The mean of these distances is

$$
\bar{D} = \frac{1}{N_2} \sum_{i,j=1}^{N_2} D_{ij} \tag{8}
$$

from which we obtain the threshold value,

$$
D_T = \eta \bar{D} \tag{9}
$$

where $\eta$ is a threshold factor. Colors represented by two neurons are merged if their color distance is less than $D_T$, i.e.

$$
D_{ij} < D_T \tag{10}
$$
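The merging test of Eqs. (7)-(10) can be sketched as follows. Several details are assumptions made for this example and are not specified in the paper: the function name `merge_similar_colors`, the use of a union-find structure so that chains of close colors end up in one group, and the replacement of each group by the mean of its members (the paper only states that the mean color is used in merging operations). The normalization of the mean distance follows Eq. (8) as printed, i.e. the sum over all ordered pairs divided by $N_2$.

```python
import numpy as np


def merge_similar_colors(weights, eta):
    """Merge neuron colors whose distance is below D_T = eta * D_bar.

    weights: (N2, 3) array of second-stage neuron weights.
    Returns (merged_colors, group), where group[i] is the index of the
    merged color that neuron i was assigned to.
    """
    weights = np.asarray(weights, dtype=float)
    n2 = len(weights)
    diff = weights[:, None, :] - weights[None, :, :]
    d = np.linalg.norm(diff, axis=-1)       # Eq. (7), all pairs at once
    d_t = eta * d.sum() / n2                # Eqs. (8) and (9)

    parent = list(range(n2))                # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(n2):
        for j in range(i + 1, n2):
            if d[i, j] < d_t:               # Eq. (10)
                parent[find(j)] = find(i)

    roots = np.array([find(i) for i in range(n2)])
    uniq, group = np.unique(roots, return_inverse=True)
    merged = np.array([weights[group == g].mean(axis=0)
                       for g in range(len(uniq))])
    return merged, group
```

With three colors at 0, 1 and 100 on the L* axis and `eta=0.1`, the first two fall under the threshold and are replaced by their mean, while the third is kept.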

The neuron weights obtained after merging are used to segment the image into 'color-homogeneous' regions. Each pixel of the original image is again input to the network, and it is labeled with the color of the winning neuron. Fig. 6 shows the results obtained with different values of the threshold factor. It is evident that clustering has improved. The segmented images will, however, often appear noisy due to varying illumination and texture, and post-processing may be needed for better segmentation quality.

Fig. 6. Final segmentation results with different values of the threshold factor $\eta$: (a) segmented "Fruits" image using 20 × 20 map, at $\eta = 0.3$ with 11 colours; (b) segmented "Fruits" image using 20 × 20 map, at $\eta = 0.1$ with 17 colours.

5.1. Post-processing

Coarse segmentation is essentially based on first-order statistics that describe the pixel population without regard to its spatial distribution. Segmentation is more reliable when the decision involves the local pixel region context rather than just the individual pixel. A post-processing stage is often found to be necessary to produce coherent regions. A variety of spatial filtering techniques, e.g. the median and max filters, may be applied. The median filter is used here in order to illustrate a viable post-processing technique (Fig. 7).

Median filtering is particularly effective in forcing points with distinct colors to be more like their neighbors by eliminating isolated colors. Most of the misclassified pixels are removed, thereby improving the segmentation quality. As shown in the filtered image of Fig. 7(b), the segmented regions are relatively uniform.
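The labeling and filtering steps of this section can be illustrated with a short NumPy sketch. It rests on assumptions that the paper leaves open: the helper names `label_pixels` and `median_filter_channels` are hypothetical, the median is taken independently on each color channel of the segmented image, the window is 3 × 3 by default, and the borders are handled by edge replication.

```python
import numpy as np


def label_pixels(image_luv, colors):
    """Return, for each pixel, the index of the nearest cluster color.

    image_luv: (H, W, 3) image; colors: (K, 3) merged neuron weights.
    """
    diff = image_luv[:, :, None, :] - colors[None, None, :, :]
    return np.argmin(np.linalg.norm(diff, axis=-1), axis=-1)


def median_filter_channels(image, size=3):
    """Median filter each channel over a size x size window (edge-padded)."""
    pad = size // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    # windows has shape (H, W, 3, size, size): one window per pixel/channel.
    windows = np.lib.stride_tricks.sliding_window_view(
        padded, (size, size), axis=(0, 1))
    return np.median(windows, axis=(-2, -1))
```

A segmented image is obtained as `colors[label_pixels(image_luv, colors)]`, and passing it through `median_filter_channels` removes isolated pixels whose color differs from that of their neighborhood.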

6. Results and discussion

The proposed algorithm has been simulated on a SUN Ultra 30 workstation and applied to a variety of color images including those of geometrical objects, natural scenes and human faces, as well as synthetic images with noise. Segmentation results, typical of which are shown in Figs. 6 and 7, confirm that the method is suitable for a wide range of images.

Fig. 7. Post-process filtering results for "Fruits": (a) final segmented image ($\eta = 0.25$ with 12 colours); (b) after median filtering.

The segmentation process involves four parameters that can be adjusted to suit different applications. A high degree of flexibility can be achieved by tuning these parameters.

• Size of 2D map. A range of map sizes from 16 × 16 to 20 × 20 is found to be suitable for a variety of images. The factors to be considered in the determination of the exact map size are image complexity, required segmentation quality, and available computational resources. A highly detailed image generally requires a larger map size to produce good segmentation results (Fig. 8(b)) as compared to a simpler image (Fig. 8(d)). A map that is too small often does not lead to a global optimum after training and removes too much information, as can be seen in Fig. 8(a) and (c), where an 8 × 8 grid results in poor segmentation. On the other hand, a large map is computationally demanding, e.g. a 30 × 30 grid takes about 350 s to reach convergence, compared to 131 s for a 16 × 16 grid (Table 1).
• Size of 1D map. This can be adjusted according to the complexity of the image.
• Convergence criterion $\varepsilon$. A value of 0.1 generally gives good results without requiring excessive computational time (Table 1).
• Threshold factor $\eta$. This determines the degree of segmentation. Using the image "Fruits" as an illustration, $\eta = 0.3$ results in a coarser segmentation (Fig. 6(a)), while with $\eta = 0.1$ more details of the image are present (Fig. 6(b)). This flexibility is important because, in some applications, only a small number of clusters, corresponding to the most prominent colors, are needed, while in other applications, finer details may be desired.

Fig. 8. Coarse segmentation results using different map sizes: (a) segmented "Fruits" image using 8 × 8 map; (b) segmented "Fruits" image using 20 × 20 map; (c) segmented "Objects" image using 8 × 8 map; (d) segmented "Objects" image using 16 × 16 map.

Table 1. Computational load for different map sizes and values of $\varepsilon$ (mean results obtained from a group of 10 images)

Map size   ε       Number of iterations   Training time (s)
8 × 8      0.001   45                     162
16 × 16    0.001   41                     304
16 × 16    0.1     27                     131
20 × 20    0.1     25                     169
30 × 30    0.1     23                     350

The salient characteristics of the algorithm are now discussed.

• Initialization. The network is initialized by colors randomly sampled from the original image. Although the sampling sequence is different each time, the maps of the first stage after convergence are very similar. Extensive experimentation has shown that 2D SOM learning with a grid of 16 × 16 converges to an approximately global optimum.
• Coarse segmentation. Experiments have indicated that coarse segmentation is necessary. Final segmentation is completed based on the result of coarse segmentation and not by referring to the original image. Since the dynamic range of the data is large, direct matching between the pixels of the original image and the very few colors left after color merging would result in loss of detailed information, leading to poor segmentation quality.
• Two-stage clustering. The two-stage processing scheme, which uses a 2D map followed by a 1D map, is crucial for successful color clustering. Experiments have shown that it is very difficult to extract the colors accurately if only a 1D map is used. On the other hand, the number of clusters cannot be precisely controlled if only a 2D map is used. By utilizing two-stage processing, we can overcome these problems and successfully extract the dominant colors.
• Influence of illumination. By using L*u*v* color space, the influence of illumination is significantly reduced. However, there are still situations in which object surfaces that are strongly highlighted or in shadow may lose all color information, and the sub-regions cannot then be merged [19]. For example, no matter how large the values of u* and v*, when L* has a very small value, the pixel loses its color and appears black. On the other hand, reflection highlights lead to the situation where the pixels appear totally white. When such sub-regions are present, they are labeled with the wrong colors.
• Comparison with the relevant results of the literature [2–6,18,19]. A quantitative comparison with the segmentation results obtained from other methods is not possible for lack of a satisfactory measure (for comparing final segmentation outputs). However, the proposed method is believed to be conceptually simpler and computationally faster than most of the techniques found in the literature on color image segmentation.

It should be added, however, that the proposed algorithm gives generally good segmentation results on natural color images but does have some limitations:

(i) Reflection highlights cannot be completely removed.
(ii) Highly textured images may give rise to significant segmentation error. However, it is interesting to note here that, in a recent paper [20], Gabor features are employed for comparing L*u*v* and RGB color spaces. An application of the proposed hierarchical SOM network to color textures, using Gabor features as inputs, could provide a new strategy for (color) texture segmentation.
(iii) The colors of some small but distinct regions are detected in the second stage, but fortunately the misclassified pixels can be removed by post-processing.
(iv) Although the SOM network may be improved by a better learning rule for training, there is no appropriate criterion to be optimized for an algorithm to search for a global optimum. However, system performance may be further improved through fine-tuning of the map by learning vector quantization (LVQ) methods [12] in which supervised learning is required.

7. Conclusions

We have proposed a two-stage SOM-based network that combines the advantages of unsupervised learning based on large data sets and labeling of the clustered outputs (obtained from unsupervised learning) using available limited a priori knowledge. The network detects, in a given color image, the dominant colors that are subsequently used to segment the image by pixel classification. In comparison with the standard, single-stage SOM-based network, the proposed network is highly robust and flexible in that the dominant colors are reliably captured, and the final number of clusters formed may be adaptively determined or fixed. The post-processing scheme utilizes image spatial information to successfully transform a segmented image with small incoherent regions into one with uniform regions. Illustrative examples are given to demonstrate that the two-stage network is robust and gives good results.

Acknowledgements

The authors wish to express their grateful thanks to the referees for their valuable comments and suggestions for improvement.

References

[1] R. Ohlander, K. Price, D.R. Pierson, Picture segmentation using a recursive region splitting method, Computer Graphics and Image Processing 8 (1978) 313–333.
[2] M. Celenk, Colour image segmentation by clustering, IEEE Proceedings on Computers and Digital Techniques 138 (1991) 368–376.
[3] B. Cramariuc, M. Gabbouj, J. Astola, Clustering based region growing algorithm for color image segmentation, Proceedings of the 13th International Conference on Digital Signal Processing, Santorini, Greece, July 1997, pp. 857–860.
[4] J.Q. Liu, Y.H. Yang, Multiresolution color image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 16 (1994) 689–700.
[5] R.I. Taylor, P.H. Lewis, Color image segmentation using boundary relaxation, Proceedings of the 11th IAPR International Conference on Pattern Recognition, The Hague, Netherlands, August 1992, pp. 721–724.
[6] P. Campadelli, D. Medici, R. Schettini, Color image segmentation using Hopfield networks, Image and Vision Computing 15 (1997) 161–166.
[7] D. Rumelhart, G. Hinton, R. Williams, Learning internal representation by error propagation, Institute for Cognitive Science Report 8506, University of California, San Diego, 1985.
[8] J. Lampinen, E. Oja, Clustering properties of hierarchical self-organizing maps, Journal of Mathematical Imaging and Vision (1992) 261–272.
[9] H.G.C. Traven, A neural network clustering algorithm, and its application to multispectral satellite image classification, Proceedings of the Sixth Scandinavian Conference on Image Analysis, Oulu, Finland, June 1989, pp. 128–135.
[10] T. Kohonen, The self-organizing map, Proceedings of the IEEE 78 (1990) 1464–1480.
[11] S. Ghosal, R. Mehrotra, Application of neural networks in segmentation of range images, Proceedings of the International Joint Conference on Neural Networks, Baltimore, USA, June 1992, pp. 297–302.
[12] N. Papamarkos, Color reduction using local features and a SOFM neural network, International Journal of Imaging Systems and Technology 10 (1999) 404–409.
[13] T. Uchiyama, M.A. Arbib, Color image segmentation using competitive learning, IEEE Transactions on Pattern Analysis and Machine Intelligence 16 (1994) 1197–1206.
[14] A.R. Robertson, The CIE 1976 color-difference formulae, Color Research and Application 2 (1977) 7–11.
[15] S. Tominaga, Color classification of natural color images, Color Research and Applications 17 (1992) 230–239.
[16] R.M. Mersereau, The processing of hexagonally sampled two-dimensional signals, Proceedings of IEEE 67 (1992) 930–949.
[17] N. Storey, R.C. Staunton, An adaptive pipeline processor for real-time image processing, Proceedings of SPIE Conference on Automated Inspection and High-speed Vision Architectures 3, vol. 1197, Philadelphia, USA, November 1989, pp. 238–246.
[18] A. Springub, D. Scheppelmann, H.P. Meinzer, Segmentation of multisignal images with Kohonen's self-learning topological map, Proceedings of the Fourth International Conference on Computer Analysis of Images and Patterns, Dresden, Germany, September 1991, pp. 148–152.
[19] D.C. Tseng, C.H. Chang, Color segmentation using perceptual attributes, Proceedings of the 11th International Conference on Pattern Recognition, The Hague, Netherlands, August 1992, pp. 228–231.
[20] G. Paschos, Perceptually uniform color spaces for color texture analysis: an empirical evaluation, IEEE Transactions on Image Processing 10 (2001) 932–937.