Unsupervised image segmentation using a distributed genetic algorithm


Pattern Recognition, Vol. 27, No. 5, pp. 659-673, 1994. Elsevier Science Ltd. Copyright © 1994 Pattern Recognition Society. Printed in Great Britain. All rights reserved. 0031-3203/94

0031-3203(93) E0022-Y

UNSUPERVISED IMAGE SEGMENTATION USING A DISTRIBUTED GENETIC ALGORITHM

PHILIPPE ANDREY and PHILIPPE TARROUX†

Groupe de BioInformatique, Laboratoire de Biochimie et Physiologie du Développement, URA 686 CNRS, Département de Biologie, École Normale Supérieure, 46 rue d'Ulm, 75230 Paris Cedex 05, France

(Received 8 April 1993; in revised form 26 October 1993; received for publication 19 November 1993)

Abstract--A new methodological approach to digital image processing applied to the particular case of gray-level image segmentation is introduced. The method is based on a modified and simplified version of classifier systems. The labeling function is implemented as a spatially structured set of binary-coded production rules. The labeling is iteratively modified using a distributed genetic algorithm. Results are presented which illustrate both the mechanisms underlying the functioning of the method and its performance on natural images. The relationships between this approach and other related techniques are discussed and it is shown that it compares favorably with these.

Digital image processing   Classifier systems   Unsupervised segmentation   Clustering   Distributed genetic algorithms

1. INTRODUCTION

Image segmentation is a fundamental process in picture processing and a lot of work has been devoted to the subject.(1-3) Classical methods can be divided into three main classes. First, segmentation can be seen as a clustering process: the basic assumption is that different regions of the image correspond to distinct clusters in a given feature space. The problem is then to determine and to distinctively label the clusters in the feature space and to map this labeling back onto the image, by giving a pixel the label of the cluster to which it belongs. One problem with such methods is that they do not take into account spatial interactions between neighboring pixels. Another approach to image segmentation is to determine edges between regions. The major problem is that the resulting edges are not necessarily connected, and a post-processing step is needed to close the boundaries between regions. The third class of methods is complementary to the previous one. Rather than detecting edges between regions, the idea is to detect the regions themselves.(4) Two ways can be followed to achieve this goal (they can also be combined): the first one consists of starting with small regions (individual pixels) and letting these regions grow according to a merging rule based on a similarity criterion, while, on the contrary, the second way to operate starts with a single region and progressively splits it into smaller ones. Such methods are inherently sequential. It is well known that most of these classical methods are more or less ad hoc and specific to particular contexts. In fact, the requirement of a correct parameter

† Author to whom correspondence should be addressed.

adjustment, the need for a priori knowledge and the sequential character of such methods are major drawbacks. Our work is an attempt to devise and investigate alternative methods which overcome some of these problems. The method we propose here is simple, robust, unsupervised and highly parallel. These properties result from the fact that it relies upon a distributed genetic algorithm. Genetic algorithms(5,6) are stochastic search methods, the functioning of which is inspired by the laws of genetics, natural selection and the evolution of organisms. Their main attractive characteristic is their ability to efficiently deal with hard combinatorial search problems, in particular by avoiding, through their parallel exploration of the search space, being stuck in local extrema. During the last decade, they have therefore become powerful alternatives to conventional optimization methods. To date, the few attempts that have been made to apply genetic algorithms to image segmentation use the genetic algorithm in a conventional way, that is to say, to solve a combinatorial optimization problem, such as the parameter setting of an existing image segmentation algorithm(7) or the search for geometric primitives in a binary image.(8) As we shall see, we take a radically different approach, in the sense that the genetic algorithm is not used as an alternative optimization method but rather to automatically generate an image segmentation program. More specifically, we have implemented a system akin to a particular class of machine learning methods called classifier systems.(9,10) A classifier system is a set of subsymbolic production rules (with condition and action parts) called classifiers. The system evolves in an environment with which it exchanges messages via detectors (input messages) and effectors (output messages). A classifier, which can be viewed as a condition/action rule, incorporates in the system the knowledge that if a condition is detected then the related action should be taken. When a message is received from the environment, the system performs a parallel activation of the rules, the condition part of which matches this message. These rules then compete and the winning rule posts its own message (the action part), which in turn activates another set of rules. From time to time, a rule posts a message that is to be sent to the environment. The system may then receive a payoff depending on the relevance of this message. One problem in the functioning of classifier systems is the credit assignment problem, i.e. the distribution of payoff over the rules responsible for the obtained reward. However, the reward notion will not be relevant here due to the unsupervised functioning of our algorithm. Another problem is the generation of new rules. The mechanism responsible for this is a genetic algorithm, which replaces low-quality rules by recombinations of useful ones. Classifier system functioning is domain-independent. This fundamental property results from the fact that all operations in the system act on subsymbolic representations of the rules. That is to say, each rule is represented as a string of symbols over a predefined alphabet (here {0, 1}), in the same way as, e.g. the binary representation of numbers in computers. Thus, the mapping to a particular problem domain only requires the definition of the interfaces with the environment, i.e. rule encoding and decoding functions. The segmentation process is a function taking an image as input and returning a label image as output. We propose to represent this labeling function using a set of classifiers. Under a distributed genetic algorithm, the representation of this function is iteratively modified until a function yielding a (possibly) accurate segmentation is found. Each pixel is attributed a classifier.
The action of the latter represents the label that is assigned to the former in the output image. Its condition is a feature vector that is compared with the actual feature vector in the neighborhood of the pixel: the closer these two vectors are, the more activated the rule is. From generation to generation, rules that are poorly activated are eliminated and replaced by highly activated rules. Compared with standard classifier systems and genetic algorithms, our implementation is characterized by the following points: (i) each rule is mapped onto a pixel of the image, leading to a spatially organized set of rules; (ii) since there is neither reward nor reinforcement mechanism, the functioning of the system is unsupervised; (iii) rules do not chain in activation sequences: we use a one-step firing mechanism; (iv) the genetic algorithm is implemented in a distributed version. The paper is organized as follows: the next section describes the method, after a brief introduction to genetic algorithms, and particularly to distributed genetic algorithms. The third section deals with implementation details. In the fourth section, results obtained on simple artificial images are reported, which illustrate

different aspects of the method, such as initialization, selection and reproduction mechanisms. The performances of the algorithm on natural scenes are also illustrated. The final section is devoted to a discussion, and particularly to a comparison with related image segmentation techniques.

2. METHOD

2.1. Genetic algorithms

The goal of a genetic algorithm is to find an optimal or near-optimal solution to a given problem (e.g. maximizing a function) through iterative modifications of a population of potential solutions. These solutions are coded as bit strings and called chromosomes. One iteration of the algorithm consists of three main steps, namely, evaluation, selection and mating. At the evaluation stage, each chromosome is assigned a fitness value, which represents its ability to solve the problem (if the problem is to maximize a function, the fitness is, for instance, the value of the function at the point coded by the chromosome). At the selection stage, chromosomes are picked according to their fitness in such a way that the better chromosomes are more likely to be selected. At the mating stage, selected chromosomes are recombined (by exchanging substrings) and mutated (by flipping some bits). Crossover and mutation are illustrated in Table 1. From generation to generation, this process leads to increasingly better chromosomes, and eventually, to near-optimal solutions.

Table 1. Genetic operators. In crossover, randomly chosen portions of chromosomes are exchanged; here, a single-point crossover is illustrated. In mutation, randomly chosen bits are flipped. Crossover and mutation rates govern the number of recombinations and of mutations occurring during mating, while positions are chosen at random.

Operator     Parents     Offspring
Crossover    1010011     1011011
             0011011     0010011
Mutation     1011101     1010101

2.2. Distributed genetic algorithms

Alternatively to standard genetic algorithms, which deal with a single population whereupon a global selection is applied, and which are panmictic (each chromosome can potentially mate with any other one), some authors have introduced a structured organization in the distribution of the population. In that case, selection and mating take place within locally distributed subgroups of individuals rather than within the whole population, so that these mechanisms can be run independently and in parallel on each subgroup. There are two main tendencies. On the one hand, coarse-grained genetic algorithms consist of many different genetic algorithms running simultaneously


and periodically exchanging small fractions of their populations.(11,12) On the other hand, fine-grained genetic algorithms consist of a single population which is mapped onto a grid whereupon selection and mating take place in the neighborhood of each chromosome.(13-15) The motivation for distributed and parallel implementations of genetic algorithms is two-fold: on the one hand, defining a more realistic approach to genetic algorithms (in nature, selection and reproduction are indeed local phenomena); on the other hand, efficiently exploiting parallel architectures (parallelism is interesting when communications between processors are minimized). The goal is then to increase the performances of genetic algorithms, improving both the quality of the final solutions and the speed of convergence.
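As an illustration, the two generic operators of Table 1 can be sketched as follows. This is a minimal sketch of single-point crossover and bit-flip mutation, not the paper's implementation (which uses uniform crossover; see Section 3.5):

```python
import random

def single_point_crossover(a, b, point=None):
    """Exchange the tails of two equal-length bit strings at a cut point
    (randomly chosen when not given)."""
    if point is None:
        point = random.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(bits, rate):
    """Flip each bit independently with probability `rate`."""
    return "".join(("1" if c == "0" else "0") if random.random() < rate else c
                   for c in bits)

# The crossover example of Table 1: cut after the third bit.
assert single_point_crossover("1010011", "0011011", point=3) == ("1011011", "0010011")
```

Note that the pair of offspring always carries exactly the bits of the two parents; only mutation introduces new bit values.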

There are two mating restriction mechanisms: (i) a chromosome can only mate with a neighboring one and (ii) a chromosome can only mate with a chromosome coding for the same label. As far as mating is concerned, the algorithm is therefore a kind of hybrid distributed genetic algorithm: it is coarse-grained since there is a label class membership constraint, and fine-grained since there is a neighborhood membership constraint. However, selection is purely fine-grained, since it is local (neighboring chromosomes compete with each other independently of the remainder of the population) but not label-constrained.
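The label-constrained, bounded mate search can be sketched as follows; the grid-of-labels representation and the toroidal wrap at image borders are illustrative assumptions (the paper does not specify boundary handling):

```python
import random

def find_mate(labels, x, y, max_trials=1, rng=random):
    """Search the 8-neighbourhood of (x, y) for a chromosome bearing the
    same label. The search is not exhaustive: after `max_trials` random
    draws it gives up and returns None, in which case the current
    chromosome stays in place."""
    h, w = len(labels), len(labels[0])
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
               if (dx, dy) != (0, 0)]
    for _ in range(max_trials):
        dx, dy = rng.choice(offsets)
        nx, ny = (x + dx) % w, (y + dy) % h  # toroidal wrap (assumption)
        if labels[ny][nx] == labels[y][x]:
            return nx, ny
    return None
```

With `max_trials=1` (the setting used in Section 3.5), a chromosome surrounded mostly by other labels frequently fails to find a mate and is left unchanged.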

2.3. Outline of the method: a distributed genetic algorithm for image segmentation

Image segmentation is an example of a task which can be viewed as a function taking an image as input (typically an intensity image) and producing another image as output (a label image). The method proposed in this work consists in representing this function as a spatially organized set of binary-coded production rules (which will be referred to as chromosomes in the following). The function is iteratively modified using a distributed genetic algorithm until the rule set encoding the (possibly) best function is found. At any time, there is one and only one chromosome on each pixel of the image. The action part a chromosome is coding for determines the label it bears, and, subsequently, the label assigned to the pixel on which it is lying. Note that if the action part is coded using b bits of a chromosome, there are 2^b available labels. The condition part a chromosome is coding for is a feature vector which can be any set of values computable from the image. In this work, pixel value vectors have been used. The fitness of a chromosome is a function of the difference between the feature vector it is coding for and the actual feature vector computed at the location of the chromosome on the image. Consider for example the simple case wherein the feature vector is a single pixel value:

• the condition part of a classifier is then a gray-level value coded as a bit string. The length of the string equals the depth of the image to be processed (typically, 8);
• the action part of a classifier is a label value coded as a bit string. This label is attributed to the pixel on which the classifier is lying;
• the fitness of the classifier is a function of the difference between the value represented by its condition part and the actual value of the pixel on which the classifier is lying.

In practice, feature vectors have more than one element: one can take the value of a pixel and the values of some of its neighbors (as is the case in this work), or some statistics (mean, standard deviation, etc.) locally computed.

3. IMPLEMENTATION

3.1. Initialization

In the following, a gene will refer to a substring of a chromosome, coding either for its action part or for an element of its feature vector. Two mechanisms have been implemented and tested for gene initialization. The first one is totally random (it will be referred to as RI in the following): it consists in choosing, for each gene of each chromosome, a random value in the range of allowed values for this gene. The second one is a two-step mechanism. Firstly, each chromosome is given a random label gene value uniformly chosen in the set of possible labels. Secondly, for each feature vector gene, the chromosomes of label l are assigned values according to a Gaussian distribution N(μ(l), σ). This mechanism will be denoted by GI. Under GI, distinct label classes have different mean values but the same standard deviation. The mean values μ(l) and the standard deviation σ are automatically computed so that the distributions of different label classes do not overlap (this can be done because the range of the values returned by the Gaussian law generator is known a priori). Consider, for example, the case where the feature vector has two elements. Then, under initialization mode GI, initial label classes form clusters that are distributed along the first diagonal of the two-dimensional feature space.

3.2. Algorithm

The method consists in iteratively repeating the following three main steps, until a stopping criterion is met:

1. Evaluation. For each chromosome k, compute a distance between the actual feature vector x = (x_1, ..., x_p) measured on the image at its location and the vector y = (y_1, ..., y_p) for which it is coding. The smaller this distance, the greater the fitness f(k) of the chromosome. In this application, the Manhattan distance was chosen:

    f(k) = - Σ_{i=1}^{p} |x_i - y_i|.

Note that the fitness of a chromosome depends on its position (since it depends on x) and also that the best fitness score is 0.
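The evaluation step reduces to a one-liner; a minimal sketch of the negated Manhattan distance:

```python
def fitness(actual, coded):
    """Negated Manhattan distance between the feature vector measured on
    the image and the one the chromosome codes for (best score: 0)."""
    return -sum(abs(x - y) for x, y in zip(actual, coded))

assert fitness([100, 102, 98], [100, 102, 98]) == 0   # perfect match
assert fitness([100, 102], [101, 100]) == -3          # |100-101| + |102-100|
```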


2. Selection. Replace each chromosome by a neighbor (possibly itself), according to the fitness of the neighboring chromosomes. Two selection mechanisms have been implemented and tested. The first is an elitist selection scheme, also known as local tournament selection (LTS):(16) the chromosome with the highest fitness value in the neighborhood is selected. The second scheme is the classical stochastic selection with replacement, or roulette-wheel selection (RWS),(6) and proceeds as follows. Let f' be the resulting fitness function after mapping f onto a positive range (e.g. f' = f - min(f)). Then each chromosome k of the neighborhood N has a probability P(k) of being chosen which is given by its relative fitness:

    P(k) = f'(k) / Σ_{c∈N} f'(c).
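The two selection schemes can be sketched as follows, with a neighborhood given as (chromosome, fitness) pairs — a representation chosen here purely for illustration:

```python
import random

def lts(neighbourhood):
    """Local tournament selection: the fittest neighbour wins (elitist)."""
    return max(neighbourhood, key=lambda cf: cf[1])[0]

def rws(neighbourhood, rng=random):
    """Roulette-wheel selection over fitnesses remapped to a positive
    range via f' = f - min(f), as in the text."""
    fmin = min(f for _, f in neighbourhood)
    total = sum(f - fmin for _, f in neighbourhood)
    if total == 0:                        # all neighbours equally fit
        return rng.choice(neighbourhood)[0]
    r = rng.uniform(0, total)
    acc = 0.0
    for chrom, f in neighbourhood:
        acc += f - fmin
        if r < acc:
            return chrom
    return neighbourhood[-1][0]           # r landed on the upper boundary
```

Note that under the f' = f - min(f) remapping, the worst neighbour gets weight 0 and is never selected by RWS unless all fitnesses are equal.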

3. Mating. For each chromosome, look for a neighbor coding for the same label (the search is not exhaustive: we select a neighbor at random until it bears the same label or until a maximal trial number has been reached). Recombine the current chromosome with this neighbor. Replace the current chromosome by one of the two offspring. When no mate is found, the current chromosome remains in place. In any case, a mutation process, whereby some randomly selected bits are flipped, follows this recombination step.

3.3. Stopping criterion

We define a stability criterion and a stopping criterion. At each time step, the system stability is given by the percentage of pixels on the segmented image that are assigned the label they had at the previous time step. At any given time step, the stability criterion is reached if the stability is above a stability threshold. The stopping criterion is reached when the stability criterion has been met for a predefined number of consecutive time steps, or when a maximal number of generations has been reached without satisfying this first condition. The duration is the number of generations necessary to reach the stopping criterion.

3.4. Segmentation quality measures

When dealing with simple artificial images, the true segmentation of the image is perfectly known, and it is possible to quantitatively evaluate the segmentation results using a numerical criterion rather than simple visual examination. We have defined two criteria to obtain such an objective evaluation of segmentation results. They respectively correspond to the two main properties which are expected of a good segmentation: homogeneity and specificity. The segmentation must be homogeneous: on each region of the image, the number of different labels must be minimized since a region should be uniformly labeled. The segmentation must also be specific: a label assigned to a region must not appear on another one. These two criteria are defined as follows. Let R be the true number of regions in the image and L be the number of possible distinct labels. Let D be an R × L matrix, D(r, l) being the proportion of pixels belonging to the rth region which are assigned the label l in the segmented image. Let maj(r) be the label which is the most represented over the rth region. D(r, maj(r)) can be seen as a measure of the homogeneity of the labeling over the rth region. The homogeneity of the whole segmented image is taken to be the homogeneity of the least homogeneous region:

    H = min_{r∈{1,...,R}} D(r, maj(r)).

For two different regions r and r', D(r', maj(r)) (resp. 1 - D(r', maj(r))) is a measure of the confusion (resp. distinction) which is made by the label maj(r) between these two regions. Overall, we take

    min_{r'≠r} {1 - D(r', maj(r))}

as a measure of the specificity of the labeling of region r. The specificity of the whole segmented image is defined as the specificity of the least specifically labeled region:

    S = min_{r∈{1,...,R}} { min_{r'≠r} {1 - D(r', maj(r))} }
      = 1 - max_{r∈{1,...,R}} { max_{r'≠r} D(r', maj(r)) }.

Thus, if the segmentation is correct (only one label on each region, and different labels for different regions), then H = S = 1; if the same label is assigned to all regions, then H = 1 but S = 0; if the segmentation is correct except on one region where two labels are equally represented, then H = 0.5 and S = 1, etc. It is clear that these parameters are not unique segmentation quality measures. In particular, we have chosen to use lower bounds, but mean values could have been chosen as well. The important point is that they provide a useful and significant index for comparisons between the results obtained on different runs with varying conditions. Furthermore, they are straightforward to compute.
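Both measures can be computed directly from the matrix D; a minimal sketch, with D given as a list of rows:

```python
def homogeneity_specificity(D):
    """H and S quality measures from the R x L matrix D, where D[r][l]
    is the proportion of pixels of region r assigned label l."""
    R = len(D)
    # maj[r]: most represented label over region r
    maj = [max(range(len(row)), key=row.__getitem__) for row in D]
    H = min(D[r][maj[r]] for r in range(R))
    if R < 2:
        return H, 1.0  # specificity is vacuously 1 for a single region
    S = min(min(1.0 - D[rp][maj[r]] for rp in range(R) if rp != r)
            for r in range(R))
    return H, S

# Correct segmentation of two regions: H = S = 1.
assert homogeneity_specificity([[1.0, 0.0], [0.0, 1.0]]) == (1.0, 1.0)
# One label over both regions: homogeneous (H = 1) but not specific (S = 0).
assert homogeneity_specificity([[1.0, 0.0], [1.0, 0.0]]) == (1.0, 0.0)
```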

3.5. Parameter settings

Unless otherwise specified, the results presented in the following section have been obtained with the following settings: the feature vector at a point of the image is the m × m pixel matrix centered on this point. Since we are processing images with 256 gray levels, the feature vector a chromosome is coding for is an m² array of 8-bit coded integers. The neighborhood of a point is made of the eight contiguous points. Note that feature vector size and neighborhood size are two different, independent parameters. The maximal number of attempts to find a mate in one's neighborhood has been fixed to 1. Crossover is uniform,(17,18) that is to say, each bit has a fixed probability of being a recombination site, and there can be any number of recombination sites between mated chromosomes. The stability threshold is 99.5% and the number of consecutive stable time steps necessary to stop the process has been set to 5, to ensure that stopping is not due to an incidental high stability. The maximal number of generations has been set to 100.
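The GI initialization of Section 3.1 can be sketched under these settings as follows; the even partition of the 256-level gray range and the choice of σ are illustrative assumptions consistent with the stated non-overlap requirement, not the authors' exact formulas:

```python
import random

GRAY_LEVELS = 256

def gi_init(width, height, n_labels, m=3, rng=random):
    """Gaussian initialization (GI): every chromosome gets a uniform
    random label, then its m*m feature genes are drawn from a per-label
    Gaussian. Means are spread evenly over [0, 255] and sigma is chosen
    so that +/-3 sigma stays inside each label's band (an assumption)."""
    step = GRAY_LEVELS / n_labels
    means = [step * (l + 0.5) for l in range(n_labels)]
    sigma = step / 6.0
    population = []
    for _ in range(width * height):
        label = rng.randrange(n_labels)
        genes = [min(255, max(0, int(rng.gauss(means[label], sigma))))
                 for _ in range(m * m)]
        population.append({"label": label, "genes": genes})
    return population
```

Because each label class is concentrated in its own gray-level band, the classes start with genuinely different fitness landscapes, which is what makes the selection-driven competition of Section 4.1 effective.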

4. RESULTS

4.1. Initialization and selection study

This section illustrates the respective roles of initialization and selection in the competition process between the different label classes, and, therefore, in the segmentation results. To achieve this goal, experiments have been performed on a uniform 128 × 128 gray-level image (gray-level value: 100). In this case, since there is only one region, the expected result is a uniformly labeled image. Note also that the specificity is always equal to 1: therefore, evaluation will be based on the homogeneity criterion only. There are four possible combinations of initialization and selection modes (RI/LTS, RI/RWS, GI/LTS and GI/RWS), each of which will be referred to as a configuration in the following. For each configuration, four maximal numbers of labels have been tested (2, 4, 8 and 16), corresponding to four different label gene length values (1, 2, 3 and 4, respectively). This makes 16 different cases; in each case, 50 experiments have been performed. Other parameters are set as follows. The feature vector a chromosome is coding for is a pixel value, that is to say, we take m = 1. Since we want to study initialization and selection effects only, mating has been disabled: step 3 of the algorithm is skipped. Average homogeneity and average duration are given in Tables 2 and 3, respectively. Figure 1 shows typical segmentation results for each configuration, in the case where the maximal number of labels is equal to 4. The dynamics of the process is also illustrated in Fig. 2, which plots the evolution of homogeneity and stability during typical runs for each configuration (with a maximal number of labels equal to 8). These results illustrate the major influence of the initialization mechanism on segmentation quality: low quality results are obtained with RI mode, whereas perfectly correct results are obtained with GI mode.
In RI mode, all label classes exhibit the same distribution over feature space; therefore, they are all equally fitted to their environment. On the contrary, GI mode offers the opportunity for a true competition between classes. Thus, having different initial label class distributions over feature space is a necessary condition for competition to be effective. It is likely that distribution laws

Table 2. Initialization and selection study. Homogeneity of segmentation results for each configuration and various maximal numbers of labels (average over 50 runs). † indicates that the system had not met the stability criterion when it reached the maximal time step.

                       Labels
Configuration      2        4        8        16
RI/LTS           0.557    0.321    0.202    0.137
RI/RWS           0.532†   0.293†   0.173†   0.106†
GI/LTS           1.000    1.000    1.000    1.000
GI/RWS           0.999    0.999    0.888†   0.857†


Table 3. Initialization and selection study. Duration of segmentation process for each configuration and various maximal numbers of labels (average over 50 runs). † indicates that the system had not met the stability criterion when it reached the maximal time step.

                        Labels
Configuration      2         4         8         16
RI/LTS           21.18     23.04     22.88     23.80
RI/RWS          100.00†   100.00†   100.00†   100.00†
GI/LTS            6.00      7.00      8.56      9.88
GI/RWS           31.38     37.28    100.00†   100.00†

other than Gaussian would work as well, provided that label class distributions are not overlapping. The influence of selection mainly concerns the convergence speed of the algorithm. LTS leads very quickly to the stopping step, whatever the initialization mode. RWS is weaker: with RI, the system remains in an ever-changing state and never converges. With GI, the system converges but far more slowly than with LTS. This explains why the quality of the results with configuration GI/RWS is not optimal with 8 or 16 labels. In these cases, indeed, two classes of labels are almost equally well fitted (due to the initialization mode, their initial distributions are approximately symmetrical around 100), and selection has to work a long time to make a winner.

4.2. Mating study

In this section, we are simultaneously addressing two questions. First, does the application of genetic operators allow the system to find better quality results in the cases where it previously obtained poor ones (with initialization RI)? Second, what is the influence of these mechanisms on the behavior of the system in the cases where it previously led to good quality results (with initialization GI)? Let us first consider the crossover operator. Three crossover rates have been tested: 0.01, 0.05 and 0.1. The maximal number of labels has been fixed to 8. Homogeneity is reported in Table 4 and average final fitness is reported in Table 5. Qualitatively, the effect of an increasing crossover rate on duration is a decrease with configurations involving RI, and, on the contrary, an increase with configurations involving GI. The quality of the segmentation seems to be poorly affected by crossover. However, a slight quality decrease appears with the GI/RWS configuration. This corresponds to a decrease in the convergence speed of the algorithm. Thus it appears that, as far as homogeneity is concerned, crossover affects the dynamics of the system but not the final state.
Thus, crossover is unable to increase the quality of the results in the cases where the system leads to low quality results without crossover. In the cases where the system previously led to good quality results (configurations involving GI), crossover


Table 4. Crossover study. Homogeneity of segmentation results for each configuration and various crossover rates (average over 50 runs). The maximal number of labels is 8. † indicates that the system had not met the stability criterion when it reached the maximal time step.

                     Crossover rate
Configuration     0.0      0.01     0.05     0.1
RI/LTS          0.202    0.196    0.196    0.193
RI/RWS          0.173†   0.167†   0.172†   0.170†
GI/LTS          1.000    0.995    1.000    1.000
GI/RWS          0.888†   0.872†   0.856†   0.850†

Table 5. Crossover study. Population average final fitness for each configuration and various crossover rates (average over 50 runs). The maximal number of labels is 8. † indicates that the system had not met the stability criterion when it reached the maximal time step.

                     Crossover rate
Configuration     0.0       0.01      0.05      0.1
RI/LTS          -0.003    -0.004    -0.004    -0.002
RI/RWS          -4.209†   -4.261†   -4.536†   -4.627†
GI/LTS          -6.602    -4.560    -1.069    -0.427
GI/RWS         -11.210†   -9.074†   -6.669†   -5.801†

Fig. 1. Initialization and selection study. Typical segmentation of a uniform gray-level image for each configuration. The maximal number of labels is 4. In each case, T is the stopping step. (a) RI/LTS, (b) RI/RWS, (c) GI/LTS, (d) GI/RWS. In (b), the system has reached the maximal number of generations before the stability criterion is met.

Fig. 2. Initialization and selection study. Evolution of homogeneity and stability during a typical run for each configuration. The maximal number of labels is 8.

decreases convergence speed, but increases the population average final fitness (note that the value of this parameter is not affected at all in configurations involving RI). There are two hypotheses to explain both the decrease in convergence speed and the increase in average final fitness. The first hypothesis is that since processing time is longer, individuals have more time to adapt and increase their fitness. The second one, on the contrary, is that, since all label classes increase their adaptive value through recombination, the winning label class has stronger competitors; consequently, the time needed to eliminate them is longer. The results (Table 5) show that the first hypothesis must be rejected. For the GI/RWS configuration, the average fitness at the same time step (100) indeed increases with crossover rate. Therefore, the increase in convergence time is due to the increase in average fitness, which itself results from recombinations between chromosomes.

To study the impact of mutation on the behavior of the system, three mutation rates have been tested: 10^-5, 10^-3 and 10^-1. Other parameters were: crossover rate of 0.05 and 8 label classes. With a high mutation rate (10^-1), whatever the configuration, the system never reaches the stability threshold and final homogeneity is decreased to approximately the same value (0.133). At the lower rate (10^-5), performances remain almost unchanged. At the intermediate rate (10^-3), LTS is more robust (performances are not significantly affected) than RWS (performances decrease). Therefore, mutation is unable to increase the performances of the system. However, it should be kept in mind that this result was obtained for a specific case, for which the population size is quite large compared to the feature space size. In such a case, the role of mutation is limited because all values in feature space are represented. It is likely that mutation plays a more important role in



larger feature spaces, where population size is no longer sufficient to cover all possible values. Nevertheless, the study reported above suggests that mutation should be used at a very low rate.

Fig. 3. Effect of noise on system's performance. Performance is expressed here as the product of homogeneity by specificity. Each point represents an average over 10 runs.

4.3. Noise effects

This section illustrates the performances of the system on noisy images. Noise can affect both quality measures we have previously defined, namely, homogeneity and specificity. Experiments reported in this section were performed on a four-squares 128 × 128 image. The image is divided into four quarters, each of which takes a uniform gray-level value (50, 100, 150 or 200). Gaussian noise (mean = 0.0, standard deviation ∈ {0.0, 0.5, 2.0, 5.0, 10.0, 15.0, 20.0}) is added to this original image. Configuration is GI/LTS, with a maximal number of labels equal to 8. Crossover rate is 0.1 and mutation rate is 0.0. Two types of coded feature vectors have been tested, with m = 1 and 3, respectively. For each standard deviation value, and each m value, 10 runs were performed. Results are plotted in Fig. 3. Separate examinations of homogeneity and specificity values reveal that the evolution of the performance mainly reflects the evolution of homogeneity alone; specificity is almost unaffected by noise. These results illustrate the robustness of the method with regard to noise. The robustness increases with the size of the coded matrix.

4.4. Natural scenes segmentation

Some results obtained on natural images are shown in Figs 4-6. They were obtained in the following conditions. Configuration is GI/LTS. The maximal number of labels is 16. Crossover rate is 0.1 and mutation rate is 10^-5. The feature vector chromosomes are coding for is a 3 × 3 pixel matrix. As can be seen, though the algorithm is simple, unsupervised and uses crude information, it behaves correctly.
It should also be noted that the segmentation is obtained in very few iterations. Note, however, that errors occur on image 6, where there are shadows and reflections due to the illumination. Figure 7 illustrates some results obtained on histological data, with the same parameter settings as above except that the maximal number of labels is 8. As can be seen, the quality of the segmentation can be improved when the feature vector size, the neighborhood size, or both, are increased. The differences between segmented images 7(b) and 7(c) do not correspond to an increase in feature vector size but to an increase in neighborhood size only. These differences can be explained by the fact that with a small neighborhood size, small clusters localized on micro-regions survive, since they do not suffer from the influence of surrounding labels. On the contrary, when the neighborhood is larger, the range of interactions increases and small clusters are drowned by a harder selection.

5. DISCUSSION

Though the method we have introduced in this paper is strongly related to segmentation methods based on clustering in feature space, it is worth noting that the realized clustering is not explicitly embodied in the algorithm. In fact, the problem is decomposed and solved by a distributed system of simple, locally interacting components (binary-coded rules). Thus, the system is defined at the individual level, while the functionality (i.e. image segmentation) occurs at the population level. This emergent functionality(19) is the central explanation of the system's flexibility, efficiency and robustness. Moreover, such a bottom-up device is inherently parallel. Though the program has been implemented on a sequential machine, this parallelism, along with the restriction of interactions to a small area around each individual, makes this approach suitable for an efficient implementation on a parallel machine, since parallelism is essentially attractive when communications between distant components are limited.


Fig. 4. An indoor scene segmentation. (a) Original image (284 x 370). (b) Segmented image (region boundaries). The result was obtained in 23 iterations.

The possibility of such a parallel implementation, in conjunction with the unsupervised functioning of the system and the small number of iterations it requires to process an image, makes it an interesting candidate for inclusion in an on-line automatic scene analysis device.

Booker(20) has shown that the performance of a classifier system in an unsupervised categorization task increases when rules can reproduce only with rules responding to the same category. Our algorithm implicitly implements a similar mechanism: since it is highly probable that two neighboring pixels belong



Fig. 5. An outdoor scene segmentation. (a) Original image (286 x 372). (b) Segmented image (region boundaries). The result was obtained in 22 iterations.

to the same region, restricting the mate search to the neighborhood of each chromosome implies that, except at region boundaries, chromosomes preferentially mate with individuals corresponding to the same cluster.
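The geometry of this implicit niching can be sketched as follows. The chromosome contents and the crossover itself are omitted; only the neighborhood-restricted mate selection is shown, on a hypothetical two-region label grid.

```python
import numpy as np

def neighborhood_mate(rng, i, j, shape):
    """Pick a crossover mate uniformly from the 8-neighborhood of pixel
    (i, j).  Away from region boundaries, the mate almost always carries
    the same label as (i, j), which is the implicit niching effect."""
    h, w = shape
    neighbors = [(i + di, j + dj)
                 for di in (-1, 0, 1) for dj in (-1, 0, 1)
                 if (di, dj) != (0, 0)
                 and 0 <= i + di < h and 0 <= j + dj < w]
    return neighbors[rng.integers(len(neighbors))]

# Two-region label grid: left half label 0, right half label 1.
labels = np.zeros((32, 32), dtype=int)
labels[:, 16:] = 1
rng = np.random.default_rng(0)
same = sum(labels[i, j] == labels[neighborhood_mate(rng, i, j, labels.shape)]
           for i in range(32) for j in range(32))
frac = same / (32 * 32)   # close to 1: mismatches only near the boundary
```

Only pixels in the two columns adjacent to the region boundary can draw a mate from the other cluster, so the same-label fraction approaches 1 as the regions grow.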

Fig. 6. An indoor scene segmentation. (a) Original image (287 x 372). (b) Segmented image (region boundaries). The result was obtained in 36 iterations.

Competition between label classes is of major importance in the behavior of the system. It is worth noting that competition is also the basic mechanism of connectionist techniques such as competitive learning(21) and Kohonen's self-organizing feature maps.(22) These techniques can be considered as unsupervised clustering methods (they have other properties we shall not consider here). Recently, Kohonen's maps

have been used for their unsupervised clustering properties in image segmentation applications.(23,24) Compared to these competitive neural networks, our algorithm differs in the sense that competition occurs between subpopulations rather than between single units. Moreover, these techniques are fundamentally sequential and not suited to efficient massively parallel processing of an image. Third, and more importantly, our algorithm, as we shall discuss below, takes into account spatial interactions between neighboring pixels in the segmentation process. For this reason, the method does not simply reduce to a clustering method. Suppose for example that the same feature vector appears more than once on the image, within distinct regions. Then, the same point in feature space can be assigned several labels, one for each region. This fundamental property is a consequence of the representation of the assignment function as a set of (parallel) rules. Thus, the algorithm does not suffer from the major drawback of clustering-oriented image segmentation methods, which may produce oversegmented images because they do not take into account the context in which pixels occur.

Fig. 7. A histological slice image segmentation. (a) Original image (128 x 128). (b) Segmented image, with m = 3 and 8 neighbors (T = 68). (c) Segmented image, with m = 3 and 24 neighbors (T = 41). (d) Segmented image, with m = 7 and 24 neighbors (T = 13). In all cases, the configuration is GI/LTS and the maximal number of labels is 8. T is the stopping step.

To illustrate this point, some segmentation results obtained with thresholding and clustering in feature space are shown in Fig. 8. These techniques have been applied to image 7(a) under conditions that allow comparison with the result shown in 7(d).

Fig. 8. Segmentation of image 7(a) using (a) averaging/thresholding and (b) clustering techniques. Averaging was made over a 7 x 7 window. The resulting histogram was then thresholded at the two main valleys. Clustering was performed using an 8-unit one-dimensional Kohonen self-organizing feature map.

The first technique consists in averaging the original pixel values using a 7 x 7 smoothing filter and thresholding the resulting histogram at the two main valleys. Clustering has been performed using an 8-unit one-dimensional Kohonen self-organizing map, following the method proposed by Lin et al.(24) for histogram partitioning (used for initializing labels in their relaxation method). The image is scanned in raster order. For each pixel, there is a feature vector x (a 7 x 7 pixel matrix); the unit ω closest to x (according to the Euclidean distance) is determined. This unit is updated using the rule ω ← ω + α(x − ω). Its two nearest neighbors on the map (except at the ends of the map, where there is only one neighbor) are updated with a similar rule, except that α/2 replaces α. Once the image has been scanned, α is replaced by κα, where κ is a predefined parameter. Parameter values are the same as in Lin et al. (0.9 for both α and κ). The algorithm is run until α is sufficiently small. The results clearly illustrate the fact that spatial interactions are efficiently processed by the algorithm we propose, which is not the case for a simple clustering. With averaging/thresholding, the problem arises at the boundaries between the epithelium and the duct lumen (they correspond to the extreme peaks of the histogram), because the average value there corresponds to the central peak, i.e. conjunctive tissue. Therefore, thresholding leads to the concentric band effect observed in Fig. 8(a). Note also that there are gaps and narrowings at several places in the resulting segmentation. The band effect is even more pronounced with clustering, as can be seen in Fig. 8(b). This is a consequence of the unsupervised character of the method: all labels are produced, whereas the knowledge of the existence of only three categories was used a priori in the above thresholding. The question that arises is why the band effect does not appear with our algorithm. The hypothesis we can formulate is that initially, there may
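The one-dimensional map just described can be sketched as follows. This is our own scalar simplification for illustration: raw gray levels stand in for the 7 x 7 feature vectors, and the unit count, seed and stopping threshold are assumptions, not values from the paper.

```python
import numpy as np

def som_1d(samples, n_units=8, alpha=0.9, kappa=0.9, alpha_min=1e-3, rng=None):
    """One-dimensional Kohonen map following the scanning procedure
    described above: the winning unit w[k] moves by alpha*(x - w[k]),
    its two chain neighbors move by alpha/2*(x - w[n]), and alpha is
    multiplied by kappa after every full scan, until alpha is small."""
    rng = np.random.default_rng() if rng is None else rng
    w = rng.uniform(min(samples), max(samples), n_units)
    while alpha > alpha_min:
        for x in samples:
            k = int(np.argmin(np.abs(w - x)))   # closest unit wins
            w[k] += alpha * (x - w[k])
            for n in (k - 1, k + 1):            # neighbors on the chain
                if 0 <= n < n_units:
                    w[n] += (alpha / 2) * (x - w[n])
        alpha *= kappa
    return np.sort(w)

# Toy data with three gray-level modes instead of a real histogram.
rng = np.random.default_rng(1)
data = rng.permutation(np.repeat([50.0, 128.0, 200.0], 100))
units = som_1d(data, rng=np.random.default_rng(2))
```

After convergence, the sorted unit weights partition the gray-level axis into as many intervals as there are units, which is what produces the oversegmentation discussed above when the number of units exceeds the number of true categories.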

be individuals belonging to label classes that are adapted to the boundaries between regions (that is indeed the phenomenon observed at the very beginning of the process). They are subsequently eliminated by selection because they are surrounded by individuals living in broader clusters: supposing that all individuals are equally well adapted, the probability that a pixel label remains unchanged during selection depends on the number of neighbors belonging to the same class. Because it takes into account local interactions between pixels to iteratively update and adjust the distribution of labels on the image, the algorithm is related to relaxation techniques. However, it requires neither an initial label assignment and a priori specification of compatibility coefficients between labels, as in probabilistic relaxation,(25,26) nor a model describing the statistical dependence between pixels, as in stochastic relaxation.(27) Tightly correlated with these observations is the problem that, though there is intuitive as well as experimental evidence of convergence, the method does not easily lend itself to convergence analysis. In this paper, a crude version of feature vectors is used, since chromosomes code for pixel value configurations. Though encouraging results are obtained on real images, this representation is clearly not the best suited for processing textured images. In particular, this representation is neither shift- nor rotation-invariant: suppose for example that a given chromosome is very well fitted at a given location on the image. Then, a slight shift or rotation of the subimage recognized by this chromosome will in most cases decrease its fitness, although it is still on the same region. However, one important feature of the method we propose is that the generalization to any kind of representation


is quite easy. Indeed, this generalization only requires that the interface between chromosomes and fitness function, i.e. the decoding and evaluation procedures, be changed. Note that this property results from the fact that the basic mechanisms of genetic algorithms and classifier systems are domain independent. Future work will therefore be concerned with the study of more powerful representations, possibly model-based, for processing textured images. Independently of the representation problem, further work is also required on other parts of the system, particularly with regard to its architecture and the interactions between its components. We are interested in studying how other architectures of this kind of system, possibly multilayer architectures, and more complex interactions between the components influence the behavior and the performance of the system in the particular case of image segmentation. The study reported in this paper was concerned with a single-layer architecture performing low-level segmentation. In general, further processing is needed to obtain a powerful feature extraction device. In Fig. 6, for instance, low-level structures are identified, whereas higher ones, such as the eyes or the mouth, are not. While low-level processing can be achieved using only syntactical rules, semantic constraints are required for high-level processing. The approach we propose offers an opportunity to embody both the low and the high levels within the same architecture and under the same functioning mechanisms, eventually resulting in a complete scene interpretation device. We also plan to extend this approach to the development of other digital image processing methods.

6. CONCLUSION

In this paper, a new methodological approach to digital image processing has been introduced and studied in the case of image segmentation. The proposed system is a spatially organized set of binary-coded production rules.
It is a simplified and modified version of the standard classifier system. The labeling of the image is coded by the set of production rules and is iteratively modified using a distributed genetic algorithm. Its architecture and functioning confer on the system important properties, including simplicity, flexibility, robustness and parallelism. The results have shown that its performance compares favorably with related techniques. It therefore appears worth continuing the exploration of this research area, focusing on segmentation as well as considering other image processing tasks. Such developments can for example deal with other coded features, more complex interactions between rules and possibly multilayered architectures.

REFERENCES

1. K.-S. Fu and J. K. Mui, A survey on image segmentation, Pattern Recognition 13, 3-16 (1981).
2. A. Rosenfeld and A. C. Kak, Digital Picture Processing, Computer Science and Applied Mathematics, 2nd Edn. Academic Press, New York (1982).
3. R. M. Haralick and L. G. Shapiro, Survey: image segmentation techniques, Comput. Vision Graphics Image Process. 29, 100-132 (1985).
4. D. H. Ballard and C. M. Brown, Computer Vision. Prentice-Hall, Englewood Cliffs, New Jersey (1982).
5. J. H. Holland, Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, Michigan (1975).
6. D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, Massachusetts (1989).
7. B. Bhanu, S. Lee and J. Ming, Self-optimizing image segmentation system using a genetic algorithm, Proc. Fourth Int. Conf. on Genetic Algorithms, R. K. Belew and L. B. Booker, eds, pp. 362-369. Morgan Kaufmann, San Mateo, California (1991).
8. G. Roth and M. D. Levine, A genetic algorithm for primitive extraction, Proc. Fourth Int. Conf. on Genetic Algorithms, R. K. Belew and L. B. Booker, eds, pp. 487-494. Morgan Kaufmann, San Mateo, California (1991).
9. J. H. Holland, Escaping brittleness: the possibilities of general-purpose learning algorithms applied to parallel rule-based systems, Machine Learning: An Artificial Intelligence Approach, R. S. Michalski, J. G. Carbonell and T. M. Mitchell, eds, Vol. 2, pp. 593-623. Morgan Kaufmann, San Mateo, California (1986).
10. L. B. Booker, D. E. Goldberg and J. H. Holland, Classifier systems and genetic algorithms, Artif. Intell. 40, 235-282 (1989).
11. R. Tanese, Distributed genetic algorithms, Proc. Third Int. Conf. on Genetic Algorithms, J. D. Schaffer, ed., pp. 434-439. Morgan Kaufmann, San Mateo, California (1989).
12. J. P. Cohoon, S. U. Hegde, W. N. Martin and D. S. Richards, Distributed genetic algorithms for the floorplan design problem, IEEE Trans. Computer-Aided Des. Integrated Circuits Syst. 10, 483-492 (1991).
13. H. Mühlenbein, Parallel genetic algorithms, population genetics and combinatorial optimization, Proc. Third Int. Conf. on Genetic Algorithms, J. D. Schaffer, ed., pp. 416-421. Morgan Kaufmann, San Mateo, California (1989).
14. B. Manderick and P. Spiessens, Fine-grained parallel genetic algorithms, Proc. Third Int. Conf. on Genetic Algorithms, J. D. Schaffer, ed., pp. 428-433. Morgan Kaufmann, San Mateo, California (1989).
15. P. Spiessens and B. Manderick, A massively parallel genetic algorithm: implementation and first analysis, Proc. Fourth Int. Conf. on Genetic Algorithms, R. K. Belew and L. B. Booker, eds, pp. 279-286. Morgan Kaufmann, San Mateo, California (1991).
16. D. E. Goldberg and K. Deb, A comparative analysis of selection schemes used in genetic algorithms, Foundations of Genetic Algorithms, G. J. Rawlins, ed., pp. 69-93. Morgan Kaufmann, San Mateo, California (1990).
17. G. Syswerda, Uniform crossover in genetic algorithms, Proc. Third Int. Conf. on Genetic Algorithms, J. D. Schaffer, ed., pp. 2-9. Morgan Kaufmann, San Mateo, California (1989).
18. W. M. Spears and K. A. De Jong, On the virtues of parameterized uniform crossover, Proc. Fourth Int. Conf. on Genetic Algorithms, R. K. Belew and L. B. Booker, eds, pp. 230-236. Morgan Kaufmann, San Mateo, California (1991).
19. S. Forrest, Emergent computation: self-organizing, collective, and cooperative phenomena in natural and artificial computing networks, Emergent Computation, S. Forrest, ed., pp. 1-11. MIT Press/Bradford Books, Cambridge, Massachusetts (1990).
20. L. B. Booker, Improving the performance of genetic algorithms in classifier systems, Proc. First Int. Conf. on Genetic Algorithms and their Applications, J. J. Grefenstette, ed., pp. 80-92. Lawrence Erlbaum Associates, Hillsdale, New Jersey (1985).
21. D. E. Rumelhart and D. Zipser, Feature discovery by competitive learning, Cognitive Sci. 9, 75-112 (1985).
22. T. Kohonen, Self-Organization and Associative Memory, Vol. 8 of Springer Series in Information Sciences, 2nd Edn. Springer, Berlin (1988).
23. E. Oja, Self-organizing maps and computer vision, Neural Networks for Perception, Vol. 1: Human and Machine Perception, H. Wechsler, ed., pp. 368-385. Academic Press, New York (1992).
24. W.-C. Lin, E. C.-K. Tsao and C.-T. Chen, Constraint satisfaction neural networks for image segmentation, Pattern Recognition 25, 679-693 (1992).
25. A. Rosenfeld, R. A. Hummel and S. W. Zucker, Scene labeling by relaxation operations, IEEE Trans. Syst. Man Cybern. 6, 420-433 (1976).
26. J. Y. Hsiao and A. A. Sawchuk, Unsupervised textured image segmentation using feature smoothing and probabilistic relaxation techniques, Comput. Vision Graphics Image Process. 48, 1-21 (1989).
27. S. Geman and D. Geman, Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images, IEEE Trans. Pattern Analysis Mach. Intell. 6, 721-741 (1984).

About the Author--PHILIPPE ANDREY is a Ph.D. student of the University Paris 7 at the Groupe de BioInformatique, Ecole Normale Supérieure, Paris. He completed the Maîtrise de Biochimie et Physiologie (B.Sc.) at the University Paris 6 in 1989, the Agrégation de Sciences Naturelles in 1990 and the DEA de Biomathématiques (M.Sc.) at the University Paris 7 in 1991. He is currently working on genetic algorithm applications to digital image processing.

About the Author--PHILIPPE TARROUX is Associate Professor at the Ecole Normale Supérieure in Paris. He has been working on the development of computer software applied to biology for many years. He recently focused his research on adaptive algorithms and genetic methods. He is in charge of the Bioinformatics group of the ENS Department of Biology, whose members try to derive new adaptive methods for artificial systems from biology and to use these models to better understand biological adaptation.