0031-3203/89 $3.00 + .00 Pergamon Press plc Pattern Recognition Society
Pattern Recognition, Vol. 22, No. 6, pp. 719-732, 1989 Printed in Great Britain

IMAGE SEGMENTATION USING A DYNAMIC THRESHOLDING PYRAMID

M. SPANN* and C. HORNE†
*Department of Electrical, Electronic and Systems Engineering, Coventry Polytechnic, Priory Street, Coventry CV1 5FB, U.K.
†Signal Processing Laboratory, Swiss Federal Institute of Technology, Ecublens (DE), 1015 Lausanne, Switzerland

(Received 19 July 1988; in revised form 27 October 1988; received for publication 28 November 1988)
Abstract--An image segmentation algorithm based on multi-resolution processing is presented. The algorithm is based on applying a local clustering at each level of a linked pyramid data structure, allowing seed nodes to be defined. These seed nodes are the root nodes of regions at the base of the pyramid, appearing in the multi-resolution data structure at a level appropriate to the region size. By applying a merging process followed by a classification step, accurate segmentations are obtained for both natural and synthetic images without the need for a priori knowledge. Results show that the algorithm gives accurate segmentations even at low signal to noise ratios.

Image segmentation   Multi-resolution processing   Dynamic thresholding   Region merging   Pyramidal representation   Edge preserving smoothing

1. INTRODUCTION
The problem of image segmentation can be defined as the splitting up of an image into a set of connected regions, each region having a uniform property, be it luminance or a more complex textural measure. Research into this problem has been going on for many years, resulting in the publication of a large number of papers. (For an up to date review see Haralick et al.(1)) A particular class of segmentation algorithms, in which processing is carried out over a range of spatial scales, has proved very promising. Early examples include the edge detection scheme of Rosenfeld and Thurston (2) and the scale space filtering scheme of Witkin.(3) Perhaps the best known is the 'split and merge' algorithm (4) which is based on splitting and merging square shaped regions until a homogeneity criterion is met. Such an approach can be regarded as a global to local refinement, in contrast to the algorithm described in this paper which combines operations over ever increasing regions (not necessarily square). Other algorithms using multi-resolution processing are the pyramid linking algorithm (5,6) and the quadtree algorithm.(7) The former uses a linked pyramid data structure and iteratively resets father/son links in a pyramid, after which it recomputes local averages based on the new links until the linkage structure remains unchanged between iterations. Good results have been reported, although, in the basic version of the algorithm, there is an a priori assumption about the number of regions present.
So called 'automatic rooting' or 'unforced linking' strategies for the algorithm have been investigated by Cibulski et al.(8) and by Antonise (9) but, to the best of the authors' knowledge, a satisfactory 'non-ad hoc' scheme, which works well over a range of natural images, has yet to be developed. The quadtree method (7) applies a statistical clustering algorithm to the data at low spatial resolution followed by a boundary refinement algorithm at successively higher spatial resolutions. A theoretical justification for the algorithm is given in the book by Wilson and Spann,(10) where it is argued that only by assuming consistency in region properties over a range of spatial scales can the constraints imposed by the uncertainty principle,(11) which places a limit on simultaneous localization both spatially and in class space, be satisfied. Here class space is the space of property values (for example grey level) on which the segmentation is based. Accurate region boundary localization is reported even in very noisy images. However, the algorithm uses a fixed height quadtree to represent the multi-resolution data. This places a lower limit on the object size that can be detected. If an object is too small to be found at the lowest spatial resolution then it cannot be recovered in subsequent processing.

This paper describes a multi-resolution algorithm that is more general than the ones described above in that it requires no a priori assumptions. It uses a more sophisticated local processing operation than the simple averaging operations that have been used previously. Such an approach, as will be seen, allows objects or regions to be 'spawned' at a spatial scale appropriate to their size. Good results are obtained for a range of natural and synthetic images.

The organization of this paper is as follows. Section 2 describes the local thresholding operation and the pyramid data structure built up using this operation. Section 3 describes the merging and classification algorithm itself and Section 4 shows results on both natural and synthetic test images. Finally Section 5 presents conclusions and a discussion.
2. THE DYNAMIC THRESHOLD LINKED PYRAMID

2.1. Notation
In order to describe the data structure used in the segmentation algorithm, notation based on graph theory will be used. Formally, a graph $G$ is a pair of sets $(V, E)$ where $V$ is a set of vertices (nodes) $\{v_1, v_2, \ldots, v_n\}$ and $E$ is a set of edges comprising vertex pairs specifying the connectivity of $V$. A directed graph (digraph) is one in which the directions of the connections are important. Thus to indicate the fact that $v_i$ is connected to $v_j$ the notation $v_i \rightarrow v_j$ will be used. This indicates that $v_i$ can address $v_j$ but not vice versa. Consider a graph $G(V, E)$ in which each node is identified by a triplet $(i, j, p)$ and in which:

$$v(i,j,p) \rightarrow v(2i+r,\, 2j+s,\, p-1), \quad p = 1 \ldots \log_2 n - 1; \quad r, s = 0 \ldots 3; \quad i, j = 0 \ldots n/2^p - 1 \qquad (1)$$

where the indices are taken modulo $n$. Further there are no nodes $v'$, $v''$ in the graph for which:

$$v(i,j,0) \rightarrow v', \quad i, j = 0 \ldots n-1 \qquad (2)$$

$$v'' \rightarrow v(0, 0, \log_2 n) \qquad (3)$$

where $n$ is some integer constant which is a power of 2. The nodes can be thought of as arranged in layers of lattices, nodes in one lattice having the same $p$ value. In this way, the values of $i$ and $j$ specify the spatial positions of the nodes within each lattice. The above equations then specify a linked pyramid structure (5) where each node has 16 children nodes and 4 father nodes.

In order to be of practical use, a node must contain information about the signal that it represents. This information is stored in the following set of fields: data; class; pop; sum; sumsq; father. The value stored in each field will be indicated in the following way: $v(i,j,p).\mathrm{data}$, $v(i,j,p).\mathrm{class}$ etc. The purpose of each field will become clear as the algorithm is described. The father field is a pointer to a node in the pyramid. All the other fields take on numeric values. (The data field is assumed to be scalar although it is trivial to extend the algorithm to vectorial fields.)

2.2. Local dynamic threshold averaging
In order to build a pyramid from an $n \times n$ image $g(i,j)$ of values ($g(i,j)$ can be the luminance value or a feature vector at position $(i,j)$ in the image plane), a local averaging operation is performed

$$v(i,j,p).\mathrm{data} = \frac{\sum_{r=0}^{3} \sum_{s=0}^{3} v(2i+r,\, 2j+s,\, p-1).\mathrm{data}\; h(i,j;r,s)}{\sum_{r=0}^{3} \sum_{s=0}^{3} h(i,j;r,s)} \qquad (4)$$

for $p > 0$ and

$$v(i,j,0).\mathrm{data} = g(i,j) \qquad (5)$$

where $h(i,j;r,s)$ is a switching function indicating whether the child node of $v(i,j,p)$, $v(2i+r, 2j+s, p-1)$, should be included in the average ($h = 1$) or not ($h = 0$). The switching function is determined by performing a clustering on the 4 × 4 block of data comprising the sons of $v(i,j,p)$, this clustering being based on dynamically thresholded local differences.

To explain the method in more detail let $x = (x_1, x_2, \ldots, x_{16})$ be the signal vector of data values of the 16 sons of some node $v(i,j,p)$ whose data field is to be computed. Let $S$ be an ordered set of index pairs $S = \{(i_1, j_1), \ldots, (i_r, j_r)\}$, each pair corresponding to 2 adjacent points (8-connectivity) in the 4 × 4 neighbourhood, with an ordering $(i_k, j_k) \prec (i_{k+1}, j_{k+1})$ corresponding to the ordering of adjacent differences:

$$|x_{i_k} - x_{j_k}| < |x_{i_{k+1}} - x_{j_{k+1}}|, \quad k = 1 \ldots r \qquad (6)$$

The threshold $t$ is then taken as half of the largest of these differences:

$$t = |x_{i_r} - x_{j_r}|/2 \qquad (7)$$
which, as will be seen, allows a suitable trade-off to be made between intra-region smoothing and inter-region separation. Define a label vector $l = (l_1, l_2, \ldots, l_{16})$ such that $l_i$ is the label given to $x_i$. Define also a mean vector $m$ and population vector $p$ such that $m_{l_i}$ and $p_{l_i}$ are the mean and population of the points in the neighbourhood assigned to label $l_i$. The local clustering operation iterates over the $r$ difference values, continually updating the vectors $l$, $m$ and $p$. Let $l^{(q)}$, $m^{(q)}$ and $p^{(q)}$ be the vectors after the $q$th difference value has been processed, where $m$ and $p$ are initialized to vectors of zeroes and ones respectively and $l$ is initialized as follows:

$$l_i^{(0)} = i. \qquad (8)$$

Let $y_i^{(q)}$ be defined as follows:

$$y_i^{(q)} = \begin{cases} x_i & \text{if } p_{l_i}^{(q)} = 1 \\ m_{l_i}^{(q)} & \text{otherwise} \end{cases} \qquad (9)$$

where the superscript $q$ is left off the label subscript $l_i$ for clarity. An updating of the labellings occurs on the $(q+1)$th iteration if $|y_{i_q}^{(q)} - y_{j_q}^{(q)}| < t$. In this case points $x_{i_q}$ and $x_{j_q}$ are given the same label and the vectors are updated as follows:

$$m_{l_{i_q}}^{(q+1)} = \left( y_{i_q}^{(q)}\, p_{l_{i_q}}^{(q)} + y_{j_q}^{(q)}\, p_{l_{j_q}}^{(q)} \right) \Big/ \left( p_{l_{i_q}}^{(q)} + p_{l_{j_q}}^{(q)} \right) \qquad (10)$$

$$p_{l_{i_q}}^{(q+1)} = p_{l_{i_q}}^{(q)} + p_{l_{j_q}}^{(q)}. \qquad (11)$$

Finally the label vector is updated as follows:

$$l_i^{(q+1)} = l_{i_q}^{(q)} \quad \text{iff} \quad l_i^{(q)} = l_{j_q}^{(q)}, \quad i = 1 \ldots 16. \qquad (12)$$

On completion of the $r$th iteration let $k$ be the index with maximum population:

$$p_k^{(r)} \geq p_j^{(r)}, \quad j = 1 \ldots 16, \quad j \neq k. \qquad (13)$$

The switching function $h$ is then defined as follows:

$$h(i,j;r,s) = \begin{cases} 1 & \text{if } l_{b(r,s)} = k \\ 0 & \text{otherwise} \end{cases} \qquad (14)$$

where $b(r,s)$ is an (arbitrary) one to one mapping from position $(r,s)$ in the 4 × 4 lattice to index $b(r,s)$ in the signal vector representation.

The method described above is thus a local clustering technique which partitions a 4 × 4 neighbourhood into a number of connected regions based on differences in data values between adjacent nodes. Such an approach was found to be very robust around edges at the expense of less smoothing within regions. Figure 1 gives an example of the operation of the algorithm on a 1-d ramp like signal where the window encloses the points between $n = 0$ and $n = 3$ inclusive. If $|g(3) - g(2)|$ is less than the maximum of $|g(2) - g(1)|/2$ and $|g(1) - g(0)|/2$ then the points at $n = 2$ and $n = 3$ are averaged, the average value being used in future difference computations. Clearly no more smoothing will occur and thus the ramp will be removed and replaced by a more step like edge. Thus the above operation has not just an edge preserving, but an edge enhancing capability.

A further example of this is given in Fig. 2. Here a smooth symmetric ramp edge is generated which divides a 256 × 256 image into two regions with different mean values. As the pyramid is built, the edge becomes sharper, as can be seen in Fig. 2a. To show this effect more clearly, profiles around the edges of the three lowest levels of the pyramid are shown in Figs 2b-d, Fig. 2b being the edge profile at the lowest pyramid level. As can be seen, an edge approximately ten pixels wide has been sharpened to one of less than two pixels by the third level.

Fig. 1. A ramp edge ($g(n)$ plotted against $n$).

2.3. Pyramid construction
The pyramid is built by visiting each node of the initial linked pyramid structure in postorder (bottom up),(12) thus ensuring that all the children computations are carried out before those of the parent. On visiting each node, the switching function $h$ is computed for the 4 × 4 block comprising the sons of that node, followed by the computation of the local average (equation 4). The father field of each child of the currently visited node is then set or reset. If the father field has yet to be set then it is simply assigned to the current node:

$$v(2i+r,\, 2j+s,\, p-1).\mathrm{father} := v(i,j,p). \qquad (15)$$

Such an operation can be regarded as an addition of an extra edge into the graph:

$$v(2i+r,\, 2j+s,\, p-1) \rightarrow v(i,j,p). \qquad (16)$$
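The local clustering of Section 2.2, which supplies the switching function h each time a node is visited, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `cluster_block` is hypothetical, the mapping b(r, s) is taken to be row-major (the paper leaves it arbitrary), pairs whose points already share a label are skipped, and the population of an absorbed label is set to zero so that it cannot be chosen in equation (13).

```python
from itertools import product


def cluster_block(block):
    """Return (h, average) for one 4 x 4 block of son data values."""
    # Signal vector x, using the assumed row-major mapping b(r, s) = 4*r + s.
    x = [float(v) for row in block for v in row]

    # Ordered set S: every 8-connected adjacent pair, each listed once,
    # sorted by increasing absolute difference (equation 6).
    pairs = []
    for r, s in product(range(4), repeat=2):
        for dr, ds in ((0, 1), (1, -1), (1, 0), (1, 1)):
            rr, ss = r + dr, s + ds
            if 0 <= rr < 4 and 0 <= ss < 4:
                pairs.append((4 * r + s, 4 * rr + ss))
    pairs.sort(key=lambda ab: abs(x[ab[0]] - x[ab[1]]))

    # Dynamic threshold: half of the largest adjacent difference (equation 7).
    a, b = pairs[-1]
    t = abs(x[a] - x[b]) / 2.0

    label = list(range(16))   # equation (8)
    mean = [0.0] * 16         # initialised to zeroes
    pop = [1] * 16            # initialised to ones

    def y(i):
        # Equation (9): raw value for a singleton, cluster mean otherwise.
        return x[i] if pop[label[i]] == 1 else mean[label[i]]

    for a, b in pairs:
        la, lb = label[a], label[b]
        if la == lb:
            continue          # assumption: nothing to merge
        ya, yb = y(a), y(b)
        if abs(ya - yb) < t:
            total = pop[la] + pop[lb]
            mean[la] = (ya * pop[la] + yb * pop[lb]) / total    # equation (10)
            pop[la] = total                                     # equation (11)
            pop[lb] = 0       # assumption: retire the absorbed label
            label = [la if l == lb else l for l in label]       # equation (12)

    k = max(range(16), key=lambda j: pop[j])                    # equation (13)
    h = [[1 if label[4 * r + s] == k else 0 for s in range(4)]
         for r in range(4)]                                     # equation (14)
    members = [x[i] for i in range(16) if label[i] == k]
    return h, sum(members) / len(members)                       # equation (4)


# A block straddling a step edge: three columns at 10, one column at 100.
h, avg = cluster_block([[10, 10, 10, 100]] * 4)
print(h, avg)   # each row of h is [1, 1, 1, 0] and avg is 10.0
```

In this example the largest adjacent difference is 90, so t = 45: the twelve points at 10 merge into one cluster, the four points at 100 into another, and the two clusters stay separate. The average therefore comes from the majority cluster only, which is the edge preserving behaviour described in Section 2.2.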
If the father field of a child is already set, then it is reset only if the data field is nearer to the data field of the current node than the data field of its father. Following the assignment of the father field, it is necessary for subsequent processing to compute statistical properties of the signal in the underlying region connected to each node through the father pointers of their children. In the case of simple grey level segmentations, it is only necessary to compute the mean and standard deviation as, to a first approximation, inter-region visibility is a function of these parameters. Thus the sum, sumsq and pop fields of each node are computed, indicating the total sum and total sum squared of the data fields and the number of base level nodes which are connected to node $v(i,j,p)$ through father assignments. Clearly, these quantities can be recursively computed as the pyramid is built, for example

$$v(i,j,p).\mathrm{sum} = \sum_{(r,s) \in f(i,j)} v(2i+r,\, 2j+s,\, p-1).\mathrm{sum} \qquad (17)$$

where

$$(r,s) \in f(i,j) \quad \text{iff} \quad v(2i+r,\, 2j+s,\, p-1) \rightarrow v(i,j,p). \qquad (18)$$

If the father field is reset then the computed totals must be adjusted accordingly.

3. SEED SPAWNING AND CLASSIFICATION
On completion of the pyramid building procedure the father fields of most nodes will have been assigned. Those nodes whose fathers have not been assigned will be called seed nodes. It is these nodes that are the 'roots' of the regions at the base of the pyramid since they are sufficiently separated in property space from any of their nearest neighbours not to be included in the local averages. In general, however, there are more seed nodes than regions in the image. Consequently, regions have to be merged on the basis of whether there is sufficient grey level contrast between them. Merging procedures are conveniently carried out by operating on the seed nodes corresponding to the regions prior to final classification. Further, it is necessary to consider two types of merging processes. First, a region may not be able to be represented by a single seed but rather by a group of spatially adjacent seeds at the same level.(13) Further, a 'dumbbell' shaped region as shown, for example, in Fig. 3 may result in spatially separated seeds at different levels. The second type of merging process to consider is the merging of a seed node with a non-seed node which is connected to some seed at a higher level. Both types of merging can easily be implemented on the pyramid structure as described below. However, it is first necessary to consider a criterion for merging.
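As an illustration of the node fields listed in Section 2.1 and of the seed definition above, the following Python sketch shows one possible node record, the region statistics recovered from its sum, sumsq and pop fields, and a scan for seed nodes. The names `Node`, `region_stats` and `seed_nodes`, the field name `cls` (standing in for the class field, since `class` is a reserved word in Python) and the list-of-levels layout are assumptions made for the example and are not taken from the paper.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Optional


@dataclass
class Node:
    data: float = 0.0                  # local dynamic threshold average
    cls: float = 0.0                   # class label after final classification
    pop: int = 0                       # number of base level nodes in the region
    sum: float = 0.0                   # total of the data values in the region
    sumsq: float = 0.0                 # total of the squared data values
    father: Optional["Node"] = None    # None until a father has been assigned


def region_stats(node):
    """Mean, standard deviation and population of the region under a node."""
    mean = node.sum / node.pop
    variance = max(node.sumsq / node.pop - mean * mean, 0.0)
    return mean, sqrt(variance), node.pop


def seed_nodes(levels: List[List[Node]]):
    """Return the nodes whose father field was never assigned.

    levels[p] is assumed to hold the nodes of pyramid level p.
    """
    return [v for level in levels for v in level if v.father is None]


# Two base nodes; only one is linked to the node above it.
top = Node(sum=4.0, sumsq=10.0, pop=2)
linked = Node(data=1.0, sum=1.0, sumsq=1.0, pop=1, father=top)
orphan = Node(data=9.0, sum=9.0, sumsq=81.0, pop=1)
seeds = seed_nodes([[linked, orphan], [top]])
print(len(seeds), region_stats(top))
```

Here `orphan` and `top` are reported as seeds because neither has a father, while `linked` is not.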
3.1. A criterion of visibility
A criterion of visibility of two regions linked to two nodes $v_1$ and $v_2$ can, to a first approximation, be related to the mean $m$, standard deviation $s$ and size $p$ (the number of points) of the signal contained in each region. Thus a boolean function can be defined which determines whether two regions are visually distinct; regions for which it is false have to be merged. A choice which works well in practice is given by

$$b(m_1, s_1, p_1, m_2, s_2, p_2) = \left\{ \frac{|m_1 - m_2| \sqrt{p}}{\sqrt{s_1^2 + s_2^2}} > \beta \right\} \qquad (19)$$

where

$$p = \min(p_{sat}, p_1, p_2) \qquad (20)$$

and $\beta$ and $p_{sat}$ are pre-defined parameters. Such a function has been used previously by de Souza (14) in the detection of edges in noise and is simply an inter-region signal to noise measure. The $\sqrt{p}$ term can be justified by noting that regions with a population $p$ have a spatial dimension of approximately $\sqrt{p}$. Therefore they can undergo within region smoothing with a linear filter of spatial width $\approx \sqrt{p}$, thus reducing within region noise fluctuation with little effect on inter-region bias. Hence it is reasonable to expect that large regions (up to some 'saturation' size $p_{sat}$) are more visible in background noise than smaller regions with the same signal mean and standard deviation. It should be said that many functions of this type are applicable, the choice being made on pragmatic grounds as, to the best of the authors' knowledge, there is no general psychophysical model predicting visibility of regions in noisy backgrounds.
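Equations (19) and (20) translate directly into a short predicate. The sketch below is illustrative only: the function name `visible` is hypothetical, the defaults are the values β = 12 and p_sat = 1000 reported in Section 4, and the comparison is written in multiplied-out form, which is equivalent to equation (19) whenever the combined standard deviation is non-zero and avoids a division by zero when it is not.

```python
from math import sqrt


def visible(m1, s1, p1, m2, s2, p2, beta=12.0, p_sat=1000):
    """True if the two regions are visually distinct (equation 19)."""
    p = min(p_sat, p1, p2)                     # equation (20)
    contrast = abs(m1 - m2) * sqrt(p)
    noise = sqrt(s1 * s1 + s2 * s2)
    # Same test as contrast / noise > beta, without dividing.
    return contrast > beta * noise


# The same contrast and noise level at two different region sizes.
print(visible(100, 10, 400, 110, 10, 400))   # larger regions
print(visible(100, 10, 100, 110, 10, 100))   # smaller regions
```

For the first call the measure is 10·√400/√200 ≈ 14.1, which exceeds β, so the regions are kept apart. For the second it is 10·√100/√200 ≈ 7.1, so the regions would be merged. This is the size dependence that the √p term is intended to capture.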
Fig. 2a. A blurred edge and its pyramid.
Fig. 2b. Edge profiles of the bottom three pyramid levels.

3.2. Description of the merging procedures
In the case of the first type of merging process, if two seed nodes are to be 'merged' it is only necessary to replace both their data fields by some suitable average value for future classification. Hence a simple labelling scheme is adopted. Let $N = \{v_1, v_2, \ldots, v_k\}$ be the set of seed nodes in order of decreasing pyramid level. Let $l$ be a $k$ element vector of labels whose $i$th component $l_i$ is the label given to node $v_i$ and $m$, $s$ and $p$ are the vectors of mean values, standard deviations and population corresponding to the data fields of the seed nodes belonging to each label. The labelling scheme simply iterates over each of the $k$ seeds in turn deciding whether it should give a
seed node an existing label or create a new label. The vectors $m$, $s$ and $p$ are all initialized to the null vector. $l$ is initialized to the vector whose $i$th component equals $i$. Let $m'_q$, $s'_q$ and $p'_q$ be the mean, standard deviation and population of the underlying region of seed node $v_q$. Let $j$ be the smallest integer less than or equal to $q$ such that $b(m'_q, s'_q, p'_q, m_{l_j}, s_{l_j}, p_{l_j})$ is false. The seed node $v_q$ should be given label $l_j$ and the vectors updated as follows:

$$m_{l_j}^{(q+1)} = \left( p_{l_j}^{(q)} m_{l_j}^{(q)} + p'_q m'_q \right) \Big/ \left( p_{l_j}^{(q)} + p'_q \right) \qquad (21)$$

$$\left( s_{l_j}^{(q+1)} \right)^2 = \left( p_{l_j}^{(q)} \left( s_{l_j}^{(q)2} + m_{l_j}^{(q)2} \right) + p'_q \left( s'^2_q + m'^2_q \right) \right) \Big/ \left( p_{l_j}^{(q)} + p'_q \right) - \left( m_{l_j}^{(q+1)} \right)^2 \qquad (22)$$

$$p_{l_j}^{(q+1)} = p_{l_j}^{(q)} + p'_q \qquad (23)$$

$$l_q^{(q+1)} = j. \qquad (24)$$

Final labelling is given by:

$$v_i.\mathrm{data} = m_{l_i}^{(k)}, \quad i = 1 \ldots k. \qquad (25)$$

In the case of a seed node merging with a non-seed node that is connected to a seed at a higher level, the procedure is trivial. Given a non-seed node $v(i,j,p)$, each child node $v(2i+r, 2j+s, p-1)$ is considered. The nodes are merged if the regions connected to each node are not visible. In this case the father pointer of a merged seed node is set to $v(i,j,p)$. Thus if $v(2i+r', 2j+s', p-1)$ is a seed node at level $p-1$ then:

$$v(2i+r',\, 2j+s',\, p-1) \rightarrow v(i,j,p) \qquad (26)$$

iff $b(m_1, s_1, p_1, m_2, s_2, p_2)$ is false, where $m_1, s_1, p_1$ and $m_2, s_2, p_2$ are the signal means, standard deviations and populations in the regions connected to $v(i,j,p)$ and $v(2i+r', 2j+s', p-1)$ respectively.

Fig. 3. A 'dumbbell' shaped region.

3.3. Final classification
Final classification involves projecting the data field from each seed node down to each node that has, through the connection of the father pointers, that seed node as an ancestor. The class field of each node is used to store the class label of that node:

$$v(2i+r,\, 2j+s,\, p-1).\mathrm{class} = v(i,j,p).\mathrm{class} \quad \text{iff} \quad v(2i+r,\, 2j+s,\, p-1) \rightarrow v(i,j,p). \qquad (27)$$

Further, if there is no node $v'$ for which $v(i,j,p) \rightarrow v'$ (i.e. it is a seed) then:

$$v(i,j,p).\mathrm{class} = v(i,j,p).\mathrm{data}. \qquad (28)$$

A boundary refinement step was added which updates the classification of the boundary nodes. A boundary node is defined as one in which at least one of its 8-neighbours has a different value stored in its class field (except for seed nodes, which are not re-classified). The refinement method updates the boundary nodes' classification at each level $p$ using the classified data at level $p + 1$. Thus let $B$ be the set of boundary nodes at level $p$, where it is assumed that boundary refinement has been carried out at level $p + 1$, the classification of those nodes being assigned different class values being projected down. For each boundary node in $B$, each of its 4 fathers at level $p + 1$ is examined and it is assigned the class of that father node whose underlying region mean is closest to the value stored in the data field of that node. The new class assignments are then projected down to level $p - 1$ in the usual way and the process repeated. In general, this technique is very similar to the one used by Spann and Wilson in their quadtree based method,(7) except that in the latter case reclassification follows a local smoothing step. In effect, the smoothing step is an assumption of boundary smoothness which, although probably less accurate in terms of classification error than the technique described above, leads to more 'perceptually pleasing' results.

4. RESULTS
This section presents some results of the algorithm. Section 4.1 demonstrates how seeds are spawned at various levels appropriate to region size. In view of the ubiquity of the split and merge algorithm,(4) Section 4.2 presents a comparison of the two approaches. In particular, the authors' experience in implementing the split and merge technique is described and compared to the new method. Finally Section 4.3 shows some results on natural images. In all cases, the images used comprised 256 × 256 pixels with each pixel having 256 possible grey levels. Further, the values of the constants $\beta$ and $p_{sat}$ required by the visibility criterion are 12 and 1000 respectively in all cases, although their precise values were not at all critical.

4.1. Illustration of seed spawning
Figure 4a illustrates a simple test image consisting of circular discs in a constant luminance background. To this image has been added white Gaussian noise of variance $\sigma^2$. The inter-region signal to noise ratio $a$, defined as:

$$a = |m_{object} - m_{background}|/\sigma \qquad (29)$$

where $m$ is the mean signal luminance, is about two in this case. Figure 4b shows the 7 highest pyramid
levels (the root level (1 × 1) down to the 64 × 64 level). It should be noted that this image has been 'zoomed' by a factor of 4 in each dimension for clarity. Seed spawning occurs at the 8 × 8 level, the 32 × 32 level and the 64 × 64 level. In particular, at the 32 × 32 level the seed spawned is just a single node whereas it is a group of spatially adjacent nodes in the other cases. The particularly noticeable feature of this image is the sharpness of the inter-region boundaries, allowing seeds to be clearly separated even when they are just single isolated nodes. Figure 4c shows the final classification in which all of the objects have been successfully separated from the background and the boundaries accurately located, with Fig. 4d highlighting inter-region boundaries. Note, however, that because of the stochastic nature of the signals, the boundaries found are not smooth since there is no boundary smoothness assumption built into the algorithm (unlike, for example, the work described in Ref. (7)). This, if required by a particular application, could easily be remedied by incorporating a simple local smoothing step in the boundary refinement as in Ref. (7).

Fig. 4. Illustration of the seed spawning process (panels a-d).

4.2. Comparison with the split and merge algorithm
During the course of their studies into the image segmentation problem, the authors have had some experience in implementing the well known split and merge algorithm (4) and it is perhaps instructive to
give some general comparisons. The split and merge technique initially splits an image into a set of square regions depending on some split criterion, followed by a merging of adjacent regions based on a merge criterion. For segmentations based on grey level, the splitting would typically be done on the basis of the global variance of the signal within the block to be split, for example whether the variance exceeded some threshold value indicating inhomogeneity within the block. The merging would be based on some measure of inter-region contrast similar to the visibility criterion used above. In general, it was found that, for a good selection of split and merge threshold parameters, the method worked well. However, it was also found to be very sensitive to the actual values of these parameters, and in particular one set of parameters was not appropriate for different images even of the same general type.

A further problem is that the split step is based on global statistical measures. Consider, for example, the image shown in Fig. 5a consisting of discs in white Gaussian noise, with the signal variance inside the discs being very low compared to the background signal variance. Figure 5b shows the result of the split step and Fig. 5c the result of the merge step where, in both cases, block boundaries are highlighted. In this case, the variance threshold for splitting is not low enough to accurately resolve the region boundaries, whereas there is considerable false splitting of the homogeneous region. The merging process has more or less found the two objects but the definition of the region boundaries is very poor and has the characteristic 'blocky' appearance of the split and merge algorithm due to its processing of square shaped regions (the small regions of a few pixels can be removed by a post processing step). Also, it must be said that to obtain a result of even this quality required several 'trial and error' runs to obtain the right parameters. This result should be compared to that of the new algorithm shown in Figs 5d-e, which is almost a perfect segmentation of the image using the same $\beta$ and $p_{sat}$ parameters as in all the other cases.

4.3. Results on natural images
Figures 6a-c show segmentation results on an aerial image. Figure 6a is the original and Fig. 6b shows the segmentation with each region labelled with its mean grey level. Figure 6c highlights inter-region boundaries.
In order to produce these results an additional strategy was added to improve the algorithm's effectiveness in extracting long thin structures. The reason is that such structures result in seed nodes at low pyramid levels, the actual level being determined by the local width of the region. These seed nodes will have small populations and thus are almost certain to get merged into larger regions. In order to ameliorate this effect it is necessary to make these seed nodes 'aware' that they form part of a more global structure. The strategy adopted was that connected seed nodes at each level, assuming that they are each deemed not to be visually separable, are given the total population of the nodes in the connected region. The result is still not perfect as the model defining inter-region visibility (equation 19) assumes regions with approximately unity aspect ratio due to the $\sqrt{p}$ term. A more sophisticated test is needed that also takes into account region structural properties as well as the statistical properties of the signal within the regions. Further, in some areas, a fairly noisy labelling is apparent as the signal, within these areas, is not sufficiently homogeneous in grey level; textural measures are required.

Figures 7a-b show a segmentation of a 'cars' image where again the population correction strategy has been applied. A satisfactory result is obtained, with most of the thin structures being segmented satisfactorily.

5. CONCLUSIONS
A new method of image segmentation based on multi-resolution processing has been presented. The advantages of the method over existing techniques are as follows:
(1) The method makes no assumptions about region structure or the statistics of the signals within each region.
(2) Due to its multi-resolution structure, the method works well even at low signal to noise ratios.
(3) The pre-defined parameter settings are appropriate for all classes of images and are not critical.
(4) No global statistical processing of the data is performed. The method only ever uses local operations at each level in the pyramid.
(5) The method is not iterative. Also it is computationally efficient, taking less than 10 min CPU time on a VAX mini-computer.
(6) The method would be suitable for VLSI implementation, making real-time segmentation feasible, because only local processing is used and similar computations are performed at each node.

It is anticipated that the algorithm will be applicable to the segmentation of images on the basis of attributes other than grey level. The most obvious is texture segmentation, where the visibility criterion could readily be extended to the multi-dimensional case. A further possibility would be the segmentation of images that are not adequately modelled by a piecewise flat approximation, for example when there are gradients in the image caused by shading effects, projection effects and so on. In these cases, segmentation on the basis of piecewise low order polynomials is more appropriate.(15) Here the visibility criterion could be based, for example, on the quality of fit of the approximating polynomial. It is intended to report on progress on these topics in the future.
SUMMARY
A common problem in image processing is the segmentation of images into more or less homogeneous regions. In general, this problem is best tackled by processing the data over a range of spatial scales. Several algorithms have been published illustrating different approaches within this general framework. However, to date, most algorithms of this type require a priori assumptions about structural properties of the signal, for example, the number of regions or their minimum size.

A new algorithm is presented which is based on a dynamic thresholding pyramid. In this data structure, nodes are selectively included in local averages depending on some dynamically computed threshold. This technique allows inter-region boundaries to be preserved and even enhanced as the pyramid is built whilst at the same time smoothing the signal within regions. Further, it allows 'seed' nodes to be defined which comprise small groups of spatially adjacent nodes at some pyramidal level. Seeds indicate the generation of new classes within the signal at a level appropriate to the size of the underlying region. By applying a multi-resolution merging procedure on the sub-pyramids corresponding to each 'seed' node in the pyramid and then projecting down the signal value from the seed node, a final segmentation of the signal is produced.

It was found that this scheme works well on a range of images, even at low inter-region signal-to-noise ratios, irrespective of region size or the number of regions present.

Fig. 5. Segmentation using the split and merge algorithm (panels a-e).
Fig. 6. Segmentation of an aerial image (panels a-c).
Fig. 7. Segmentation of a 'cars' image (panels a-b).
Acknowledgements--This work was funded by Thomson CSF and the Swiss National Foundation for Scientific Research.

REFERENCES
1. R. M. Haralick and L. G. Shapiro, Image segmentation techniques, Comput. Vision Graphics Image Process. 29, 100-132 (1985).
2. A. Rosenfeld and M. Thurston, Edge and curve detection for visual scene analysis, IEEE Trans. Comput. C-20, 562-569 (1971).
3. A. P. Witkin, Scale-space filtering, Proc. IEEE ICASSP 84, San Diego (1984).
4. P. C. Chen and T. Pavlidis, Segmentation as an estimation problem, Image Modelling, A. Rosenfeld, Ed., pp. 9-28. Academic Press, New York (1981).
5. P. J. Burt, T. H. Hong and A. Rosenfeld, Segmentation and estimation of region properties through co-operative hierarchical computation, IEEE Trans. Syst. Man Cybernetics SMC-11, 802-809 (1981).
6. T. H. Hong and A. Rosenfeld, Compact region extraction using weighted pixel linking in a pyramid, IEEE Trans. Pattern Analysis Mach. Intell. PAMI-6, 222-229 (1984).
7. M. Spann and R. Wilson, A quadtree approach to image segmentation that combines statistical and spatial information, Pattern Recognition 18, 257-269 (1985).
8. J. M. Cibulski and C. R. Dyer, An analysis of node linking in overlapped pyramids, IEEE Trans. Syst. Man Cybernetics SMC-14, 424-436 (1984).
9. H. J. Antonise, Image segmentation in pyramids, Comput. Graphics Image Process. 19, 367-383 (1982).
10. R. Wilson and M. Spann, Image Segmentation and Uncertainty, J. Kittler, Ed. Research Studies Press Ltd (1988).
11. R. Wilson and G. H. Granlund, The uncertainty principle in image processing, IEEE Trans. Pattern Analysis Mach. Intell. PAMI-6, 758-767 (1986).
12. J. J. van Amstel and G. J. Schoenmakers, Inleiding tot het Programmeren. Academic Service, Den Haag (1979).
13. W. I. Grosky and R. Jain, A pyramid-based approach to segmentation applied to region matching, IEEE Trans. Pattern Analysis Mach. Intell. PAMI-8, 639-650 (1986).
14. P. de Souza, Edge detection using sliding statistical tests, Comput. Vision Graphics Image Process. 23, 1-14 (1983).
15. P. J. Besl and R. C. Jain, Segmentation through variable order surface fitting, IEEE Trans. Pattern Analysis Mach. Intell. PAMI-10, 167-192 (1988).
About the Author--MICHAEL SPANN received his B.Sc. degree in physics in 1979 and his M.Sc. degree in computer science in 1981, both from Manchester University, U.K. He received his Ph.D. from Aston University in 1985 where he worked on image segmentation and texture analysis problems. He then joined the department of Electrical, Electronic and Systems Engineering at Lancaster Polytechnic, Coventry and is currently a senior lecturer in that department. From 1987 to 1988 he spent a year as a research fellow at the Signal Processing Laboratory at EPFL, Lausanne, Switzerland where he worked on various image analysis problems. Dr Spann is a joint winner of the Pattern Recognition Society outstanding paper award in 1985.

About the Author--CASPAR HORNE received his Bachelors degree in 1984 in electrical engineering and his Masters degree in signal processing in 1986, both from Delft University, Holland. Since 1986 he has been a research student at EPFL Lausanne where he is working towards a Ph.D. degree in Image Processing.