Fuzzy Sets and Systems 68 (1994) 293-308

Integrated fuzzy clustering

Vito Di Gesù
Dipartimento di Matematica ed Applicazioni, University of Palermo, Via Archirafi 34, I-90123 Palermo, Italy

Received October 1992; revised January 1994
Abstract
Cluster analysis is a valuable tool for exploratory pattern analysis, especially when the information available is incomplete and/or the data model is affected by uncertainty and imprecision. Here a new integrated clustering method, based on a fuzzy approach, is proposed. Its implementation consists of the combination of hierarchical fuzzy algorithms. The performance and accuracy of the methodology are tested on biomedical images. An application to the segmentation of magnetic resonance images is discussed.
Keywords: Fuzzy sets; Cluster analysis; Data analysis; Pattern recognition; Integrated clustering
1. Introduction
A partition P = {P_0, P_1, ..., P_{K-1}} of a data set X = {x_0, x_1, ..., x_{N-1}}, with 2 ≤ K ≤ N, is named a hard clustering of X. The partition is based on a given predicate, such that: ∀ i ≠ j, P_i ∩ P_j = ∅ and ∪_{i=0}^{K-1} P_i = X. At the end of a clustering procedure, a label λ_i ∈ {0, 1, ..., K − 1} is assigned to each element, x_i, of X. Each possible partition of X can be represented by a K × N matrix, and it corresponds to an element of a hard K-partition, ℳ, defined as follows:

ℳ = { M ∈ M_{KN} | χ_{i,k} ∈ {0, 1}, ∑_{k=0}^{K-1} χ_{i,k} = 1, 0 < ∑_{i=0}^{N-1} χ_{i,k} < N, 0 ≤ k < K and 0 ≤ i < N },

where M_{KN} is the set of real K × N matrices, and each element χ_{i,k} ∈ {0, 1} represents the characteristic function of the element x_i in the "cluster" P_k. In the following a partition of X will be denoted either by P or by M.

Cluster analysis is a valuable tool for exploratory pattern analysis, especially when very little a priori information is available about the data. Such a condition is usually determined by uncertain and imprecise data models. The statistical approach is very useful and well established whenever the uncertainty is of a "random" nature; it becomes unsatisfactory if the uncertainty is of a different nature (linguistic, ignorance) or the data model is very complex.
Often, the elements of X share properties of more than one cluster; in such cases the hard partition of the data is not satisfactory. For example, at the beginning, astronomers classified galaxies, on the basis of their morphology, into two main classes: elliptic (E) and spiral (S). However, this partition was unsatisfactory; therefore, astronomers refined their classification in order to consider several degrees of eccentricity (E0, E1, ..., E7), and other combinations of shapes (Sa, Sb, Sc, SBa, SBb, and SBc). This refinement could be iterated, but the resulting classification would still not be exhaustive. In such situations, both "heuristics" and "user expertise" allow one to find suitable approximate solutions; therefore, an automatic data-analysis system should include a suitable representation of both the data model and expert knowledge. Fuzzy-sets theory [21] has demonstrated its ability to represent both of them inside an automatic system. The application of such a theory could provide a suitable tool to develop fuzzy-clustering algorithms.

The formulation of a fuzzy-clustering problem has the same form as the classical one; however, in this case, the intersection and union operators have a different interpretation. In particular, the above definition can be rewritten by assuming χ_{i,k} ∈ [0, 1] as the membership function of the element x_i to the "fuzzy-cluster" P_k. Fuzzy-clustering approaches can be considered to overcome some of the problems mentioned above. Fuzzy-clustering algorithms are usually objective-functional; i.e. the clustering criterion is based on the maximization (minimization) of a gain (loss) function, which is computed, for example, on the basis of a possibility model [20] of the data provided, for example, by an expert. Fuzzy clustering has been studied by several authors [1, 2, 9, 16]. For example, the fuzzy c-means (FCM) algorithm has given fairly good results when applied to the analysis of multispectral thematic maps [3]; recently, FCM has also been applied to the analysis of Magnetic Resonance Images [4]. However, it must be pointed out that fuzzy clustering usually requires a large amount of CPU time (to search membership function values) and memory space (to store the fuzzy-partition membership functions). The need exists for developing approximate versions of fuzzy clustering, to overcome the time/space problems without dramatically affecting the accuracy of the results. Hierarchical Fuzzy Clustering (HFC) algorithms have been designed [7] in order to achieve both time/space reduction and good accuracy. In [7] the authors show that time/space reduction can be obtained by using the locality of HFC, which allows one to store a temporary membership matrix in each node of the decision tree, the size of which is equal to the number of sons of the node considered, which is much smaller than the total number of groupings.

Integrated clustering (IC) algorithms have been utilized, with promising results, to perform Magnetic Resonance Imaging for medical diagnosis [8]. In the work referred to, the final labelling was computed by using a non-weighted vote function; moreover, hard-clustering algorithms were considered for the integration phase. In this paper a new Integrated Fuzzy Clustering (IFC) technique is described; it consists in the weighted combination of more than one clustering method, Cl_1, Cl_2, ..., Cl_S, in order to reach the final grouping of the data.
The weighted-vote function allows one to model the accuracy related to each clustering algorithm; moreover, three new HFC algorithms are introduced to perform the IFC: Two-Phase Clustering (TPC), Hierarchical Fuzzy Partition (HFP), and Hierarchical Fuzzy Square Error (HFSE). The main motivation for considering this combined technique is the fact that the human decision model also uses more than one evaluation paradigm, and usually a complex decision is taken by more than one expert. Each Cl_s (1 ≤ s ≤ S) of an IFC algorithm has a weighting coefficient, defined in the interval [0, 1]; it is related to the accuracy of the method, and is used to perform the final grouping of the data. It must be pointed out that the IFC technique has some connections with data-fusion techniques [13, 15]. However, in the latter case the integration regards the information deriving from different measurements, collected, for example, by different detectors. The results of the application of the IFC on medical images have been compared with classical methodologies implemented inside the ISAW image analysis system [14] and with the Approximate Fuzzy C-Means (AFCM) algorithm [4]. IFC has demonstrated a better global accuracy, as reported in Section 4.
In Section 2 the IFC technique is described. In Section 3 three hierarchical fuzzy clustering algorithms and their integration in an IFC algorithm are described. Section 4 reports experimental results of the IFC on biomedical images, and implementation notes on the HERMIA machine (Heterogeneous and Reconfigurable Machine for Image Analysis) [12]. Final remarks are given in Section 5.
2. Integrated fuzzy clustering
Let ℳ = {M^(1), M^(2), ..., M^(s), ..., M^(S)} be the partition matrices computed by the clustering algorithms Cl_1, Cl_2, ..., Cl_s, ..., Cl_S; moreover, suppose that an initial possibility distribution, Π_ℳ, defined on ℳ, exists:

Π_ℳ = (M^(1)/π_1, M^(2)/π_2, ..., M^(s)/π_s, ..., M^(S)/π_S).

Each possibility value ranges in the interval [0, 1], and ∑_s π_s could be greater than 1. The meaning of Π_ℳ depends on the experimental situation. For example, it could be related to the accuracy of each method (Cl), or it could be determined by the user, on the basis of his experience. An integrated fuzzy clustering consists in the evaluation of the final grouping, M, and its related possibility value, π, by means of functions 𝒢(ℳ; Π_ℳ) and ℋ(Π_ℳ), respectively.

Experts may interact and interchange information in several fashions. Here two main strategies are considered: the global combination (IFC-global) and the chain combination (IFC-chain). In the first case (see Fig. 1(a)) the clustering algorithms perform their computation independently; for example, the values M and π could be evaluated by using the following linear weighting functions:

M = (∑_{s=1}^{S} π_s × M^(s)) / ∑_{s=1}^{S} π_s,    π = (1/S) ∑_{s=1}^{S} π_s.
The parameter π is usually considered as a global measure of the performance of the IFC-global procedure.
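As a concrete illustration, the global combination is just a possibility-weighted average of the membership matrices, followed by the maximization rule for the hard labeling. The following sketch (NumPy assumed; the function name ifc_global and the toy data are illustrative, not part of the paper) shows one way to realize it:

import numpy as np

def ifc_global(memberships, possibilities):
    # memberships:   list of S membership matrices M(s), each of shape (K, N)
    # possibilities: list of S possibility values pi_s in [0, 1]
    Ms = np.stack(memberships)               # shape (S, K, N)
    pis = np.asarray(possibilities, float)   # shape (S,)
    # M = (sum_s pi_s * M(s)) / sum_s pi_s
    M = np.tensordot(pis, Ms, axes=1) / pis.sum()
    pi = pis.mean()                          # pi = (1/S) * sum_s pi_s
    return M, pi

# Toy usage: three 2 x 4 membership matrices with weights 0.5, 0.9, 0.7.
M1 = np.array([[0.9, 0.1, 0.8, 0.2], [0.1, 0.9, 0.2, 0.8]])
M2 = np.array([[0.7, 0.3, 0.6, 0.4], [0.3, 0.7, 0.4, 0.6]])
M3 = np.array([[0.8, 0.2, 0.9, 0.1], [0.2, 0.8, 0.1, 0.9]])
M, pi = ifc_global([M1, M2, M3], [0.5, 0.9, 0.7])
labels = M.argmax(axis=0)                    # final hard labeling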
Fig. 1. IFC-algorithms: (a) global combination; (b) chain combination.
The IFC-global approach should be preferred whenever the clustering algorithms do not need to exchange information during the computation. The evaluation of Π_ℳ is the crucial point and is related to the relevance of each clustering algorithm. In the case of chain combination (see Fig. 1(b)), the partition, M^(s), and the possibility value, π_s, at the sth stage depend on both the algorithm Cl_s and the partition obtained at the previous, (s − 1)th, stage:

M^(s) = 𝒢(M^(s), M^(s−1), π_s, π_{s−1})   for 1 < s ≤ S,
π_s = ℋ(π_s, π_{s−1})   for 1 < s ≤ S,
M = M^(S),   π = π_S.

Note that in the IFC-chain procedure the updating of π_s is performed after the computation of M^(s). Examples of functions 𝒢 and ℋ are

M^(s) = (M^(s) × π_s + M^(s−1) × π_{s−1}) / (π_s + π_{s−1}),
π_s = max{(π_s + π_{s−1})/2, π_{s−1}}   (for 1 < s ≤ S).
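The chain combination can be sketched as a fold over the stages; the code below transcribes the example 𝒢 and ℋ given above (NumPy assumed; all matrices of identical K × N shape; function names illustrative):

import numpy as np

def ifc_chain_step(M_prev, pi_prev, M_s, pi_s):
    # M(s) <- (M(s)*pi_s + M(s-1)*pi_{s-1}) / (pi_s + pi_{s-1})
    M_new = (M_s * pi_s + M_prev * pi_prev) / (pi_s + pi_prev)
    # pi_s is updated only after M(s) has been computed:
    # pi_s <- max{(pi_s + pi_{s-1})/2, pi_{s-1}}
    pi_new = max((pi_s + pi_prev) / 2.0, pi_prev)
    return M_new, pi_new

def ifc_chain(memberships, possibilities):
    # Fold the whole chain; the final pair is (M(S), pi_S).
    M, pi = memberships[0], possibilities[0]
    for M_s, pi_s in zip(memberships[1:], possibilities[1:]):
        M, pi = ifc_chain_step(M, pi, M_s, pi_s)
    return M, pi

Because each step mixes the current partition with the previous one, reordering the Cl's changes the result, as noted below.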
It must be pointed out that the result obtained from the application of an IFC-chain algorithm may depend on the order in which the Cl's are executed. Chain combination should be preferred whenever the clustering algorithms interact in order to find, step by step, the clustering of the data. For example, if Cl_s is an unsupervised non-parametric single-link algorithm, and Cl_{s+1} is an ISODATA procedure, then Cl_s could provide an indication about the number of clusters and their barycenters to the algorithm Cl_{s+1}, which needs such information to work properly. A validation function is usually inserted in IFC algorithms to test the consistency of each cluster with respect to some indicator (for example, the size or the internal variance). Note that hybrid combination strategies can be designed; for example, the algorithms could be organized in a connected graph, in which parallel and chain combinations are represented as subgraphs. Moreover, backtracking could be considered in order to adapt the strategy to partial results, as in the case of "consultant systems" [18].

The Integrated Fuzzy Clustering approach is computationally expensive, because it requires the execution of S clustering algorithms. However, the advent of parallel machines, based on VLSI technology, provides new tools to design faster parallel clustering algorithms. Both IFC algorithms are suitable for parallelization on MIMD (Multiple-Instruction Multiple-Data) machines. IFC-global can be implemented on a multiprocessor machine with S processing units (PUs), on which the Cl_1, Cl_2, ..., Cl_S clustering algorithms are executed asynchronously. Moreover, each PU can itself be a multiprocessor machine with an appropriate network topology, which depends on the algorithm to be executed. The integration phase is then performed as soon as all partial results are available. IFC-chain can be implemented by using a pipeline architecture with S PUs. In this case the sth stage of the pipe executes the algorithm Cl_s and receives the pair (M^(s−1), π_{s−1}) from the previous stage. The last element of the pipe contains the final grouping. In [6] parallel algorithms for image segmentation are reviewed, and computational performance on MIMD multiprocessor machines is discussed for both global and chain integration techniques.
3. An example of IFC-algorithm

In the following, an example of IFC algorithm, based on the global combination of three hierarchical fuzzy clustering algorithms, is described in detail.
Here X is considered a vector of size N; each element, x_i, is defined in a (d + 2)-D space: x_i = (p_i, q_i, η_{i,1}, η_{i,2}, ..., η_{i,d}), for 0 ≤ i < N. The first two coordinates represent the spatial position of x_i in the subspace X_(p,q); they allow one to consider the shape morphology in the case of image data. The last d coordinates represent measurements; they are also named features or spectra of X. The labeling produced by a clustering procedure is stored in a vector ℜ = {λ_0, λ_1, ..., λ_{N−1}} with λ_i ∈ {0, 1, ..., K − 1}; the vector 𝔑 = {ξ_0, ξ_1, ..., ξ_{N−1}} stores the belonging degree of x_i to λ_i. Moreover, a consistent clustering must satisfy the property ξ_i = χ_{i,λ_i} = max{χ_{i,k} | 0 ≤ k < K}.

A top-down hierarchical clustering algorithm performs the iterative split of a subset, P, of the space, X, into descendant subsets P_1, P_2, ..., P_L, with L ≥ 2, starting with P ≡ X. The split procedure is repeated on each subset P_i (i = 1, 2, ..., L), until the required number of clusters, K, is found. Hierarchical clustering may be described by means of a rooted tree, where the root represents X, and the leaves are the elements of the partition. Three elements must be considered in the construction of the tree: (a) the choice of the split rule; (b) the consistency test used to stop the splitting of a node; (c) the assignment of each terminal node to a cluster.

Another approach starts with K = N clusters, one for each element of X. In this case the tree is built by merging similar clusters. Given two clusters P and Q, the merging procedure can be defined as follows:

P' = merge(P, Q) ⟺ χ_P(Q) > φ,   with 0 ≤ φ ≤ 1,
where P and Q are clusters, and χ_P(Q) is a similarity function defined on the set of clusters that satisfies the following properties: (a) χ_P(Q) ∈ [0, 1]; (b) χ_P(Q) = χ_Q(P). The function χ_P(Q) can be interpreted as the belonging degree of the cluster Q to P. An example of similarity function is [6]

χ_P(Q) = 1 − Θ(P, Q),
where

Θ(P, Q) = α Θ_E(P, Q) + β Θ_η(P, Q),   0 ≤ α, β ≤ 1,  α + β = 1.
The coefficients α and β have been introduced to normalize Θ and to weight the relevance of the spatial and spectral features. The term Θ_E(P, Q) is the normalized Euclidean distance

Θ_E(P, Q) = min{ √((p_x − p_y)² + (q_x − q_y)²) / D_X : ∀x ∈ P and ∀y ∈ Q }.

Here D_X = max{ √((p_x − p_y)² + (q_x − q_y)²) : ∀x ∈ P and ∀y ∈ Q } is a normalization factor. The term Θ_η(P, Q) is a dissimilarity function in the feature space:

Θ_η(P, Q) = min{ (1/d) ∑_{r=1}^{d} |η_{x,r} − η_{y,r}| / max{η_{x,r}, η_{y,r}} : ∀x ∈ P and ∀y ∈ Q }.

In the case of a positive feature space, it can be shown that Θ_η(P, Q) is also a distance function. The values of the function Θ(P, Q) range in the interval [0, 1]; therefore, χ_P(Q) is a similarity function.
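A direct transcription of χ_P(Q) = 1 − Θ(P, Q) follows (NumPy assumed; brute-force O(|P| |Q|) evaluation; the small eps guard is an implementation detail, not part of the definition):

import numpy as np

def chi_similarity(P, Q, alpha=0.5, beta=0.5, eps=1e-12):
    # P, Q: arrays of shape (n, d+2); columns 0-1 hold (p, q),
    # the remaining d columns the (positive, normalized) features.
    dpq = np.linalg.norm(P[:, None, :2] - Q[None, :, :2], axis=2)
    theta_E = dpq.min() / max(dpq.max(), eps)        # normalized by D_X
    fP, fQ = P[:, None, 2:], Q[None, :, 2:]
    rel = np.abs(fP - fQ) / np.maximum(np.maximum(fP, fQ), eps)
    theta_eta = rel.mean(axis=2).min()               # Theta_eta(P, Q)
    return 1.0 - (alpha * theta_E + beta * theta_eta)   # alpha + beta = 1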
Merging of clusters in the spectral measurements is useful whenever the spatial relation is not relevant. If a normal probability function is assumed to model the data in the spectral feature space, the following intercluster closeness function can be used:

ρ(μ(P), σ(P), μ(Q), σ(Q)) = ∑_{i=1}^{d} (|σ_i(P) − σ_i(Q)| + |μ_i(P) − μ_i(Q)|) / √(σ_i(P) × σ_i(Q)).
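A small sketch of ρ (NumPy assumed; μ and σ are the length-d mean and variance vectors of the two clusters):

import numpy as np

def rho(mu_P, sigma_P, mu_Q, sigma_Q):
    mu_P, sigma_P = np.asarray(mu_P, float), np.asarray(sigma_P, float)
    mu_Q, sigma_Q = np.asarray(mu_Q, float), np.asarray(sigma_Q, float)
    # sum_i (|sigma_i(P) - sigma_i(Q)| + |mu_i(P) - mu_i(Q)|)
    #        / sqrt(sigma_i(P) * sigma_i(Q))
    num = np.abs(sigma_P - sigma_Q) + np.abs(mu_P - mu_Q)
    return float(np.sum(num / np.sqrt(sigma_P * sigma_Q)))

Note that the divergence of ρ as either variance vector tends to zero reproduces property (3) below.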
The function ρ is a variant of the Mahalanobis distance. The terms μ_i(P) and σ_i(P) (μ_i(Q), σ_i(Q)), for 1 ≤ i ≤ d, denote the mean value and the variance of the ith feature of the cluster P (Q), respectively. The function ρ satisfies the following requirements ∀ P, Q ∈ P:
(1) ρ(μ(P), σ(P), μ(Q), σ(Q)) ≥ 0;
(2) ρ(μ(P), σ(P), μ(Q), σ(Q)) = 0 ⟺ μ(P) = μ(Q) and σ(P) = σ(Q);
(3) ρ(μ(P), σ(P), μ(Q), σ(Q)) → +∞ ⟺ μ(P) ≠ μ(Q) and |σ(P)| → 0 or |σ(Q)| → 0;
(4) ρ(μ(P), σ(P), μ(Q), σ(Q)) → 0 ⟺ |σ(P)| → +∞ or |σ(Q)| → ∞.
Here μ and σ indicate the barycenter and the variance vectors of the clusters. Property (3) establishes that if σ tends to zero the related cluster becomes hard; property (4) establishes that two clusters are indistinguishable if they cover the feature space uniformly. Moreover, ρ is not a similarity function, because it takes values from the interval [0, +∞[. In the case of limited and discrete spaces (for example, for digital signals or images) it can be normalized in order to satisfy the requirements of a similarity function.

In the case of a hierarchical fuzzy clustering, the belonging degree of each element x to a cluster P_k is updated until the leaves (top-down hierarchical clustering), or the internal nodes representing the required clusters (bottom-up hierarchical clustering), are reached. In the following, three algorithms (Two-Phase Clustering, Hierarchical Fuzzy Partition, and Hierarchical Fuzzy Squared Error) are described. In this case the input is interpreted as a multispectral image, X, of size N = n × n, the elements of which are also named "pixels". The clustering of X is also named "segmentation". Moreover, the feature space is normalized to the interval [0, 1], which allows one to fix the clustering parameters without taking into account their range variability, and to define similarity functions easily in the range [0, 1].
Two-Phase Clustering (TPC). This algorithm needs a suitable definition of similarity between pixels of X and clusters of P. In our case the functions Θ_E, Θ_η, and ρ, defined above, are used. The TPC algorithm is developed in two phases (see the algorithm TPC below). The first phase consists of a single-link algorithm applied to the pixels of X; the similarity Θ is used. After this phase a set, P', of candidate clusters is obtained. The merging algorithm (phase 2) is based on the computation of the Minimum Spanning Tree (MST) in the cluster space, P', followed by the deletion of the K − 1 largest arcs in order to obtain the K clusters of P. The centres of the clusters are then computed in the feature space, and stored in a column vector μ = [μ_0, μ_1, ..., μ_k, ..., μ_{K−1}], with μ_k = (μ_{k,1}, μ_{k,2}, ..., μ_{k,d}).

The matrix, M, corresponding to P is obtained by computing the belonging degree χ_{i,k} of x_i to P_k. In the algorithm this is performed by using the similarity function

χ_{i,k} = 1 − (1/d) ∑_{r=1}^{d} |η_{i,r} − μ_{k,r}| / max{η_{i,r}, μ_{k,r}}.

M is then used to compute the vectors ℜ and 𝔑. Moreover, the algorithm TPC is consistent because the elements of ℜ are assigned as follows: λ_i = k ⟺ ξ_i = χ_{i,k} = max{χ_{i,k'} | 0 ≤ k' < K}.
TPC( )
{
  repeat {
    phase a: link each pixel, x, to its nearest neighbor, y, iff Θ ≤ α.
             After this phase the image is segmented into N_α clusters.
    if (N_α > K) {
      phase b: merge the N_α clusters, by using an intercluster closeness
               function, ρ. After this phase the image is segmented into
               N_α = K clusters.
    }
    else α = α − Δ;
  } until (N_α = K);
  compute μ; compute M; compute ℜ; compute 𝔑;
}
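Phase b can be sketched with standard SciPy graph routines: build the MST over the candidate-cluster centres and cut its K − 1 heaviest arcs, so that the surviving forest has exactly K connected components. (A sketch only: plain Euclidean distances between centres are used here, whereas the algorithm uses the closeness function ρ.)

import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_merge(centres, K):
    # centres: (N_alpha, d) array of candidate-cluster centres
    D = squareform(pdist(centres))            # pairwise distance matrix
    mst = minimum_spanning_tree(D).toarray()  # dense MST
    if K > 1:
        edges = np.argwhere(mst > 0)          # row-major order
        weights = mst[mst > 0]                # same order as edges
        for r, c in edges[np.argsort(weights)[-(K - 1):]]:
            mst[r, c] = 0.0                   # delete a heaviest arc
    # each remaining connected component is one of the K final clusters
    _, labels = connected_components(mst, directed=False)
    return labels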
In the pseudocode, α is a positive threshold, which is a non-decreasing function of the mean path in the (d + 2)-D space. For example, in [5] it is shown that, in the case of a normal Poisson distribution, N_α clusters are obtained for

α = −log((N_α − 1)/(|P| − 1)).
This relation gives a criterion to set α in order to obtain N_α ≥ K at the first iteration. The decrement satisfies Δ ≪ α (for example, Δ ≈ 0.1α).

Hierarchical Fuzzy Partition (HFP). HFP clustering is an example of a top-down procedure. At each stage of the clustering tree the belonging degree of each pixel, x, to a given cluster, P, χ_P(x), is computed. The values of χ_P(x) are updated during the computation until the leaves are reached (final clustering). The belonging degree is then used to evaluate the goodness of the labeling of the segmented image. HFP is a non-parametric algorithm; in fact, it starts from the intensity histogram in the feature space. In the case of segmentation of digital images, each feature, η_i, takes values in the integer space G = {0, 1, ..., G − 1}. Let {I_i(P) | i = 1, 2, ..., d} be the experimental normalized distributions of the pixels belonging to the cluster P; then the median value, m_i(P), of each feature is estimated as follows:
m_i(P) = min{ m | ∑_{g=0}^{m} I_{i,g}(P) ≥ 0.5 }   for i = 1, 2, ..., d.
These values induce a natural partition of the feature space into d-D hypercubes, H_r, with r = 0, ..., 2^d − 1. Moreover, let "b_1 b_2 ... b_d" (b_i ∈ {0, 1}) be the binary representation of r; then

H_r = [h_1, k_1] × [h_2, k_2] × ... × [h_i, k_i] × ... × [h_d, k_d],

where

h_i = 0 and k_i = m_i(P)   if b_i = 0,
h_i = m_i(P) and k_i = G − 1   if b_i = 1.
The split of P into the son clusters P_1, P_2, ..., P_{2^d} is then performed as follows: ∀x ∈ P, if η_x ∈ H_r then x ∈ P_r, where η_x denotes the feature vector of the pixel x. However, this procedure is computationally expensive and not very effective for d > 2. For example, let X be a multispectral image, with N = 1024 × 1024 and d = 4. Each P will be split into 16 sons, and after 5 calls of the split procedure each pixel of X will become a cluster.

This problem can be overcome by reducing the dimensions of the feature space. The dimensions of the feature space may be reduced by the selection of the most "relevant" features. This may be achieved by projecting data on the "best" subspace (discriminant analysis) or by generating a new space as a linear combination of the original features; this technique is based on principal component and/or factor analysis [10, 11, 19]. The methodology to be used in order to reduce the dimension depends on the nature of the data, and it is guided by the a priori knowledge and expertise of the analyst. Discriminant analysis is recommended whenever independence between the features is guessed. Linear combination is more effective in the case of dependence. Moreover, a preliminary exploratory analysis may show the existence of dependency between features and help the analyst in taking the correct decision [17]. In the following, the space is reduced to a single feature η = f(η_1, η_2, ..., η_i, ..., η_d). For example, η_x = ∑_i a_i × η_{x,i}, where the coefficients, a_i, are normalized elements of the covariance matrix, after the application of a principal components analysis to the feature space.

The HFP algorithm produces a binary ordered tree; its leaf nodes represent the final clusters, while the internal nodes, P, represent temporary clusters to be further split. The left son of P is denoted by P_1 and its pixels have temporary label λ = 1, and the right son is denoted by P_2 and its pixels have temporary label λ = 2 (see Fig. 2). The final labeling can be obtained by reading the leaf nodes in some order. Let a_0, a_1 = m(P), a_2 be the leftmost, the central, and the rightmost intensity values of the distribution I(P), respectively; then the clustering of an element x is based on the computation of a χ-function at each split phase. The χ-function must satisfy the following conditions:

if η_x = a_0 then χ_{P_1}(x) = 1,
if η_x = a_1 then χ_{P_1}(x) = 0.5,
if η_x = a_1 + 1 then χ_{P_2}(x) = 0.5,
if η_x = a_2 then χ_{P_2}(x) = 1.
The split of P into P_1 and P_2 is given by the rule: ∀x ∈ P,

if χ_{P_1}(x) ≥ 0.5 then x ∈ P_1;  if χ_{P_2}(x) ≥ 0.5 then x ∈ P_2.
Fig. 2. One step of the algorithm HFP (β = 0.5).
The meaning of the two conditions is very intuitive; in fact, those pixels whose intensity falls on the border of the two clusters must be considered as belonging to both of them with belonging degree equal to 0.5, while those pixels whose intensity falls at the left extreme, a_0, of the class P_1, or at the right extreme, a_2, of the class P_2, are the ideal elements of the respective classes. The following definitions satisfy the above conditions and have been used in the HFP algorithm:

χ_{P_1}(x) = 1 − (1/2) × (∑_{g=a_0}^{η_x} I_g(P)) / ΔF_1  and  χ_{P_2}(x) = 1 − χ_{P_1}(x)   for a_0 ≤ η_x ≤ a_1,

χ_{P_2}(x) = 1 − (1/2) × (∑_{g=η_x}^{a_2} I_g(P)) / ΔF_2  and  χ_{P_1}(x) = 1 − χ_{P_2}(x)   for a_1 < η_x ≤ a_2,

where

ΔF_1 = ∑_{g=a_0}^{a_1} I_g(P)  and  ΔF_2 = ∑_{g=a_1+1}^{a_2} I_g(P).
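Given these definitions, the two belonging degrees of a pixel with intensity g can be computed directly from the normalized histogram. The helper below is illustrative, not from the paper (hist plays the role of I(P), a1 = m(P)), and its ΔF_2 range follows the reconstruction above:

import numpy as np

def hfp_chi(hist, a0, a1, a2, g):
    # hist: normalized distribution I(P) over the integer intensities
    dF1 = hist[a0:a1 + 1].sum()          # Delta F1 = sum_{g = a0..a1} I_g(P)
    dF2 = hist[a1 + 1:a2 + 1].sum()      # Delta F2 = sum_{g = a1+1..a2} I_g(P)
    if a0 <= g <= a1:
        chi1 = 1.0 - 0.5 * hist[a0:g + 1].sum() / dF1
        chi2 = 1.0 - chi1
    else:                                # a1 < g <= a2
        chi2 = 1.0 - 0.5 * hist[g:a2 + 1].sum() / dF2
        chi1 = 1.0 - chi2
    return chi1, chi2                    # (chi_P1(x), chi_P2(x))

At g = a1 the first branch gives χ_{P_1}(x) = 0.5, and at g = a1 + 1 the second gives χ_{P_2}(x) = 0.5, matching the boundary conditions.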
The value χ_P(x) is then updated by applying the operators min and max during the computation of the clustering tree. The rule is based on the fact that χ_P(x, r), at the level r, must be reinforced if the temporary label λ_x(r) is equal to λ_x(r − 1), at the level r − 1, and penalized otherwise:

χ_P(x, 0) = 0.5,
χ_P(x, r) = max{χ_P(x, r), χ_P(x, r − 1)}   if λ_x(r) = λ_x(r − 1),
χ_P(x, r) = min{χ_P(x, r), χ_P(x, r − 1)}   if λ_x(r) ≠ λ_x(r − 1).
This updating rule assures that HFP is consistent. At the end of the computation of the clustering tree, each pixel, x, has been assigned a label λ with a belonging degree χ_λ(x). Moreover, during the construction of the clustering tree each node inherits the labels of the father node, together with the corresponding belonging degrees. With this information, the matrix M can be computed. It must be pointed out that, depending on the structure of the tree, the elements of a leaf node could know only the belonging degree, χ_Q, to a temporary cluster Q = P_{i_1} ∪ P_{i_2} ∪ ... ∪ P_{i_r}. In such a situation, we assume χ_{P_{i_j}} = χ_Q for j = 1, 2, ..., r. After the computation of M the images ℜ and 𝔑 can be computed.

Hierarchical Fuzzy Squared Error (HFSE). We develop here a hierarchical version of the fuzzy squared error (FSE) algorithm. The input is a set, X, of N = n × n patterns; each pattern is a column vector with d components:

X = [x_0, x_1, ..., x_i, ..., x_{N−1}]   with x_i = (η_{i,1}, η_{i,2}, ..., η_{i,d}).

The barycenters of the clusters in P are represented by a column vector

μ = [μ_0, μ_1, ..., μ_k, ..., μ_{K−1}]   with μ_k = (μ_{k,1}, μ_{k,2}, ..., μ_{k,d}).
Moreover, X is assumed to be endowed with a metric, δ, and D = ||δ(i, k)|| is an array, where δ(i, k) represents the distance between the element x_i and the barycenter, μ_k, of the cluster P_k. The classical version of the fuzzy squared-error (FSE) algorithm is sketched below. The algorithm is iterative; each pixel of X is assigned to the closest barycenter μ_k, starting from an a priori partition [μ_0(0), μ_1(0), ..., μ_k(0), ..., μ_{K−1}(0)]. Moreover, during each iteration IT, the vector of the barycenters is updated, and the elements of the matrix M(IT) are computed as follows:

χ_{i,k}(IT) = 1 − δ(i, k) / ∑_{s=0}^{K−1} δ(i, s).
The temporary labeling vector ℜ(IT) and the vector 𝔑(IT) are then obtained as follows:

λ_i(IT) = k ⟺ ξ_i(IT) = χ_{i,k}(IT) = max{χ_{i,k'}(IT) | 0 ≤ k' < K}.
If more than one k' maximizes χ_{i,k'}(IT), the element x_i is assigned to the minimum such k', in order to avoid ambiguity; moreover, this rule assures that HFSE is consistent. The iterations stop as soon as a given error function, J, is minimized or the number of iterations reaches a maximum value, IT_max. In the proposed version the function to be minimized is a weighted sum over the elements x_{i_1}, x_{i_2}, ..., x_{i_R} (R ≤ N) that change their labeling during the iteration IT:

J(𝔑; 0) = ∑_{i=0}^{N−1} ω_i,    J(𝔑; IT) = ∑_{r=1}^{R} ω_{i_r},  IT > 0,

where the weight is computed as ω_{i_r} = |χ_{i_r,k} − χ_{i_r,k'}|; here k is the label of x_{i_r} at the iteration IT − 1, and k' is the new label of x_{i_r} at the iteration IT.
FSE( )
{
  IT = 0;
  assign each x_i to the closest centre μ_k, 0 ≤ i ≤ N − 1 & 0 ≤ k < K;
  repeat {
    compute M(IT); compute ℜ(IT); compute 𝔑(IT); compute J(𝔑; IT);
    IT = IT + 1;
  } until (J ≤ J_min or IT = IT_max);
}
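A compact sketch of the FSE loop follows (NumPy assumed; names illustrative; ties in argmax resolve to the minimum index, which matches the rule above):

import numpy as np

def fse(X, centres, it_max=100, j_min=1e-3):
    # X: (N, d) patterns; centres: initial (K, d) barycenters.
    centres = np.asarray(centres, float).copy()
    K, labels = len(centres), None
    for it in range(it_max):
        # delta(i, k): distance of each pattern from each barycenter
        D = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        # chi_{i,k}(IT) = 1 - delta(i, k) / sum_s delta(i, s)
        M = 1.0 - D / D.sum(axis=1, keepdims=True)
        new = M.argmax(axis=1)
        if labels is not None:
            ch = new != labels               # relabeled patterns
            # J = sum over relabeled x_i of |chi_{i,k} - chi_{i,k'}|
            J = np.abs(M[ch, labels[ch]] - M[ch, new[ch]]).sum()
            if J <= j_min:
                break
        labels = new
        for k in range(K):                   # update the barycenters
            if np.any(labels == k):
                centres[k] = X[labels == k].mean(axis=0)
    return M.T, labels                       # (K x N membership, labeling)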
The HFSE algorithm has the same structure as the HFP algorithm. It computes FSE recursively, by assuming the number of clusters equal to 2 at each node of the clustering tree; each internal node represents a temporary cluster, and FSE is applied to its elements. The split of an internal node is controlled by a consistency test, based on an ad hoc heuristic rule (uniformity test, size of a cluster). The procedure HFSE stops as soon as K leaf nodes are obtained.

The HFSE algorithm shows a quite different behaviour from the procedure FSE. In fact, both of them need as input the number of clusters and an initial guess of their barycenters. However, in the case of the FSE algorithm this evaluation is performed by using the information of the experimental distributions of the spectral features, computed on the whole space X, while the HFSE algorithm performs these evaluations on the pixels belonging to the father cluster. The use of local distributions of a feature allows a better detection of hidden multimodal distributions. The computation of the matrix M and of the vectors ℜ and 𝔑 is performed as for the HFP algorithm.

IFC. A global strategy has been applied to integrate the algorithms TPC, HFP, and HFSE. The values of the possibility distribution Π_ℳ = (M^(TPC)/π_TPC, M^(HFP)/π_HFP, M^(HFSE)/π_HFSE) can be computed during a training phase, where the automatic clustering is controlled by a human expert. For this purpose, the confusion matrix CM = ||c_{i,j}||, with K × K elements, has been utilized. It allows one to compare the decisions of two experts; in fact, each entry c_{i,j} indicates the percentage of elements of X that have been labeled "i" by expert 1 and "j" by expert 2. A perfect agreement holds if CM is diagonal. The
sum, Pe, of the diagonal elements is the percentage of agreement between the two experts. In our case one of the two experts is an automatic method, and π = Pe. A second approach uses again CM to compute two statistical indicators: the clustering tendency (CT) and the merging tendency (MT):

CT = ∑_{j=0}^{K−1} T_c(j) × p_j,    MT = ∑_{i=0}^{K−1} T_m(i) × p_i,

where

T_c(j) = max{c_{i,j} | i = 0, 1, ..., K − 1} / ∑_{i=0}^{K−1} c_{i,j},   j = 0, 1, ..., K − 1,

T_m(i) = 1 − max{c_{i,j} | j = 0, 1, ..., K − 1} / ∑_{j=0}^{K−1} c_{i,j},   i = 0, 1, ..., K − 1.

Here p_i (p_j) denotes the percentage of pixels labeled "i" ("j"). In this case the possibility values can be defined in two ways: π = CT or π = 1 − MT. In the experiments reported in the following, Π_ℳ has been computed by using the value Pe. The results of the three algorithms are linearly combined, as proposed in Section 2, to compute M. The vectors ℜ and 𝔑 are then computed by using the maximization rule.
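The indicators can be read directly off the confusion matrix; a sketch follows (NumPy assumed; CM entries are fractions of pixels, and every row and column is assumed non-empty):

import numpy as np

def agreement_indicators(CM):
    CM = np.asarray(CM, float)
    Pe = np.trace(CM)                 # agreement between the two experts
    p_i = CM.sum(axis=1)              # fraction labeled "i" by expert 1
    p_j = CM.sum(axis=0)              # fraction labeled "j" by expert 2
    Tc = CM.max(axis=0) / p_j         # clustering tendency T_c(j)
    Tm = 1.0 - CM.max(axis=1) / p_i   # merging tendency T_m(i)
    CT = float(np.sum(Tc * p_j))
    MT = float(np.sum(Tm * p_i))
    return Pe, CT, MT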
4. Experiments and implementation
The IFC has been tested on biomedical images consisting of 4-band Magnetic Resonance Images (8-bit pixel intensity values, 224 × 256 pixels per image) detected by a 1.5 T imaging system (GE Signa) at the Department of Radiology of the Stanford University Hospital. They represent MRI of the human skull, with K = 8 relevant classes: background, bone, cerebrospinal fluid, fat, gray matter, hemorrhage, tumor, and white matter. The 4 bands are produced by adding pulses of radio-frequency power to a constant magnetic field. This causes the intrinsic spin angular momentum of the protons to be tilted away from the constant field. After termination of the pulses, the magnetization tends to realign with the field with two time constants, T1 and T2, which are related to the spin-lattice and spin-spin relaxation, respectively. Two further parameters are generated by controlling the magnetic field strength, the pulse sequence (inversion recovery TI), the pulse delay, and the repetition time (GRAS). Figs. 3(a)-(d) show an example of MRI images with a hemorrhage in the top-right corner of the skull. Manual classification allows the radiologist to discriminate soft tissue quite easily for the diagnosis of diseases inherent to that kind of cerebral matter (cancer, hemorrhage). Other classes of diseases (neoplastic, ischemic, demyelinating, and infectious lesions) are less evident, and need the integration and combination of the visual information coming from all four bands. The training set was based on a set of 20 4-band MRI images analysed by a team of three radiologists, who performed the manual classification by using a weighted voting strategy. The resulting possibility distribution, derived from the confusion matrix, was
Π_ℳ = (M^(TPC)/0.55, M^(HFP)/0.9, M^(HFSE)/0.7).

This information was used to segment the same data set with IFC-global. Table 1 reports the parameters CT, MT, and Pe obtained in this experiment. It is interesting to note that the percentage of agreement with the human classification was Pe = 95% in the case of the IFC-global procedure. This result indicates that the integration of more than one method can produce better results than each method separately.
Fig. 3. MRI-image of a human skull: (a) GRAS component; (b) TI component; (c) T1 component; (d) T2 component.
Fig. 4. Clustering of the MRI-images in Fig. 3: (a) TPC algorithm; (b) HFP algorithm; (c) HFSE algorithm; (d) IFC algorithm.
Fig. 5. The images 𝔑 corresponding to the labelings in Fig. 4: (a) TPC algorithm; (b) HFP algorithm; (c) HFSE algorithm; (d) IFC algorithm.
Table 1

Algorithm   CT      MT      Pe (%)
TPC         0.7     0.276   55
HFSE        0.72    0.254   90
HFP         0.647   0.318   70
IFC         0.8     0.231   95
AFCM        0.7     0.3     88
ISODATA     0.68    0.28    87

Table 2. CPU times in seconds: M_CPU on the 4 × 4 transputer mesh, SQ_CPU on the serial IBM-RISC/6000.

Algorithm   M_CPU   SQ_CPU
TPC         11.52   43.7
HFSE        9.8     30.54
HFP         2.0     22.04
IFC         25.42   100.28
The same data set has been analysed with two other methods: AFCM [4] and a classical ISODATA procedure provided by the ISAW system [14]. The results of this experiment are reported in the last two rows of Table 1. The IFC algorithm seems to show a better global accuracy. In Figs. 4(a)-(c) the results of the segmentation of the MRI images in Fig. 3 are shown, as produced by the algorithms TPC, HFP, and HFSE, respectively; Fig. 4(d) shows the segmentation produced by IFC-global. Figs. 5(a)-(d) show the belonging degree of each pixel to the final clustering for the algorithms TPC, HFP, HFSE, and IFC, respectively. It is interesting to note that these images take into account the structural morphology of each cluster. In fact, the borders of two segments (clusters) exhibit a faster variation of the membership function.

The fuzzy clustering algorithms have been implemented on the transputer-based HERMIA machine. The language used in the implementation is a parallel C for transputers (IC-TOOLS), which allows one to define the network topology of the transputer array via software. The algorithms are executed one after another, each in parallel fashion. In fact, at the present time, the algorithms TPC, HFSE, and HFP are parallelized; however, hardware restrictions do not allow the execution of all of them asynchronously. In Table 2 the CPU times in seconds of the four parallel algorithms are given; the CPU times reported include the time to send and receive image data to and from the host module of HERMIA, and the results refer to a mesh of 4 × 4 transputers. The last column gives the computation time in the case of a serial machine (IBM-RISC/6000). The results show that the HERMIA machine allows a speed-up of a factor of about 4.
5. Final remarks

In this paper a new integrated fuzzy clustering technique has been described and its performance tested on biomedical data. The experimental results show that the accuracy is improved if more than one clustering algorithm is combined to reach the partition of the data set. However, an IFC algorithm needs some a priori knowledge on the relevance of each clustering procedure, and this can be obtained by using a training procedure and the knowledge of human experts. Algorithms can be combined following several interconnection strategies. In this paper, global and chain combinations are introduced; further investigation will be done in order to include backtracking in combination strategies, which are represented by more complex graphs.
Acknowledgement This work has been supported by the Italian National Council of Research under contract No. 90.00669.PF69.
References

[1] E. Backer and A.K. Jain, A clustering performance measure based on fuzzy set decomposition, IEEE Trans. PAMI PAMI-3 (1) (1981) 66-74.
[2] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms (Plenum Press, New York, 1987).
[3] R.L. Cannon, J.V. Dave, J.C. Bezdek and M.M. Trivedi, Segmentation of a thematic mapper image using the fuzzy c-means clustering algorithm, IEEE Trans. Geosci. and Remote Sensing GE-24 (1986) 400-408.
[4] R.L. De La Paz, R. Bernstein, W.A. Hanson and M.G. Walker, Approximate fuzzy c-means (AFCM) cluster analysis of medical magnetic resonance image (MRI) data - a system for medical research and education, IEEE Trans. Geosci. and Remote Sensing GE-25 (1986) 815-824.
[5] V. Di Gesù, A clustering approach to texture classification, in: A.K. Jain, Ed., Real Time Object and Environment Measurement and Classification, NATO ASI Series F 42 (Springer, Berlin, 1988) 185-195.
[6] V. Di Gesù, Parallel algorithms for image segmentation, Proc. 16th OAGM-Meeting, Vienna, 6-8 May (1992) 184-198.
[7] V. Di Gesù, R. De La Paz, W.A. Hanson and R. Bernstein, A comparison of clustering algorithms for MRI, IBM-PASC, Tech. Report 12-13-89 (1989).
[8] V. Di Gesù, R. De La Paz, W.A. Hanson and R. Bernstein, Clustering algorithms for MRI, in: K.P. Adlassing, G. Grabner, S. Bengtsson and R. Hansen, Eds., Lecture Notes in Medical Informatics (Springer, Berlin, 1991) 534-539.
[9] V. Di Gesù and M.C. Maccarone, Feature selection and possibility theory, J. Pattern Recognition 19 (1) (1986) 63-72.
[10] R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis (Wiley, New York, 1973).
[11] J.H. Friedman and L.C. Rafsky, Graph-theoretic measures in multivariate association and prediction, Ann. Statist. 11 (2) (1983) 377-391.
[12] G. Gerardi, F. Chiavetta, V. Di Gesù and D. Tegolo, HERMIA: an heterogeneous and reconfigurable machine for image analysis, MVA IAPR Workshop on Machine Vision Applications, Tokyo (1990) 397-400.
[13] E. Gerianotis and Y.A. Chau, Robust data fusion for multisensor detection systems, IEEE Trans. Inform. Theory 36 (1990) 1265-1279.
[14] W.A. Hanson and H.J. Myers, Image Science and Applications Workstation (ISAW 1.10) User's Guide, IBM Scientific Center, Palo Alto (1988).
[15] A. Pinz and R. Bartl, Information fusion in image understanding, in: Proc. 11th ICPR, The Hague (1992).
[16] E. Ruspini, A new approach to clustering, Inform. and Control 15 (1969) 22-32.
[17] J.W. Tukey, Exploratory Data Analysis (Addison-Wesley, Reading, MA, 1977).
[18] G.L. Vernazza, S.B. Serpico and S.G. Dellepiane, A knowledge-based system for biomedical image processing and recognition, IEEE Trans. Circuits and Systems CAS-34 (1987) 1399-1413.
[19] J.H. Wolfe, Pattern clustering by multivariate mixture analysis: consistency properties, Multivariate Behavioral Res. 5 (1970) 329-350.
[20] R.R. Yager, A foundation for a theory of possibility, J. Cybernet. 10 (1980) 177-204.
[21] L. Zadeh, Fuzzy sets, Inform. and Control 8 (1965) 338-353.