Fuzzy Sets and Systems 34 (1990) 213-221, North-Holland
SIMILARITY OF CLASSES AND FUZZY CLUSTERING

Tao GU* and Bernard DUBUISSON
Université de Technologie de Compiègne, U.A. CNRS 817, B.P. 649, 60206 Compiègne, France

Received August 1987
Revised July 1988

Abstract: Three aspects of current interest (similarity of classes, membership functions of patterns in classes, and fuzzy clustering) are examined, in relation to multiple measures of classes. Three data sets are studied. The results of pattern fuzziness in classes and pattern rejection from classes are obtained by a fuzzy clustering method.

Keywords: Fuzzy clustering; class similarity; membership function; class parameter; pattern recognition.

1. Introduction

In pattern classification, classes are often not well separated and there are no well-defined boundaries: a pattern may belong to more than one class. We therefore propose to use the concept of membership function to define the belonging of a pattern to a class. The value of membership lies in the interval [0, 1]. The more a pattern X belongs to class Ω, the closer to 1 its grade of membership μ_Ω(X) is. Such a representation is called a fuzzy set representation, and it differs from the one used in ordinary set theory, where a pattern can be assigned to only one class and the membership function takes only the value 0 or 1.

Fuzzy pattern classification has been studied by numerous authors. Membership functions defined using the K-nearest-neighbours rule are discussed in [1, 2]. Inexact matrices constructing a similarity between two patterns of data sets are presented in [3]. Dunn [4] and Bezdek [5], using the similarity measure between patterns and the concept of iterative centers, developed the fuzzy ISODATA clustering method. A weighted average distance between patterns and a fuzzy set is proposed by Ruspini [6, 7].

The present paper defines a similarity measure between one class and the others. The similarity of classes can then be deduced from this measure. When a new pattern is associated with a class, its membership function with regard to the class can be derived from the resulting modification of the class similarity. The minimum of similarity of classes and the maximum of membership of patterns in classes are used as a criterion for the characterisation of fuzzy clustering.

* Visiting Researcher.

0165-0114/90/$3.50 © 1990, Elsevier Science Publishers B.V. (North-Holland)


2. Similarity of classes and membership function of patterns

The number of patterns and the structure of points in a class are two essential elements for exhibiting properties of the distribution of patterns in the class. The structure of classes, that is, the relation between samples, needs to be studied in order to exhibit this distribution. To get some information about pattern distribution and class geometry, one may ask: "what is a simple indicator representing the pattern distribution in a class?". Fortunately, pattern recognition methodology has collected several useful measures or parameters for that purpose, and some of them can be used to answer the question above. Examples of class parameters are listed in Table 1.

To make the problem straightforward, we define the following notations for the class description used.

1. R^α is the α-dimensional space of patterns.
2. Let Ω = {Ω_1, Ω_2, ..., Ω_L, ..., Ω_N} be the set of classes, with N the number of classes.
3. X = {X_1, X_2, ..., X_i, ..., X_m | X_i ∈ R^α} is the set of training patterns, with m the number of training patterns. Here X_i = (X_{i,1}, X_{i,2}, ..., X_{i,j}, ..., X_{i,α})^t is training pattern i.
4. Similarly to X, the set of test patterns is defined by Y = {Y_1, Y_2, ..., Y_i, ..., Y_p | Y_i ∈ R^α}, with Y_i = (Y_{i,1}, Y_{i,2}, ..., Y_{i,j}, ..., Y_{i,α})^t. Here p is the number of test patterns.
5. The set of class centers is denoted C = {C_1, C_2, ..., C_L, ..., C_N | C_L ∈ R^α}, where C_L = (C_{L,1}, C_{L,2}, ..., C_{L,j}, ..., C_{L,α})^t is the center of class L. The training patterns in class L are written X_i^L = (X_{i,1}^L, X_{i,2}^L, ..., X_{i,α}^L)^t, i = 1, 2, ..., N_L, X_i^L ∈ R^α, where N_L is the number of training patterns in class L.
6. R^β is the β-dimensional space of class parameters.
7. Let μ be the set of class measures: μ = {μ_1, μ_2, ..., μ_L, ..., μ_N | μ_L ∈ R^β}, with μ_L = (μ_{L,1}, μ_{L,2}, ..., μ_{L,j}, ..., μ_{L,β})^t. Here μ_L is called the measure vector of class L. Let us define the β × N matrix M = (μ_1, μ_2, ..., μ_L, ..., μ_N), which gathers all the information about class parameters.


If a pattern Y_i becomes a new member of the fuzzy set {X_i}, then μ and M are replaced by μ' and M' respectively.

8. The weight matrix W is defined as the diagonal matrix W = diag(w_1, w_2, ..., w_β), where w_j > 0, j = 1, ..., β.
9. We define I_L = (0, 0, ..., 1, ..., 0)^t as an N-dimensional vector which has only one non-zero component (equal to 1), the L-th.
10. The N × N matrix

S = ( S_{1,1}  S_{1,2}  ...  S_{1,L}  ...  S_{1,N}
      ...
      S_{N,1}  S_{N,2}  ...  S_{N,L}  ...  S_{N,N} )

expresses the similarity between the class parameters μ_L (L = 1, ..., N). If L = I, we set S_L = S_{L,L}.
11. W_{jn} are parameters derived from the Mahalanobis distance by using a within-group covariance or correlation matrix.

Essentially, the center of a class is of great importance in analysing the geometric arrangement of patterns. From Table 1, we can see that many class parameters are associated with this notion. We focus only on the parameters of Table 1 that are related to the centers when constructing similarity measures between classes. Let

μ_{L,j} = C_{L,j},    j = 1, ..., α,
μ_{L,α+1} = D_L,
μ_{L,α+1+j} = A_{L,j},    j = 1, ..., α,
μ_{L,2α+1+j} = V_{L,j},    j = 1, ..., α,
μ_{L,3α+2} = 1/M_L.    (1)

Then β = 3α + 2 is the dimension of the class parameter space. Once R^β is determined, a linear combination of {μ_L} and {μ'_L} can be proposed to form the similarity matrix S with the following elements:

S_{L,I} = {[(MI_L - M'I_I)] W [(MI_L - M'I_I)]^t}^{1/2}    (2)
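To make equation (1) concrete, the measure vector can be assembled from a class's training patterns as follows. This is a sketch in NumPy with names of our own choosing; the intraset-distance form (Tou-Gonzalez) and the moment M_L are assumptions, since the paper delegates the exact moment criterion to [8]:

```python
import numpy as np

def class_measure_vector(X_L):
    """Assemble mu_L of equation (1) from the (N_L, alpha) pattern matrix X_L.
    Components: centre C_L, intraset distance D_L, absolute average
    distance A_L, per-coordinate variance V_L, and 1/M_L.
    The intraset distance uses the Tou-Gonzalez form and M_L is a mere
    placeholder (the paper cites [8] for the actual moment criterion)."""
    N_L, alpha = X_L.shape
    C_L = X_L.mean(axis=0)                             # centre, alpha values
    V_L = ((X_L - C_L) ** 2).mean(axis=0)              # variance per coordinate
    A_L = np.abs(X_L - C_L).mean(axis=0)               # absolute average distance
    D_L = np.sqrt(2.0 * N_L / (N_L - 1) * V_L.sum())   # assumed intraset form
    M_L = N_L * D_L ** 2 + D_L                         # placeholder moment, not [8]
    mu_L = np.concatenate([C_L, [D_L], A_L, V_L, [1.0 / M_L]])
    return mu_L                                        # beta = 3*alpha + 2 values

mu = class_measure_vector(np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]))
print(mu.shape)  # (8,): alpha = 2 gives beta = 3*2 + 2
```

For alpha = 2 the vector has the 8 = 3α + 2 components required by equation (1), in the order the equation prescribes.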


Table 1. Class parameters

Measure | Equation
Center [9] | C_{L,j} = (1/N_L) Σ_{X_i ∈ Ω_L} X_{i,j}
Intraset distance [9] | D_L = [ (2N_L/(N_L - 1)) Σ_{j=1}^{α} V_{L,j} ]^{1/2}
Absolute average distance [9] | A_{L,j} = (1/N_L) Σ_{X_i ∈ Ω_L} |X_{i,j} - C_{L,j}|
Variance [9] | V_{L,j} = (1/N_L) Σ_{X_i ∈ Ω_L} (X_{i,j} - C_{L,j})²
Moment [8] | M_L (moment criterion of [8])
Average Euclidean distance | E_L = (2/(N_L(N_L - 1))) Σ_{X_i, X_k ∈ Ω_L, i<k} [ Σ_{j=1}^{α} (X_{i,j} - X_{k,j})² ]^{1/2}
Average Q-NN distance | Q_L = (1/N_L) Σ_{X_i ∈ Ω_L, X_k near X_i} [ Σ_{j=1}^{α} (X_{i,j} - X_{k,j})² ]^{1/2}
Average Mahalanobis distance [7] | K_L = (1/(N_L(N_L - 1))) Σ_{X_i, X_k ∈ Ω_L} [ Σ_{j=1}^{α} Σ_{n=1}^{α} W_{jn}(X_{i,j} - X_{k,j})(X_{i,n} - X_{k,n}) ]^{1/2}

or

S_L = {[(MI_L - M'I_L)] W [(MI_L - M'I_L)]^t}^{1/2}.    (3)

Using MI_L = μ_L, M'I_I = μ'_I and the expression for W of Notation 8, equations (2) and (3) become

S_{L,I} = S_{I,L} = [ Σ_{j=1}^{β} w_j (μ_{L,j} - μ'_{I,j})² ]^{1/2}    (4)

and

S_L = [ Σ_{j=1}^{β} w_j (μ_{L,j} - μ'_{L,j})² ]^{1/2},    (5)
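Equations (4)-(5) are simply weighted Euclidean distances in the class-parameter space R^β; a minimal sketch (function and variable names are ours):

```python
import numpy as np

def similarity(mu_L, mu_I_prime, w):
    """S_{L,I} of equation (4): weighted distance between the old parameter
    vector mu_L and the new one mu'_I obtained after a pattern is allocated."""
    return float(np.sqrt(np.sum(w * (mu_L - mu_I_prime) ** 2)))

w = np.ones(4)                                    # uniform weights w_j > 0
print(similarity(np.zeros(4), np.zeros(4), w))    # 0.0: unchanged class parameters
print(similarity(np.zeros(4), np.ones(4), w))     # 2.0 = sqrt(1+1+1+1)
```

A value of 0 means the class parameters are unchanged by the allocation; larger values mean a larger structural deviation, exactly as the text below describes.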

where

μ' = {μ'_1, μ'_2, ..., μ'_L, ..., μ'_N | μ'_L ∈ R^β}    (6)

represents the new configuration of classes in which a new pattern Y_i has been allocated. In this sense, S_{L,I} can be regarded as the deviation of class parameters between the new class configuration L, to which the new point Y_i is associated, and the old class I in the space R^β. It should be mentioned that μ'_L is not equal to μ_L (L = 1, 2, ..., N) because of fuzziness. In other words, if a pattern Y_i is said to be 'assigned to class L', it only means that the grade of membership of pattern Y_i in class L is larger than in the other classes. Formulas (4) and (5) show that the elements of the similarity matrix S can be interpreted as weighted distances between one class and another in R^β


space. If S_{L,I} has the value 0, it indicates that classes L and I are very similar; conversely, the greater S_{L,I} is, the greater the dissimilarity between classes L and I.

Let us now discuss the class membership function for a pattern. Let L be the index for which the difference between μ_L and μ'_L is minimum. Class L is called the first allocation of pattern Y_i. Other allocations can be defined for a pattern; they depend on the membership influences of the pattern on the other classes. Now let us suppose X_i^L = Y_i and S_L(X_i^L) = S_L (Y_i becomes a new member of {X_i}); equation (5) can then be transformed into

S_L(X_i^L) = [ Σ_{j=1}^{β} w_j (μ_{L,j} - μ'_{L,j})² ]^{1/2}.    (7)

In a sense, S_L(X_i^L) can be regarded as the deviation of class structures between pattern X_i^L being inside class L and being outside class L. From equation (7), let us define a new function

F_L(X_i^L) = (S̄_L)² / [ (S̄_L)² + (S_L(X_i^L))² ],    F_L(X_i^L) ∈ [0, 1],    (8)

where

S̄_L = (1/N_L) Σ_{i=1}^{N_L} S_L(X_i^L),    L = 1, 2, ..., N; i = 1, 2, ..., N_L.    (9)

F_L(X_i^L) in (8) is called the membership function of pattern X_i^L in class L. When S_L(X_i^L) = S̄_L, X_i^L is a crossover point of F_L because then F_L(X_i^L) = 0.5. It is clear that the closer F_L(X_i^L) is to 1, the more X_i^L is a member of class L. If F_L(X_i^L) is equal to 0, then X_i^L can be regarded as a point rejected from class L. Because we use class centers as references to build the measure of class similarity, patterns closer to class centers necessarily have larger values of membership. In that respect, the conclusion is the same as in non-fuzzy set theory.
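The membership function of equations (8)-(9) can be sketched directly; note the crossover property F = 0.5 when S(X) equals the class average S̄_L:

```python
def membership(S_bar, S_x):
    """F_L of equation (8): F = S_bar^2 / (S_bar^2 + S(X)^2), in [0, 1].
    S_bar is the class average of S_L(X_i^L) from equation (9)."""
    return S_bar ** 2 / (S_bar ** 2 + S_x ** 2)

print(membership(1.0, 1.0))  # 0.5, the crossover point
print(membership(1.0, 0.0))  # 1.0, the pattern fits the class structure exactly
```

As S(X) grows relative to S̄_L, F drops toward 0, which is the rejection regime discussed above.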

3. Algorithm of fuzzy clustering

According to equations (7) and (8), two kinds of fuzzy clustering can be proposed. Because (7) is simpler than (8), we use it to develop the fuzzy clustering algorithm in detail. Although the algorithm can also be used as a method for fuzzy pattern classification, we still call it a fuzzy clustering algorithm.

Fuzzy clustering algorithm.
1. Read {X_i^L} and {C_L} from clustering results on the training pattern set. In our laboratory, the training data set is clustered by the unsupervised method OUPIC [10].
2. Calculate the class measure vectors {μ_L} according to equation (1).


Fig. 1. Clusters of patterns (train: 194, analyzed: 28) by similarity. [Scatter plot: classes 1-4 contain 70, 40, 51 and 54 points; rejected and fuzzy points are marked with separate symbols.]

Fig. 2. Clusters of patterns by OUPIC method. [Scatter plot: classes 1-4 contain 60, 42, 46 and 46 points.]


3. Set the weights {w_j} of equation (5). w_j is taken proportional to the number of patterns in the computed class L and to the scale of the j-th variable.
4. Read the analyzed patterns {Y_i} and compute {μ'_L}.
5. Compute {S_L} according to equation (7) or (5).
6. Sort {S_L} in ascending order to find the first and the second allocations of pattern Y_i; set class indexes for retrieving the allocations of pattern Y_i.
7. Detect the farthest point from every class center.
8. Replace Y_i by these farthest points and compute {S'_L}.
9. Input the value of the rejection factor R. If S_L/S'_L ≥ R, reject Y_i from class L, and set the class index of Y_i to the zero value for the retrieval process.
10. Input the fuzzy threshold Z for selecting allocations of pattern Y_i. If S_L^{(2)}/S_L^{(1)} ≤ Z, pattern Y_i is assigned to two classes. Here the superscripts (1) and (2) denote the first and the second allocations of a pattern.
11. If there is another pattern to be processed, go to Step 4; otherwise, go to the next step.
12. Retrieve the class indexes to output the result of the pattern classification.
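Steps 6-10 combine into a small decision rule per pattern. The sketch below follows our reading of the (partly garbled) step descriptions: a pattern is rejected when its deviation S_L for the best class is at least R times the deviation S'_L of that class's farthest training point, and it receives a second (fuzzy) allocation when the second-best deviation is within a factor Z of the best:

```python
def allocate(S, S_far, R, Z):
    """Decide the allocations of one analyzed pattern.
    S: {class: S_L for the pattern}; S_far: {class: S'_L for the farthest
    training point of that class}; R: rejection factor; Z: fuzzy threshold.
    Returns [] (rejected), [first] or [first, second] (fuzzy allocation)."""
    order = sorted(S, key=S.get)            # step 6: ascending S_L
    first, second = order[0], order[1]
    if S[first] / S_far[first] >= R:        # step 9: rejection test
        return []
    if S[second] / S[first] <= Z:           # step 10: fuzzy threshold
        return [first, second]
    return [first]

print(allocate({1: 0.2, 2: 0.25, 3: 2.0}, {1: 1.0, 2: 1.0, 3: 1.0}, R=1.5, Z=1.5))  # [1, 2]
print(allocate({1: 3.0, 2: 4.0, 3: 5.0}, {1: 1.0, 2: 1.0, 3: 1.0}, R=1.5, Z=1.2))   # []
```

As Table 2 below illustrates, a larger Z produces more fuzzy (doubly allocated) points and a smaller R produces more rejected points, which matches this rule.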

4. Experimental results

To illustrate the behaviour of the algorithm, three data sets have been tested. Figure 1 shows an artificial two-dimensional data set consisting of 194 training patterns and 28 test patterns. The results give four fuzzy points and three rejected points. The configuration of the 194 training patterns clustered by the OUPIC method is given in Figure 2. Comparing the two plots, we can clearly see that the 28 test patterns are well classified with the minimum of S · S'. The effect of the rejection factor and the fuzzy threshold on the results is given in Table 2.

We also tested three- and four-dimensional data sets. The results of these experiments, shown in Table 3 and Table 4, are also appropriate. Since the three-dimensional data set of Table 4 is clustered into three well-separated classes, no fuzzy pattern is detected; one rejected point is found.

Table 2. Influence of rejection factor and fuzzy threshold on the result of 28 clustered patterns

Reject factor | Fuzzy threshold | No. of reject points | No. of fuzzy points
2.0 | 4.0 | 0 | 12
2.0 | 2.5 | 0 | 10
2.0 | 2.0 | 0 | 7
1.5 | 2.5 | 3 | 6
1.5 | 2.0 | 3 | 4
1.5 | 1.5 | 3 | 4
1.2 | 2.0 | 4 | 2


Table 3. Result of 56 four-dimensional patterns (30 training patterns and 26 test patterns)

Class | Training points | Test points | Fuzzy points | Total points
1 | 10 | 15 | 2 | 25
2 | 9 | 3 | 1 | 12
3 | 7 | 7 | 1 | 14
4 | 4 | 5 | 2 | 9

Fuzzy points (coordinates, with first and second class allocations):
Y(2) = (0.5412, 0.3506, 0.0311, 0.0927), allocations 2, 1
Y(3) = (0.4776, 0.3483, 0.0298, 0.1163), allocations 1, 2
Y(14) = (8.2091, 4.5870, 2.9850, 0.7040), allocations 3, 4
Y(16) = (6.0234, 4.0233, 4.8231, 0.5349), allocations 4, 1
Y(18) = (6.6754, 4.8734, 3.2345, 0.4648), allocations 4, 3
Y(26) = (7.7723, 4.5236, 3.2940, 0.8745), allocations 3, 4

Reject points:
Y(10) = (11.6304, 4.3430, 0.4030, 6.9870)
Y(13) = (0.0079, 6.9803, 0.0045, 1.1090)

Centers:
C(1) = (1.5826, 1.3927, 1.4603, 0.6324)
C(2) = (0.4433, 0.5005, 0.0279, 0.0967)
C(3) = (7.7707, 4.5423, 1.8536, 1.1401)
C(4) = (7.5596, 4.4186, 3.8961, 0.6356)

Table 4. Result of 183 three-dimensional patterns (131 training patterns and 52 test patterns)

Class | Training points | Test points | Fuzzy points | Total points
1 | 50 | 19 | 0 | 69
2 | 34 | 17 | 0 | 51
3 | 47 | 15 | 0 | 62

Reject point:
Y(43) = (30.2940, 0.1987, 0.3029)

Centers:
C(1) = (0.3066, 7.9118, 5.7755)
C(2) = (3.4207, 44.3252, 3.2414)
C(3) = (22.4672, 0.2381, 0.2311)

Although the rejected point Y(43) in Table 4 is near the third class center, it is still beyond the admissible region of the third class (the third class variance vector is (2.5201, 0.0443, 0.0426)).

5. Conclusion

In this paper, we propose to use the concepts of membership function and class similarity for fuzzy clustering. Three data sets have been tested; the results seem satisfactory.


This fuzzy clustering algorithm offers several possibilities for a classified pattern: multiple allocations, single allocation and pattern rejection. In particular, we note that the rejection factor and the fuzzy threshold used in this approach depend on the analyzed data sets, and need to be appropriately adjusted, using the minimum of S · S', when the clustered classes are very close to each other. To sum up, we believe that the algorithm presented in this paper is a well-developed approach to fuzzy clustering.

Acknowledgement

The first author, Tao Gu, wishes to take this opportunity to express his mourning for the death of the great scientist Dr. K.S. Fu, who conscientiously discussed the concept of similarity of classes with him in 1984.

Bibliography

[1] T. Gu and B. Dubuisson, A loose-pattern process approach to clustering fuzzy data sets, IEEE Trans. Pattern Anal. Machine Intell. 7 (1985) 366-372.
[2] M. Bereau and B. Dubuisson, An adaptive algorithm of fuzzy classification in a partially supervised environment, Proceedings of the 8th International Conference on Pattern Recognition (Paris, Oct. 1986) 120-122.
[3] S. Tamura, S. Higuchi and K. Tanaka, Pattern classification based on fuzzy relations, IEEE Trans. Systems Man Cybernet. 1 (1971) 61-66.
[4] J. Dunn, A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, J. Cybernet. 3 (1973) 32-57.
[5] J.C. Bezdek, A convergence theorem for the fuzzy ISODATA clustering algorithms, IEEE Trans. Pattern Anal. Machine Intell. 2 (1980) 1-8.
[6] E.H. Ruspini, Recent developments in fuzzy cluster analysis and its applications, Proceedings of the International Conference on ASRC (Acapulco, Mexico, Dec. 1980).
[7] A. Kandel, Fuzzy Techniques in Pattern Recognition (John Wiley & Sons, New York, 1982).
[8] T. Gu, A new criterion for optimal classification of patterns, Proceedings of the 6th International Conference on Pattern Recognition (Munich, Oct. 1982).
[9] R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis (Wiley-Interscience, New York, 1973).
[10] T. Gu, An optimal upturn performance index classifier - OUPIC, Scientia Sinica (Series A) 15 (11) (1982) 1219-1225.