A fuzzy extended k-nearest neighbors rule

M. Béreau and B. Dubuisson
Université de Technologie de Compiègne, U.R.A. CNRS 817, B.P. 233, 60206 Compiègne Cedex, France

Fuzzy Sets and Systems 44 (1991) 17-32, North-Holland
Received December 1988; revised July 1989

Abstract: The purpose of this paper is to describe an automatic fuzzy classification algorithm working in a partially supervised environment. The algorithm is based on the fuzzy labeling of samples by means of a membership function. Since the membership function carries more information than the classical characteristic function, the concepts of clustering and discrimination are extended to the fuzzy field. First the learning set is fuzzily clustered; the membership function is based on a weighted k-nearest neighbors rule and does not require the optimization of any criterion. Second the test set is analyzed and a new membership function is proposed. As the learning set is generally incomplete, the creation of new fuzzy classes is studied. The measure of fuzziness involves some analogies with Statistical Thermodynamics, and the link between the membership function and the Fermi-Dirac statistical function is discussed. Finally, results on simulated data are reported.

Keywords: Classification; weighted k-nearest neighbors rules; membership function; fuzzy clustering; fuzzy discrimination.

1. Introduction

The k-nearest neighbors rules have been widely used since their introduction in order to classify data whose distribution law is unknown. It has been shown [1] that in the case of the 1-nearest neighbor rule the misclassification rate is less than twice the Bayesian asymptotic risk, which remains the lowest bound for the error risk of every classifier. These properties, added to the simplicity of the rule, explain the popularity of the method and the abundance of studies devoted to the subject; see [2] for example. The primary rules have been modified in two main directions:
• rejection, when the neighborhood is ill-defined [3, 4], and weighting, in order to take into account the distance between the point and its neighbors [5, 6, 7];
• fast search for the nearest neighbors without calculating all the distances, either by reducing the number of data [8, 9, 10], or by pointing out a structure in the data set [11, 12, 13].
Like the k-nearest neighbors rules, the fuzzy methods are suited to data for which no hypothesis about the statistical distribution is available. Yet the fuzzy set theory introduced by Zadeh [14] pursues a wider goal: to deal with vagueness and uncertainty. Consequently, the characteristic function of the set A is replaced by the membership function $\mu_A$; $\mu_A(x)$ accounts for how much the element x belongs to


the set A. It takes its values in the [0, 1] interval. Between nonmembership ($\mu_A(x) = 0$) and full membership ($\mu_A(x) = 1$), the membership is gradual instead of abrupt. The union and the intersection of two fuzzy sets A and B are defined, respectively, by
$$\mu_{A \cup B}(x) = \max(\mu_A(x), \mu_B(x)), \qquad \mu_{A \cap B}(x) = \min(\mu_A(x), \mu_B(x)).$$
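For concreteness, here is a minimal Python sketch of these pointwise definitions; the two triangular membership functions are illustrative choices, not taken from the paper.

# Pointwise fuzzy union and intersection; membership functions are
# represented as plain Python callables on the real line.
def fuzzy_union(mu_a, mu_b):
    return lambda x: max(mu_a(x), mu_b(x))

def fuzzy_intersection(mu_a, mu_b):
    return lambda x: min(mu_a(x), mu_b(x))

# Example: two overlapping triangular memberships (illustrative only).
mu_a = lambda x: max(0.0, 1.0 - abs(x - 1.0))
mu_b = lambda x: max(0.0, 1.0 - abs(x - 1.5))
print(fuzzy_union(mu_a, mu_b)(1.2))         # 0.8
print(fuzzy_intersection(mu_a, mu_b)(1.2))  # 0.7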

From these definitions, it follows that, given a nonfuzzy set E and a fuzzy partition $(A_i)_{i=1,\dots,c}$ on E, generally $\bigcup_i A_i \neq E$ and $\bigcap_i A_i \neq \emptyset$. Fuzzy set theory has been successfully applied to Pattern Recognition; see [15, 16, 17] for more detailed developments. A fuzzy partition of the data set accounts for overlapping classes, which are frequently encountered in practical applications. The first algorithms of fuzzy clustering are Ruspini's [18] and the fuzzy c-means [16, 19]; both look for a fuzzy partition of the data set by the optimization of a criterion. A few studies are devoted to extensions of the k-nearest neighbors rules with fuzzy objectives, such as the nearest prototype rule in [16], and [20, 21]. A learning scheme for a fuzzy k-nearest neighbors rule is proposed in [22]. We have combined the k-nearest neighbors rules and the fuzzy techniques in order to build a fuzzy clustering algorithm and a fuzzy discrimination algorithm. The comparison of the membership degrees of the incoming points is an appropriate tool to follow the evolution of the system between two present classes or towards a new one. Consequently, the rejection of points and the creation of new classes are studied in the fuzzy case. The algorithms of fuzzy clustering and discrimination are described in Sections 2 and 3, respectively. In Section 4 some results obtained on simulated data are discussed.

2. Fuzzy clustering of the learning set

A learning set $E_L$ is given. The fuzzy learning on the set $E_L$ consists of the analysis of $E_L$ by a fuzzy clustering algorithm, which detects fuzzy classes in $E_L$. Each point of $E_L$ is labeled and receives a membership vector $\mu$ whose components $\mu_i$ ($i = 1, \dots, c$) are the membership degrees of the point to the c various classes of $E_L$. No prior knowledge is available. The algorithm initiates the first fuzzy class with the patterns which are tight to it. The points are analyzed sequentially.

2.1. Definition of the membership function

The membership function has been built in order to satisfy several constraints. y is the current pattern. We call $K = \{x^j, j = 1, \dots, k\}$ the set of its k nearest neighbors in the previously labeled part of the learning set. c fuzzy classes have been detected up to now.


The membership function must take its values in the [0, 1] interval. The value 1 is attributed to the ideal pattern of the i-th class, referred to as its prototype. The distance $d_j = d(y, x^j)$ between y and its neighbor $x^j$ weights the membership function, so that the more $d_j$ increases, the less $x^j$ interacts with y, and the less the labels of y depend on those of $x^j$. Consequently, only the acceptable neighbors of y (in the sense of [23]) have to play a role in the labeling of y. More than the absolute distance $d_j$, it is the relative distance between y and $x^j$, compared to the intraclass distance of the tight patterns of the i-th class, which is significant: the point density can vary greatly from one class to another. The variable which is considered in the calculation of $\mu_i(y)$ is $d_j / d_i^m$, where $d_j = d(y, x^j)$ and $d_i^m$ is the average distance between points having a high membership degree $\mu_i$. The membership function must integrate $\mu_i(x^j)$ for $j = 1, \dots, k$ and the variable $d_j / d_i^m$:
$$\mu_i(y) = F[\mu_i(x^j), f(d_j / d_i^m)], \quad 1 \le i \le c.$$
f must be a decreasing function of the variable $d_j / d_i^m$, with $\lim_{u \to +\infty} f(u) = 0$. The following membership function has been chosen:
$$\mu_i(y) = \max_{x^j \in K} \mu_i(x^j) \exp\left[-\lambda \left(\frac{d_j}{d_i^m}\right)^2\right], \quad 1 \le i \le c,$$
with $f(0) = 1$.
The right part of this expression uses the decreasing exponential function $f(u) = \exp[-\lambda u^2]$; $\lambda$ is a real factor which controls the decrease of the function. It belongs to the interval ]0, 1] and can be taken close to ½. It provides great flexibility to the membership function with regard to the actual data. The Euclidean distance has been chosen, but every Minkowski distance can be used, with the advantages and disadvantages of each. The role played by $\mu_i(x^j)$, $1 \le i \le c$, is to propagate the labels of the neighbors to y, attenuated by the distance factor f.
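A minimal Python sketch of this first membership function, assuming the labeled points, their membership degrees and the average intraclass distances $d_i^m$ are held in NumPy arrays; the function and argument names are ours, not the paper's.

import numpy as np

def membership_first(y, labeled_X, labeled_mu, d_m, k=3, lam=0.5):
    """First membership function:
    mu_i(y) = max over the k nearest labeled neighbors x^j of
              mu_i(x^j) * exp(-lam * (d_j / d_i^m)^2).

    labeled_X  : (n, p) array of already labeled points
    labeled_mu : (n, c) array of their membership degrees
    d_m        : (c,) average intraclass distances d_i^m
    """
    d = np.linalg.norm(labeled_X - y, axis=1)            # Euclidean distances to y
    nn = np.argsort(d)[:k]                               # k nearest neighbors
    # weight each neighbor's label by its relative distance d_j / d_i^m
    w = np.exp(-lam * (d[nn, None] / d_m[None, :]) ** 2) # (k, c)
    return (labeled_mu[nn] * w).max(axis=0)              # (c,) degrees of y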
Fig. 1. The weighting function $f(u) = \exp[-\lambda u^2]$.

2.2. Algorithm of fuzzy clustering

The membership function is used in a fuzzy clustering algorithm which does not demand the optimization of a criterion. Furthermore, this algorithm works sequentially and integrates the reject option and the creation of new fuzzy classes. Each time a pattern is analyzed, a fuzzy partition can be edited in order to follow any evolution. The algorithm consists of the following steps.
Step 1: Read the various options and parameters. Read the elements of the learning set $E_L$.
Step 2: Initialization of the first fuzzy class.
• Determination of the mean m of all the patterns.
• If the option "Search for the fuzzy invariant patterns" has not been chosen: ordering of the points by their increasing distances with regard to m, which receives the membership degree $\mu_1(m) = 1$.
• Determination of the kernel of the first fuzzy class: the points (whose number has been previously fixed) for which the membership degree is greater than a threshold:

[ ("N

#l(x) = 1 X exp -

dm'

/i- ~ min.

Step 3: Analysis of a current point y. Calculation of $\mu_i(y)$ for $i = 1, \dots, c$, where c is the present number of fuzzy classes. If for one i,

[ ("S]

/~i(y) = max/*/(~) exp - A JcI 6 K

~>/4 mi,,

then
• insert y into the labeled learning set;
• update $d_i^m$;
• go to Step 5.
Else reject the point and analyze the following point.
Step 4: Analysis of the rejected points. If the number of rejected points reaches the maximal number (previously fixed): if the maximal number of fuzzy classes has not been reached, initiate a new class as in Step 2. For the previously labeled


patterns, calculate the membership degree with respect to this new fuzzy class. Go back to Step 3. Else, if it was not possible to merge two fuzzy classes (see Step 5), integrate these points into the labeled learning set without creating any new fuzzy class, and analyze the following points by calculating only their membership degrees to the present classes.
Step 5: Merging of two fuzzy classes. If a certain number of points has been analyzed, examine the fuzzy classes two by two, and merge the fuzzy classes i and j if, for every labeled pattern y, the difference between $\mu_i(y)$ and $\mu_j(y)$ does not exceed a threshold. Go back to Step 3.
Step 6: Determination of the fuzzy invariant patterns. If the option "Search for the fuzzy invariant patterns" has been chosen, perform Steps 2-5 while modifying the order of the incoming points and without ordering them with regard to the mean point. Memorize the various fuzzy partitions. Search for the fuzzy invariant patterns and perform on them an ascendant hierarchical classification.
Step 7: Write either the elements of the learning set $E_L$ and their labels with respect to the various fuzzy classes, or the fuzzy invariant patterns and their average labels to the classes. Calculate and write a measure of fuzziness for every class.

Remarks
(a) Initialization. This algorithm has to be initiated without prior knowledge. The mean of the samples is considered as the prototype of the first fuzzy class and is used to start the initial process. Then it is abandoned to the benefit of the kernel of the first class.
(b) Analysis of a current point. The search for the k nearest neighbors of y is only made inside the labeled part of the learning set, as the analysis is sequential. The membership degrees of y are compared to thresholds for the different classes: either y is integrated into the labeled set or it is rejected. Whenever it is integrated into the labeled set, y is not assigned to one single class; on the contrary, its membership degrees with regard to all the present classes are recorded. The fuzzy clustering can thus be considered as a fuzzy assignation process. The flexibility allowed by the $\lambda$ parameter in calculating the membership function prevents the clustering algorithm from an unrealistic increase of the number c of fuzzy classes. Integration and rejection can be interpreted in the $[0, 1]^c$ hypercube of the membership degrees, where c is the current number of fuzzy classes.
(c) Analysis of the rejected points. Rejection is a consequence of the violation of the orthogonality condition: it supposes that all the membership degrees of a point are low. The rejected points are analyzed and a new fuzzy class is created in the same way as for the first class. This creation can be interpreted as increasing the dimensionality of the $[0, 1]^c$ hypercube by one.
(d) Merging of two fuzzy classes. On the contrary, this operation, when it is successfully performed, decreases the dimensionality of the $[0, 1]^c$ hypercube by one. The fuzzy classes i and j make a single one if $|\mu_i(y) - \mu_j(y)| < \varepsilon_M$ for every labeled y; then $\mu_i(y)$ and $\mu_j(y)$ are replaced by their mean. Integration, rejection, creation of a new fuzzy class, and merging of two fuzzy classes are illustrated in Figure 2.


Fig. 2. Integration, rejection, creation of a new fuzzy class, and merging of two fuzzy classes, interpreted in the hypercube of membership degrees (kernels of the classes, merging area, rejection area).
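A compact sketch of the integration/rejection decision of Steps 3 and 4, reusing membership_first() from the sketch above; the rejection buffer handling and the class-creation routine are only stubbed here.

import numpy as np

def cluster_point(y, labeled_X, labeled_mu, d_m, mu_min, rejected,
                  max_rejected=15):
    """One pass of Step 3 (and the trigger of Step 4) for a current point y;
    labeled_X and labeled_mu are Python lists, mu_min is the (c,) vector of
    thresholds mu_i^min."""
    mu = membership_first(y, np.asarray(labeled_X), np.asarray(labeled_mu), d_m)
    if (mu >= mu_min).any():       # acceptable for at least one fuzzy class
        labeled_X.append(y)        # integrate y with its fuzzy label
        labeled_mu.append(mu)
        # (the update of the averages d_i^m is omitted in this sketch)
    else:
        rejected.append(y)         # reject: all membership degrees are low
        if len(rejected) >= max_rejected:
            pass  # Step 4: initiate a new fuzzy class from the rejected points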

(e) Classification of the fuzzy invariant patterns. First the concept of an invariant pattern must be extended to the fuzzy case. In the classical case, an invariant pattern is a stable pattern for classification, i.e. one which always belongs to the same class, whatever the number of executions of the algorithm [24]. From the set of invariant patterns a reliable classifier can be built. In the fuzzy case, a pattern will be called a fuzzy invariant pattern if, for different fuzzy partitions of $E_L$, it owns similar memberships to the present classes. To that extent, the stability of a fuzzy partition is proposed. Practically, y will be a fuzzy invariant pattern if
$$\sum_{\substack{m=1 \\ m \neq m'}}^{M} \sum_{i=1}^{c} [\mu_i^m(y) - \mu_i^{m'}(y)]^2 < \varepsilon_{IP},$$
that is,
$$\sum_{\substack{m=1 \\ m \neq m'}}^{M} \|\mu^m(y) - \mu^{m'}(y)\|^2 < \varepsilon_{IP}$$
in the $[0, 1]^c$ hypercube, where M is the number of executions, $\mu_i^m(y)$ is the membership degree to class i for the m-th execution, and c is the number of fuzzy classes. Then the membership vector $\mu$ is calculated as the average over the M executions. In order to locate each fuzzy invariant pattern with respect to the others, an ascendant hierarchical classification is performed in the $[0, 1]^c$ space.
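A small sketch of this stability test for one pattern, assuming its membership vectors over the M executions are stacked in a (M, c) array; the threshold value is illustrative.

import numpy as np

def is_fuzzy_invariant(mu_runs, m_ref=0, eps_ip=0.05):
    """mu_runs: (M, c) membership vectors of one pattern y over M executions.
    y is fuzzy invariant if the summed squared distance between the reference
    execution m_ref and every other execution stays below eps_ip."""
    diffs = mu_runs - mu_runs[m_ref]                       # (M, c)
    total = sum(np.sum(diffs[m] ** 2)
                for m in range(len(mu_runs)) if m != m_ref)
    return total < eps_ip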


Fig. 3. The logarithmic entropy function H(x) on [0, 1].

2.3. Measure of fuzziness

It is of interest to measure the quantity of fuzziness carried by each class. The higher the result of this measure, the greater the fuzziness of the partition; the value 0 corresponds to the nonfuzzy case. Several studies have been devoted to the measure of fuzziness [25, 26, 27]. Among the different measures, the logarithmic entropy, formally similar to Shannon's entropy in Information Theory, has been chosen. It is defined by the function
$$H(x) = -[x \log_2 x + (1 - x) \log_2(1 - x)].$$
That is, for this problem, and for each class i, $1 \le i \le c$,
$$H_i = -\sum_{x \in E_L} \left[\mu_i(x) \log_2 \mu_i(x) + (1 - \mu_i(x)) \log_2(1 - \mu_i(x))\right].$$

The entropy function H(x) is presented in Figure 3. It presents some analogies with the entropy of Thermodynamics (see [25] for more details), which accounts for the degree of order carried by the system under study.
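A short sketch of this entropy computed over one class; the numerical guard at the interval ends is an implementation detail we add, not discussed in the paper.

import numpy as np

def class_entropy(mu_i):
    """Logarithmic entropy H_i of one fuzzy class, summed over the labeled
    set; mu_i is the vector of membership degrees mu_i(x), x in E_L."""
    mu = np.clip(np.asarray(mu_i, dtype=float), 1e-12, 1 - 1e-12)  # avoid log2(0)
    return -np.sum(mu * np.log2(mu) + (1 - mu) * np.log2(1 - mu))

print(class_entropy([0.5, 0.5]))  # 2.0: maximal fuzziness for each point
print(class_entropy([0.0, 1.0]))  # ~0.0: the nonfuzzy case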

3. Fuzzy discrimination

Let us consider the Heaviside function defined by
$$u(t) = \begin{cases} 1 & \text{if } t > 0, \\ 0 & \text{otherwise.} \end{cases}$$
It allows us to present a new formulation of the classical k-nearest neighbors rule in terms of the characteristic function. Let y be the current pattern being classified, and $K = \{x^j, j = 1, \dots, k\}$ the set of its k nearest neighbors. $X_i$ is the characteristic function of the class $C_i$, defined by
$$X_i(y) = \begin{cases} 1 & \text{if } y \in C_i, \\ 0 & \text{if } y \notin C_i. \end{cases}$$
Then
$$X_i(y) = u\left(\sum_{x^j \in K} X_i(x^j) - \tfrac{1}{2}k\right).$$
That means that if $\sum_{x^j \in K} X_i(x^j) < \frac{1}{2}k$ then $y \notin C_i$, and if $\sum_{x^j \in K} X_i(x^j) > \frac{1}{2}k$ then $y \in C_i$. Ties are broken arbitrarily, as usual. This rule is exemplified in Figures 4 and 5 for k = 5 and k = 6.

Fig. 4. The classical rule as a Heaviside step of $t_i = \sum_{x^j \in K} X_i(x^j)$, for k = 5.

Fig. 5. The classical rule as a Heaviside step of $t_i$, for k = 6.
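A sketch of this Heaviside formulation of the classical vote, under the arbitrary convention that a tie yields exclusion from $C_i$; names and arguments are ours.

import numpy as np

def knn_heaviside(y, X, labels, klass, k=5):
    """Classical k-NN rule written as X_i(y) = u(sum_j X_i(x^j) - k/2);
    X is an (n, p) array, labels an (n,) array of hard class labels."""
    d = np.linalg.norm(X - y, axis=1)
    nn = np.argsort(d)[:k]
    votes = np.sum(labels[nn] == klass)    # sum of characteristic functions
    return 1 if votes - k / 2 > 0 else 0   # Heaviside u(t); tie -> 0 here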

In the fuzzy case, the characteristic function is replaced by the membership function, and the transition between nonmembership and full membership must be gradual instead of abrupt. Among the functions having the desired properties, the following has been chosen:
$$\mu_i(y) = f(t_i) = \frac{1}{1 + \exp\left(\dfrac{\frac{1}{2}k - t_i}{b_i}\right)}, \quad 1 \le i \le c.$$
Here $f(\frac{1}{2}k) = \frac{1}{2}$ and $\frac{1}{2}k$ is an inflexion point; $\lim_{t_i \to 0^+} f(t_i) \approx 0$ when $\frac{1}{2}k / b_i$ is large enough, and $\lim_{t_i \to +\infty} f(t_i) = 1$. Figure 6 illustrates those properties.

Fig. 6. The function $f(t_i)$, with inflexion point at k/2 where $f = 1/2$.

In order to take into account the membership degrees of the neighbors $x^j$, $j = 1, \dots, k$, of y, just as the distance between y and $x^j$, the variable $t_i$ is taken equal to
$$t_i = \sum_{x^j \in K} \mu_i(x^j) \exp\left[-\lambda \left(\frac{d_j}{d_i^m}\right)^2\right],$$

and the parameter $b_i$ is chosen as the logarithmic entropy $H_i$ previously defined. Thermodynamic considerations have guided this choice. The desired function presents some analogies with the Fermi-Dirac distribution function, which expresses the distribution law for Fermi particles, such as electrons inside a solid:
$$f(E) = \frac{1}{1 + \exp\left(\dfrac{E - E_F}{kT}\right)},$$
where $E_F$ is called the Fermi energy, T is the absolute temperature, k is the Boltzmann constant ($k = 1.38 \times 10^{-23}$ J K$^{-1}$), and kT is homogeneous to an energy. This is presented in Figure 7. The variation is brutal in the neighborhood of $E_F$: the more the temperature decreases, the more abrupt the variation. The desired membership function is the complement to 1 of the Fermi-Dirac distribution function. Indeed, the Fermi-Dirac distribution shows that the lowest energy levels are the most populated; on the contrary, the membership function approaches 1 for a value of $t_i$ close to k, i.e. when all the neighbors are tight to class i.

Fig. 7. The Fermi-Dirac distribution function f(E), with $f(E_F) = 1/2$.


Apart from this difference, which lies in the inversion of the proposed problems, the analogies between both functions are easily pointed out:
• $\frac{1}{2}k$ stands for the Fermi energy $E_F$; in both cases the variation is brutal around this point.
• $H_i$ is analogous to kT. When T = 0, all the electrons are constrained to immobility. As T increases, the shocks between particles become more numerous, so that peripheral electrons can get over the gap and conduct the electric current. That is the reason why the entropy has been chosen for the $b_i$ parameter. H = 0 corresponds to the nonfuzzy case, and the more H increases, the more the quantity of fuzziness increases; thus H accounts for a quantity of fuzziness, while thermodynamic entropy accounts for a quantity of order. The order-disorder concept, upon which thermodynamics is based, is similar to that of fuzziness.
Consequently, the first step of the fuzzy discrimination algorithm consists of a new labeling of the elements of $E_L$ according to the new membership function. The points are studied sequentially:
• Let y be an element of $E_L$. Its neighbors are searched in $E_L$. The distances $d_j = d(y, x^j)$ are calculated, just as the average distances $d_i^m$ for each i.
• The membership degrees are calculated.
• The values of the entropy are updated for each class using the new membership degrees.
This second membership function is more sophisticated than the first one, but it requires that a finite set be previously labeled, as in the sketch below.
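A sketch of this second membership function under the same array conventions as before; the guard against $H_i = 0$ (the nonfuzzy case) is our addition.

import numpy as np

def membership_second(y, ref_X, ref_mu, d_m, H, k=3, lam=0.5):
    """Second membership function: a Fermi-Dirac-like sigmoid of
    t_i = sum_j mu_i(x^j) exp(-lam (d_j/d_i^m)^2), with b_i = H_i.

    ref_X  : (n, p) labeled reference set (E_L plus labeled part of E_T)
    ref_mu : (n, c) membership degrees of the reference set
    d_m    : (c,) average intraclass distances, H : (c,) class entropies
    """
    d = np.linalg.norm(ref_X - y, axis=1)
    nn = np.argsort(d)[:k]
    w = np.exp(-lam * (d[nn, None] / d_m[None, :]) ** 2)  # (k, c)
    t = (ref_mu[nn] * w).sum(axis=0)                      # t_i, in [0, k]
    b = np.maximum(H, 1e-12)                              # guard against H_i = 0
    return 1.0 / (1.0 + np.exp((k / 2 - t) / b))          # (c,) degrees of y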

3.1. Algorithm of fuzzy discrimination

This algorithm is based on the second membership function. It analyzes a test set $E_T$. The reference set in which the k nearest neighbors are searched is the union of $E_L$ and the current labeled part of $E_T$. The algorithm consists of the following steps.
Step 1: Read the labeled elements of the learning set $E_L$. Read all the parameters. Read the elements of the test set $E_T$.
Step 2: New labeling of the learning set. Calculate the logarithmic entropy from the results of the learning algorithm. Search for the k nearest neighbors of each element of $E_L$ and calculate its membership to the fuzzy classes according to the second membership function. Calculate the resulting entropy for each class.
Step 3: Analysis of a current point y. Search for its k nearest neighbors in $E_L$ or the labeled part of $E_T$. Calculate its membership degrees to the present fuzzy classes with the second membership function. If for one i, $\mu_i(y) \ge \mu_i^{\min}$, then
• integrate y into the labeled part of $E_T$;
• update $d_i^m$;
• go to Step 5.
Else reject y and analyze the following point.
Step 4: Analysis of the rejected points. If the number of rejected points reaches the maximum number (previously fixed): if the maximum number of fuzzy classes


has not been reached, initiate a new fuzzy class as follows:
• initiate the class according to the first membership function, as in the clustering algorithm;
• for the previously labeled patterns, calculate the membership degrees with respect to this new class;
• label all the points again with respect to this class, as in Step 2.
Go back to Step 3. Else, if it was not possible to merge two fuzzy classes, integrate these points into the labeled reference set without creating a new fuzzy class, and analyze the following points by calculating only their membership degrees to the present classes.
Step 5: Merging of two fuzzy classes. If a certain number of points has been analyzed, merge the fuzzy classes i and j if, for every labeled pattern y, the difference between $\mu_i(y)$ and $\mu_j(y)$ remains lower than a threshold. Update $H_i$ for each class. Go back to Step 3.
Step 6: Write the labeled elements of $E_L$ and $E_T$. Write the logarithmic entropy for each class.
The structure of this algorithm is similar to that of the clustering algorithm. Integration, rejection, merging and creation of new fuzzy classes can be interpreted in a $[0, 1]^{c'}$ hypercube, whose dimensionality c' is greater than (or equal to) the former dimensionality c. As the second membership function needs a finite reference set that is known and labeled, the first membership function is used to start the initial process; then it is abandoned. The labeling of the learning set is completed during the discrimination process with respect to the newly created fuzzy classes; this is a consequence of the adaptability of the method. Contrary to the hard classification process, there is neither good nor bad assignment in fuzzy classification; consequently the concept of classification error simply does not exist, and there is neither need nor sense in computing an error rate.
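For completeness, a sketch of the Step 3 decision of the discrimination algorithm built on membership_second() above; the reference arrays are assumed to hold $E_L$ plus the already labeled part of $E_T$.

def discriminate_point(y, ref_X, ref_mu, d_m, H, mu_min, rejected):
    """One pass of Step 3 for a test point y; mu_min is the (c,) vector of
    thresholds mu_i^min, rejected the buffer that may trigger Step 4."""
    mu = membership_second(y, ref_X, ref_mu, d_m, H)
    if (mu >= mu_min).any():
        return mu          # integrate y into the labeled part of E_T
    rejected.append(y)     # all degrees low: candidate for a new fuzzy class
    return None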

4. Results on simulated data

Both algorithms use a rather large number of parameters, which have to be fixed before any execution. The most important of these are:
• the number k of nearest neighbors,
• the weighting factor $\lambda$,
• the maximal number c of fuzzy classes,
• the thresholds $\mu_i^{\min}$ for $i = 1, \dots, c$,
• the thresholds for the calculation of $d_i^m$ for $i = 1, \dots, c$,
• the number of rejected points before analysis,
• the number of analyzed points before merging,
• the threshold of merging $\varepsilon_M$,
• the number M of executions and the threshold for a fuzzy invariant pattern, when that option has been chosen.
One possible grouping of these parameters is sketched below.
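A hypothetical grouping of these parameters as a configuration object; every name and default value here is illustrative, not prescribed by the paper.

from dataclasses import dataclass

@dataclass
class FuzzyKnnParams:
    k: int = 3                  # number of nearest neighbors
    lam: float = 0.5            # weighting factor lambda, in ]0, 1]
    c_max: int = 4              # maximal number of fuzzy classes
    mu_min: tuple = (0.4, 0.5)  # integration thresholds mu_i^min
    max_rejected: int = 15      # rejected points before a new class
    merge_every: int = 20       # analyzed points between merge attempts
    eps_merge: float = 0.05     # merging threshold eps_M
    n_runs: int = 1             # executions M for invariant patterns
    eps_ip: float = 0.05        # invariant-pattern threshold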


Table 1

                  Class 1    Class 2     Class 3
Mean              (0, 0)     (13, 5)     (25, -7)
Variance          (9, 9)     (12, 12)    (9, 9)
Number of points  30         60          30

It is necessary to choose the value of these parameters carefully, especially for the clustering algorithm, as the results are sensitive to their variations. Rough, empirical criteria can be proposed in a few cases. For instance, the parameter $\lambda$ can be chosen so that the average membership is close to ½ for every class. The thresholds for the calculation of $d_i^m$ can be taken so that a high percentage of points have memberships greater than those thresholds.
In order to test the clustering algorithm, three Gaussian classes of dimensionality 2 have been generated with the respective parameters of Table 1. The clustering algorithm detects three fuzzy classes with the following parameters: k = 3; $\lambda = 0.45$; $\mu_1^{\min} = 0.40$; $\mu_2^{\min} = 0.50$; $\mu_3^{\min} = 0.50$; 15 rejected points before analysis. The first fuzzy class corresponds to the second and most populated generated class, the second fuzzy class to the third hard one, and the third fuzzy class to the first hard one. No merging has been performed, as expected.
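A sketch reproducing the simulated learning set of Table 1, assuming diagonal covariances (independent coordinates with the listed variances):

import numpy as np

# Three 2-dimensional Gaussian classes with the parameters of Table 1.
rng = np.random.default_rng(0)
classes = [((0, 0), (9, 9), 30), ((13, 5), (12, 12), 60), ((25, -7), (9, 9), 30)]
E_L = np.vstack([rng.normal(mean, np.sqrt(var), size=(n, 2))
                 for mean, var, n in classes])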

Fig. 8. Class 1 membership degrees, written in place of the points in the (X, Y) plane.


Fig. 9. Class 2 membership degrees, written in place of the points in the (X, Y) plane.

Fig. 10. Class 3 membership degrees, written in place of the points in the (X, Y) plane.

Fig. 11. Class 4 membership degrees, written in place of the points in the (X, Y) plane.

The entropies of the three fuzzy classes have the respective values $H_1 = 0.58$, $H_2 = 0.81$, $H_3 = 0.79$: the first class carries less fuzziness than the other two. The results are illustrated in Figures 8, 9 and 10, where the membership degrees are written in place of the points for the first, second and third class, respectively.
The discrimination algorithm has been tested on a simulated test set consisting of representatives of three Gaussian classes very close to the three former classes, and of representatives of a fourth Gaussian class whose parameters are: mean (27, 15); variance (9, 9); number of points 40. The values of the parameters common to both algorithms are the same as before. The discrimination algorithm creates a fourth fuzzy class, as expected. The representatives of the three former classes receive membership degrees similar to those of their neighboring classes. Finally, the entropies of the four fuzzy classes, after the various integrations and rejections, are $H_1 = 0.63$, $H_2 = 0.92$, $H_3 = 0.88$, $H_4 = 0.39$. The results for the fourth fuzzy class are reported in Figure 11.

5. Conclusion

We have proposed in this paper a fuzzy clustering algorithm, which can be used alone if clustering is the only aim, and a fuzzy discrimination algorithm; both have the advantage of analyzing sequentially a test set of unbounded size, without optimizing any criterion. The sensitivity of the results to variations of the parameters is an important issue presently under study. The automatic learning of those parameters is left for further research.

References

[1] T.M. Cover and P.E. Hart, Nearest neighbor pattern classification, IEEE Trans. Inform. Theory 13 (1967) 21-27.
[2] P.A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach (Prentice-Hall, Englewood Cliffs, NJ, 1982).
[3] M.E. Hellman, The nearest-neighbor classification rule with a reject option, IEEE Trans. Systems Sci. Cybernet. 6 (1970) 179-185.
[4] I. Tomek, A generalization of the k-NN rule, IEEE Trans. Systems, Man Cybernet. 6 (1976) 121-126.
[5] S.A. Dudani, The distance-weighted k-nearest neighbor rule, IEEE Trans. Systems, Man Cybernet. 8 (1978) 325-327.
[6] T.A. Brown and J. Koplowitz, The weighted nearest neighbor rule for class dependent sample sizes, IEEE Trans. Inform. Theory 25 (1979) 617-619.
[7] R.D. Short and K. Fukunaga, A new nearest neighbor distance measure, in: 5th Internat. Conf. on Pattern Recognition, Miami Beach (1980) 81-86.
[8] D.L. Wilson, Asymptotic properties of nearest neighbor rules using edited data, IEEE Trans. Systems, Man Cybernet. 2 (1972) 408-421.
[9] T.J. Wagner, Convergence of the edited nearest neighbor, IEEE Trans. Inform. Theory 19 (1973) 696-697.
[10] P.E. Hart, The condensed nearest neighbor rule, IEEE Trans. Inform. Theory 14 (1968) 515-516.
[11] J.H. Friedman, F. Baskett and L.J. Shustek, An algorithm for finding nearest neighbors, IEEE Trans. Comput. 24 (1975) 1000-1006.
[12] K. Fukunaga and P.M. Narendra, A branch and bound algorithm for computing k-nearest neighbors, IEEE Trans. Comput. 24 (1975) 750-753.
[13] J. Kittler, A method for determining k-nearest neighbors, Kybernetes 7 (1978) 313-315.
[14] L.A. Zadeh, Fuzzy sets, Inform. and Control 8 (1965) 338-353.
[15] D. Dubois and H. Prade, Fuzzy Sets and Systems: Theory and Applications (Academic Press, New York, 1980).
[16] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms (Plenum Press, New York, 1981).
[17] A. Kandel, Fuzzy Techniques in Pattern Recognition (Wiley, New York, 1982).
[18] E.H. Ruspini, A new approach to clustering, Inform. and Control 15 (1969) 22-32.
[19] J.C. Dunn, A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, J. Cybernet. 3 (1974) 32-57.
[20] M. Usaï and B. Dubuisson, An adaptive nonparametric classification algorithm with an incomplete learning set, in: 7th Internat. Conf. on Pattern Recognition, Montreal (1984) 276-278.
[21] Tao Gu and B. Dubuisson, A loose-pattern process approach to clustering fuzzy data sets, IEEE Trans. Pattern Analysis Machine Intelligence (1985) 366-372.
[22] A. Jozwik, A learning scheme for a fuzzy k-NN rule, Pattern Recognition Lett. 1 (1983) 287-289.
[23] B.V. Dasarathy, Nosing around the neighborhood: a new system structure and classification rule for recognition in partially supervised environments, IEEE Trans. Pattern Analysis Machine Intelligence 2 (1980) 67-71.
[24] E. Diday et al., Optimisation en Classification Automatique (INRIA, 1979).
[25] R.M. Capocelli and A. De Luca, Fuzzy sets and decision theory, Inform. and Control 23 (1973) 446-473.
[26] A. De Luca and S. Termini, A definition of a nonprobabilistic entropy in the setting of fuzzy sets theory, Inform. and Control 20 (1972) 301-312.
[27] S.G. Loo, Measures of fuzziness, Cybernetica 20 (1977) 201-212.