Pattern Recognition Letters 1 (1983) 287-289 North-Holland
July 1983
A learning scheme for a fuzzy k-NN rule A d a m JOZWIK Institute o f Biocybernetics and Biomedical Engineering, Polish Academy o f Sciences, 00-818 Warsaw, K R N 55, Poland Received 1 February 1983 Abstract: The performance of a fuzzy k-NN rule depends on the number k and a fuzzy membership-array W[I, mR], where l and m R denote the number of classes and the number of elements in the reference set X R respectively. The proposed learning procedure consists in iterative finding such k and W which minimize the error rate estimated by the 'leaving one out' method. Key words: NN rules, learning procedure, fuzzy decisions, probability of misclassification.
1. The fuzzy k-NN rule The fuzzy k-NN rule operates as follows. Let
associated with the largest number oj, 1 <_j<_ 1, where (Vl,02 ..... Ol)=0.
= {xi}7="
be the reference set and
w= {wi}m , be a set of /-dimensional vectors, where l is the number of classes, Wi ~- (Wz 1, Wi, 2 . . . . , WZl),
Ties are broken randomly or by the single NN rule. If all wi, j, l<_i<_mR, 1--
2. The learning procedure
l j=l
wi, j = l
and
O< wi, j <_ l.
Each wi, j , l<_i<_m R, l < _ j < l , is a membershipvalue of the i-th object (represented in the ndimensional feature space by xi and identified with xi) to the class j. The set W is identified with a membership-array W[I, mR]. It is assumed that X R and W are given. For each x to be classified, the set K of indices that correspond to the k nearest neighbors of x in X R is found and the fuzzy decision-vector
The following input data are given: the training set XT (mT elements), the set X c (m c elements if X c ~ 0 ) of objects to be classified and the membership-sequence mT UT = (UT, i)i = 1'
where UT, i = j if Xi belongs to the class j. The procedure described below consists of two stages. In the first stage the following sequence is generated: (W0, k0, P0), (I411, k l , P l ) . . . . . (Wh, kh, Ph) . . . . . (1)
is assigned. If one is interested in a nonfuzzy decision then the vector x is assigned to the class
where W h, k h and Ph denote the membershiparray, the number of nearest neighbours and the expected probability of misclassification for the
0167-8655/83/$3.00 © 1983, Elsevier Science Publishers B.V. (North-Holland)
287
Volume 1, Numbers 5,6
PATTERN RECOGNITION LETTERS
k h-NN rule characterized by the array Wh and the reference set XR = XT. The array W0 is binary. It is derived directly from the sequence UT, i.e. wi, j = 1 if UT,i =j. The sequence (1) is generated by an application of the 'leaving one out' method (introduced by Lachenbruch (1965)). Wh+ 1, k h and Ph are derived from the array W h and the set XT. Each x/, i = 1 , 2 ..... m T, is simultaneously classified by the fuzzy 1, 2 . . . . . mT--2 and m T - 1 NN rules with respect to X R = X T \ {Xi}. Thus, to each xi and k, with k the number of neighbours, the fuzzy decision-vector IPk,i and, at the same time, the nonfuzzy decision uk, i are assigned. Comparing the decision-sequences mT Uk = (Uk, i)i= 1"
k = 1, 2,..., m T ,
with the membership-sequence UT, we may find the error rates q k = e k / m x , where e k denotes the number of objects x/, i = 1, 2 .... , roT, misclassified by the fuzzy k-NN rule. As k h such k is taken that offers the minimum error rate Ph = min qk. k
The vector Wh+1,i = (Okh,ikh + Wh,i)/(kh + 1)
(2)
forms a row of the array Wh÷l. The relation (2) gives the same effect as if the vector x i were classified by the fuzzy (kh+ 1)-NN rule subject to XR= X T (not X R = X T \ {xi}). The triplet (W., k., p . ) that corresponds to the smallest index h such that Ph <<-Ph+1 is treated as a result of the first stage o f learning procedure. In the second stage also a sequence of the form (1) is generated. But this time the set Wo = W , U V,, where W, is the set o f the membership-vectors that form the array W, and V, is the set of the decision-vectors obtained by classification o f all objects from the set XC=
~r.~. mc+mT
t~tJi=mT+ 1
by the fuzzy k . - N N rule with the membershiparray IV, and the reference set X R = X T. The arrays Wh+l[1,mT+mC] ,
h = 0 , 1 .....
are derived from the arrays W h respectively, exactly in the same way as they were derived in the 288
July 1983
first stage, i.e. classifying all objects from the set X T U X c by the 'leaving one out' method and applying formula (2). However, the optimum numbers kh and Ph are found by comparing the decision-sequences mT ug=(uk, i)i=l ,
k=1,2 ..... mT+m c-l,
with the membership-sequence UT. The triplet (W**, k**, p**) corresponding to the smallest index h0 such that Pho<_Pho+lis a result o f the second stage o f the learning procedure. If p , < p * * then the triplet (IV,, k , , p . ) is taken as a final result. If p , > p * * then the triplet (W**,k**,p**) is taken. This final result will be denoted by ( w f k f , pf). Similarly x f = x T if p,<_-p** and x f = x T U X c if p , > p * * . 3. The final classification The final decisions for the objects from the set X c are gathered in the set V f= V, if p.<_p** and v f = W h o + l \ W , if p . > p * * , i.e. o f e V f corresponds to x i e X c , where Who+l is the set of the fuzzy decisions derived from Who (Who= IV**) during the realization of the second stage of the learning procedure and W. consists of the first mT rows of W~o+ 1. Additional new objects (gathered in a set A X c ) can be classified by (1) the classifier characterized by W f, k f and x f ,
(2) the new classifier obtained by the realization o f the second stage o f the procedure, started with the membership-array W0 that corresponds to the set W0= wfLJ V**, where V** is a set of fuzzy decision-vectors received for the set A X c and the fuzzy kf-NN rule with respect to W f and X f , (3) the new classifier obtained by the realization of the second stage of the procedure, started with the result of the first stage, i.e. W0= W, LJV,, where 1I, is a set of the fuzzy decision-vectors found for set X c U A X c and the fuzzy k , - N N rule with respect to IV, and X R = XT.
4. Concluding remarks The efectiveness of the proposed learning pro-
Volume 1, Numbers 5,6
PATTERN RECOGNITION LETTERS
cedure has been checked experimentally on some real small-size data. The results were better t h a n for the k 0 k ' - N N rule with edited data p r o p o s e d by Koplowitz and B r o w n (1980), where k 0 is the same as in the sequence (1) and k ' is also chosen by the application o f the 'leaving one o u t ' m e t h o d . In the case when the n o n f u z z y membershipsequence ux is missing but the fuzzy membershiparray Wo is given (see Duin (1982)), error rates are f o u n d according to the following relation:
p= ~ i=1
where
½loi, j - w i , jl
mT
(3)
j=l
oi, j and Wi, j are elements o f the fuzzy
July 1983
decision-array and the fuzzy m e m b e r s h i p - a r r a y respectively. The relation (3) allows to generalize the case considered in Sections 2 and 3.
References Lachenbruch, P.A. (1965), Estimation of error rates in discriminant analysis, Ph.D. dissertation, Univ. of California, Los Angeles (Chapter 5). Koplowitz, J. and T.A. Brown (1981), On the relation of performance to editing in nearest neighbor rules, Pattern Recognition 13, 251-255 Duin, R.P.W. (1982), The use of continuous variables for labeling objects, Pattern Recognition Letters 1, 15-20.
289