Pattern Recognition Vol. 11, pp. 383-390. Pergamon Press Ltd. 1979. Printed in Great Britain. © Pattern Recognition Society
0031-3203/79/1201-0383 $02.00/0
LEARNING WITH A MUTUALISTIC TEACHER

K. CHIDANANDA GOWDA
Department of Electrical Engineering, S.J. College of Engineering, Mysore 570 006, India

and

G. KRISHNA
School of Automation, Indian Institute of Science, Bangalore 560 012, India

(Received 30 August 1978; in revised form 1 February 1979)
Abstract - The concept of a "mutualistic teacher" is introduced for unsupervised learning of the mean vectors of the components of a mixture of multivariate normal densities, when the number of classes is also unknown. The unsupervised learning problem is formulated here as a multi-stage quasi-supervised problem incorporating a cluster approach. The mutualistic teacher creates a quasi-supervised environment at each stage by picking out "mutual pairs" of samples and assigning identical (but unknown) labels to the individuals of each mutual pair. The number of classes, if not specified, can be determined at an intermediate stage. The risk in assigning identical labels to the individuals of mutual pairs is estimated. Results of some simulation studies are presented.

Unsupervised learning    Pattern recognition    Parameter estimation    Clustering    Mutual nearest neighbourhood

1. INTRODUCTION
The problem of unsupervised learning can be viewed as the problem of estimating parameters of the component densities from the unlabeled samples drawn from the mixture density. The Bayesian and maximum likelihood schemes for unsupervised learning are computationally formidable.(1) Cooper and Cooper(2) consider multiclass mixtures with known a priori probabilities and unequal covariance matrices. The use of histograms for mixture decomposition has been suggested by Patrick and Hancock.(3) Fralick(4) suggests an approximate solution using quantized parameters. Stanat(5) suggests the application of Doetsch's method for learning multivariate Gaussian and Bernoulli mixtures. Patrick and Fischer(6) suggest a cluster approach to unsupervised estimation of parameters by mapping the samples from the mixture to a parameter space. Yakowitz(7) considers the possibility of learning any identifiable mixture. Agrawala(8) suggests a computationally feasible learning procedure, making use of a probabilistic labeling scheme. Rajasekaran and Srinath,(9) by making certain judicious approximations, have developed a pseudo-Bayes unsupervised learning scheme for the detection of nongaussian signals in Gaussian noise channels. Bezdek and Dunn(10) suggest the application of fuzzy partitions for parameter estimation.

Chidananda Gowda and Krishna(11-13) have introduced the concept of "mutual nearest neighbourhood" and a new similarity measure called the "mutual neighbourhood value (MNV)". The concept of the mutual nearest neighbourhood is the basis for the proposed unsupervised learning scheme with a "mutualistic teacher". The mutualistic teacher converts an unsupervised problem into a multi-stage quasi-supervised problem.

Section 2 covers the philosophy of the proposed method. Section 3 describes the proposed learning scheme. Section 4 deals with the risk of the mutualistic teacher scheme. Section 5 brings out the efficacy of the proposed scheme with the help of some simulation studies. Finally, Section 6 gives an overall summary.

2. CONCEPTS AND DEFINITIONS

Cottam et al.,(14,15) Hamming and Gilbert,(16) Clark and Evans(17) and Callaghan(18) make some allusions to the notion of mutual neighbourhood. Cottam et al. use the concept of "paired neighbours" (which have each other as nearest neighbours) to measure the characteristics of plant communities. Hamming and Gilbert showed that 62% of the total number of individuals (trees) in a random population are paired neighbours. Clark and Evans confirmed this relationship for both artificial and natural populations. They use the term "reflexive relationship" while referring to paired neighbours.

But our real motivation for considering the concept of mutual nearest neighbourhood comes from real-life observations. In real life, we observe that the strength of the bond of friendship between two persons is a function of mutual feelings rather than one-way feeling. Striking an analogy between friendship and clustering, we realised the importance of considering mutual nearness in clustering.(13)

We define the mutual neighbourhood value (MNV) between any two samples of a set as the sum of the conventional nearest neighbour (NN) ranks of these two samples, with respect to each other.
Let X1, X2, ..., XN be a set of N L-dimensional vectors called samples, where the Xi take values in a metric space upon which is defined a metric d. Let Xi be the mth nearest neighbour of Xj, and Xj be the nth nearest neighbour of Xi. Then, the MNV between Xi and Xj is defined as (m + n). That is,

MNV(Xi, Xj) = m + n,

where m, n ∈ {0, 1, 2, ..., N - 1}. Here 0 is used when i = j. Therefore,

MNV(Xi, Xj) ∈ {2, 3, 4, ..., 2N - 2},  i ≠ j,
MNV(Xi, Xj) = 0,                       i = j.

The MNV is a semimetric, and satisfies the first two conditions of a metric:

(1) MNV(Xi, Xj) ≥ 0, and MNV(Xi, Xj) = 0 if and only if Xi = Xj;
(2) MNV(Xi, Xj) = MNV(Xj, Xi).

In a set, if Xi is the first NN of Xj, and Xj is the first NN of Xi, then the MNV between Xi and Xj is 2, and Xi and Xj constitute a "mutual pair".

3. LEARNING WITH A MUTUALISTIC TEACHER

(a). Notation
Xi = a point in sample space, i = 1, 2, ..., N.
ωi = the ith class, i = 1, 2, ..., c.
μi = the actual mean of the ith class.
μ̂i = the estimated mean of the ith class.
θi = the parameter vector of the ith class.

(b). Formulation
Given a set of N unlabeled samples drawn independently from the mixture normal density function, the Maximum Likelihood estimate of the mean of the ith component(1) is

μ̂i = [Σ_{k=1}^{N} P(ωi/Xk, μ̂i) Xk] / [Σ_{k=1}^{N} P(ωi/Xk, μ̂i)].   (1)

If the actual samples that come from the ith class are known, then P(ωi/Xk, μ̂i) is 1 for such samples and 0 for the rest, and equation (1) then becomes

μ̂i = (1/ni) Σ_{k=1}^{ni} Xk,   (2)

where Xk, k = 1, 2, ..., ni are the samples coming from the ith class. Equation (2) shows that the estimated mean of the ith class is the sample mean of the ni samples that are assigned identical labels. The accuracy of the estimate depends on the validity of assigning identical labels. The multi-stage quasi-supervised learning scheme is as follows.

The first stage of the learning scheme begins with the N samples of the original sample set. Each of the N samples is assigned a cluster weight of 1 to indicate N singleton clusters. Using a suitable metric d, all the mutual pairs in this sample set are found. The mutualistic teacher assigns identical (but unknown) labels to the individual samples of a mutual pair, and thus makes each mutual pair a cluster. Each mutual pair is then replaced by its sample mean. This sample mean appears as a new or representative sample, with a cluster weight of 2. The mean Xm of a mutual pair (Xi, Xj) and its cluster weight nm are found as follows:

Xm = (ni Xi + nj Xj) / (ni + nj),   (3)

nm = ni + nj,   (4)

where ni = cluster weight of Xi, and nj = cluster weight of Xj.

As the two samples of each mutual pair are replaced by their sample mean, there is a reduction in the number of effective samples available for further processing. If M1 is the number of mutual pairs, then the number of effective samples present at the end of the first stage is (N - M1). The second stage begins with (N - M1) samples. All the mutual pairs in this reduced set are found, and each mutual pair is replaced by its sample mean. The means and the cluster weights are determined using equations (3) and (4). If M2 is the number of mutual pairs present in the second stage, then the number of effective samples at the end of the second stage is (N - M1 - M2). The above procedure is repeated in the subsequent stages.

Actually, the mutual pairs in each stage can be arranged in ascending order of the distance between the two samples of each pair and replaced in that order. Such ordering of the mutual pairs is not necessary in the initial stages, but it is essential in the final stages, because the process of replacing the mutual pairs by their means is to be stopped when the number of effective samples becomes equal to the number of classes. The number of classes, if not specified, can also be determined if the expected maximum number of classes is known. When the number of effective samples, at some stage, becomes equal to the expected maximum number of classes, the prevailing cluster weights can be examined to identify the main clusters and hence the number of classes. So, finally, the effective samples, when their number becomes equal to the number of classes, represent the desired mean values of the normal density functions.
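A minimal sketch of the mutual-pair test and of the merge rule of equations (3) and (4), assuming an (N, L) NumPy array of samples and the Euclidean metric; the function names are ours and this is an illustration, not the authors' original implementation:

```python
import numpy as np

def mnv(samples, i, j):
    """Mutual neighbourhood value: sum of the NN ranks of X_i and X_j
    with respect to each other (0 when i == j, 2 for a mutual pair)."""
    if i == j:
        return 0
    d_i = np.linalg.norm(samples - samples[i], axis=1)
    d_j = np.linalg.norm(samples - samples[j], axis=1)
    rank_of_j = int((d_i < d_i[j]).sum())   # X_j is the rank_of_j-th NN of X_i
    rank_of_i = int((d_j < d_j[i]).sum())   # X_i is the rank_of_i-th NN of X_j
    return rank_of_j + rank_of_i

def mutual_pairs(samples):
    """Index pairs whose members are each other's first NN (MNV = 2)."""
    d = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)             # a sample is not its own neighbour
    nn = d.argmin(axis=1)                   # first NN of every sample
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i]

def merge(x_i, x_j, n_i, n_j):
    """Equations (3) and (4): weighted mean and cluster weight of a pair."""
    return (n_i * x_i + n_j * x_j) / (n_i + n_j), n_i + n_j
```

Because each sample has a unique first nearest neighbour (ties aside), the pairs returned by mutual_pairs are disjoint, which is what allows the stage-wise replacement described above.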
(c). Algorithm
The algorithm for unsupervised learning of the mean vectors can be written in a succinct form as follows:

(1) Let N be the number of samples in the original set, and Nm be the expected maximum number of classes. Begin the procedure with N effective samples (the original set), each associated with a cluster weight of unity.
(2) Using the Euclidean distance measure, find all the mutual pairs in the present sample set. Arrange all such pairs in ascending order of the distance between the two samples of each pair.
(3) Consider the first mutual pair (Xi, Xj). Find its mean Xm and cluster weight nm using equations (3) and (4). Delete Xi and Xj from the sample set, and add Xm to it. Make N = N - 1. If N = Nm, go to step (6); otherwise go to step (4).
(4) Consider the next mutual pair (Xi, Xj). Find its mean Xm and cluster weight nm using equations (3) and (4). Delete Xi and Xj from the sample set, and add Xm to it. Make N = N - 1. If N = Nm, go to step (6); otherwise go to step (5).
(5) If the mutual pairs are all exhausted, go to step (2); otherwise go to step (4).
(6) Examine the cluster weights and hence determine the number of classes c. Repeat step (4) till N = c.

Then, the prevailing effective samples represent the desired mean vectors.
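Read together, steps (1)-(6) amount to repeatedly replacing the closest mutual pairs by their weighted means until only Nm (and later c) effective samples remain. The following self-contained sketch is one possible rendering under our assumptions (NumPy, the Euclidean metric, names of our choosing); it folds in the pair-finding of the previous sketch and returns the effective samples and weights at the stopping point rather than examining the weights interactively as in step (6):

```python
import numpy as np

def find_mutual_pairs(pts):
    """Return (distance, i, j) triples for samples that are each other's
    first nearest neighbour, sorted by ascending pair distance."""
    a = np.asarray(pts)
    d = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)
    return sorted((d[i, j], i, j) for i, j in enumerate(nn)
                  if i < j and nn[j] == i)

def mutualistic_teacher(samples, n_stop):
    """Steps (1)-(6): merge mutual pairs stage by stage until only n_stop
    effective samples remain; return them with their cluster weights."""
    pts = [np.asarray(x, dtype=float) for x in samples]   # step (1): singletons
    wts = [1.0] * len(pts)
    while len(pts) > n_stop:
        pairs = find_mutual_pairs(pts)                     # step (2)
        if not pairs:
            break
        keep = set(range(len(pts)))
        for _, i, j in pairs:                              # steps (3)-(5)
            x_m = (wts[i] * pts[i] + wts[j] * pts[j]) / (wts[i] + wts[j])
            pts.append(x_m)
            wts.append(wts[i] + wts[j])
            keep.discard(i); keep.discard(j); keep.add(len(pts) - 1)
            if len(keep) == n_stop:
                break
        pts = [pts[k] for k in sorted(keep)]
        wts = [wts[k] for k in sorted(keep)]
    return np.array(pts), np.array(wts)                    # step (6): inspect the weights
```

Calling mutualistic_teacher(data, 9) and inspecting the returned weights should reproduce the kind of intermediate stage reported in Example 1 below; a second call with n_stop equal to the identified number of classes yields the mean estimates.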
4. THE MUTUALISTIC TEACHER RISK

The calculation of the risk of the decision, made by a mutualistic teacher, that the individuals of mutual pairs have the same label follows the procedures adopted by Cover and Hart,(19) Duda and Hart(1) and Fukunaga.(20) Suppose we have a set of N unlabeled samples X1, X2, ..., XN drawn independently from the mixture density

p(X/θ) = Σ_{j=1}^{c} p(X/ωj, θj) P(ωj).   (5)

Let cl(·) denote the true class label of a sample, and lab(·) indicate the class label assigned by the nearest neighbour (NN) decision rule. Consider a test sample X with true class label ω (unknown). When a suitable metric d is defined, let Xm, with true class label ωm (unknown), be found as the NN of X. Here the labels ω, ωm ∈ {ω1, ω2, ..., ωc}. A classification error occurs if cl(X) ≠ lab(X). Then,

P(cl(X) = ω, lab(X) = ωm/X, Xm) = P(cl(X) = ω, cl(Xm) = ωm/X, Xm)
= 1 - Σ_{i=1}^{c} P(ωi/X) P(ωi/Xm)   (6)

for ω ≠ ωm. Even though it is not explicitly stated hereafter, it is implied that we will be restricting ourselves to the large sample case. For the large sample case, Cover and Hart(19) have shown that, if Xm is the NN to X, then

Xm → X  and  P(ωi/Xm) → P(ωi/X)   (7)

with probability 1. With these assumptions, equation (6) can be written as

P(cl(X) = ω, lab(X) = ωm/X) = 1 - Σ_{i=1}^{c} P²(ωi/X)   (8)

for ω ≠ ωm. Now consider Xm as the test sample. Let X be found as its NN. Then, as before,

P(cl(Xm) = ωm, lab(Xm) = ω/Xm) = 1 - Σ_{i=1}^{c} P²(ωi/Xm)   (9)

for ωm ≠ ω. Now consider a case in which Xm is the NN of X, and X is the NN of Xm. Then,

P(cl(X) = ω, lab(X) = ωm, cl(Xm) = ωm, lab(Xm) = ω/X, Xm)
= P(cl(X) = ω, lab(X) = ωm/X) × P(cl(Xm) = ωm, lab(Xm) = ω/Xm)
= [1 - Σ_{i=1}^{c} P²(ωi/X)] [1 - Σ_{i=1}^{c} P²(ωi/Xm)]   (10)

for ω ≠ ωm. It was possible to write equation (10) in the above form for the following reasons. Actually, lab(X) = ωm connotes that Xm is the NN of X, and lab(Xm) = ω connotes that X is the NN of Xm. But, if Xm happens to be the NN of X, it is not necessary that X should also be the NN of Xm. Hence, (cl(X) = ω, lab(X) = ωm, X) and (cl(Xm) = ωm, lab(Xm) = ω, Xm) can be considered as independent. For large N, equation (10) becomes

P(cl(X) = ω, lab(X) = ωm, cl(Xm) = ωm, lab(Xm) = ω/X) = [1 - Σ_{i=1}^{c} P²(ωi/X)]²   (11)

for ω ≠ ωm. Let Em denote the joint event of samples X and Xm forming a mutual pair and the occurrence of error. Then, equation (11) can be written as

P(Em/X) = [1 - Σ_{i=1}^{c} P²(ωi/X)]².   (12)

For any sample X, if another sample Xm can be found such that it forms a mutual pair with X, then the mutualistic teacher associates the class of X with that of Xm by giving identical class labels to both X and Xm. On the other hand, if no other sample forms a mutual pair with X, then the mutualistic teacher does not associate the class of X with any other sample at that stage.
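As a quick numerical illustration of equation (12) (ours, not the paper's): for a two-class point whose posteriors are 0.9 and 0.1, the bracketed term is 1 - (0.81 + 0.01) = 0.18, so the probability that a mutual pair forms at X and its two members nevertheless carry different true labels is only 0.18² ≈ 0.032.

```python
# Worked check of equation (12) for a two-class point (our illustration).
posteriors = [0.9, 0.1]                                 # P(w1/X), P(w2/X)
p_mutual_pair_and_error = (1 - sum(p * p for p in posteriors)) ** 2
print(p_mutual_pair_and_error)                          # 0.0324
```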
We have also observed, as was done by Hamming and Gilbert(16) and Clark and Evans,(17) that about 62% of the total number of samples in a set form mutual pairs.

Let T be the action taken by the mutualistic teacher when X is observed. Let the loss function be defined as follows:

λ(T/Em) = 1  and  λ(T/Em^c) = 0,   (13)

where Em^c is the complement of Em. Then, using equations (12) and (13), the conditional mutualistic teacher decision risk can be written as

r(T, X) = λ(T/Em) P(Em/X) + λ(T/Em^c) P(Em^c/X) = [1 - Σ_{i=1}^{c} P²(ωi/X)]².   (14)

It is desirable to express r(T, X) in terms of the conditional Bayes risk rB(X). For the two-class case, we have

rB(X) = min[P(ω1/X), P(ω2/X)]   (15)

and

1 - Σ_{i=1}^{2} P²(ωi/X) = 1 - {rB²(X) + [1 - rB(X)]²} = 2rB(X) - 2rB²(X).   (16)

Substituting equation (16) in (14), and suppressing the dependence on X,

r(T, rB) = (2rB - 2rB²)².   (17)

The overall mutualistic teacher risk is

R(T) = lim_{N→∞} E[r(T, rB)],   (18)

where E denotes expectation. By the dominated convergence theorem,(21) equation (18) can be written as

R(T) = E[lim_{N→∞} r(T, rB)].   (19)

It is implied that equation (17) is true for the large sample case. Therefore, equation (19) can be written as

R(T) = E[(2rB - 2rB²)²].   (20)

The bounds on the mutualistic teacher risk R(T) can be obtained by the method employed by Cover and Hart.(19)

Upper bound
For this analysis, r(T, rB) is considered as a continuous function of rB defined on the interval [0, 1/2]. Let C(T, rB) be the least concave function greater than or equal to r(T, rB). Then, by definition,

r(T, rB) ≤ C(T, rB).   (21)

These expressions are functions of rB, which is a function of X. Thus, taking expectations over the induced random variable rB(X),

R(T) ≡ E[r(T, rB)] ≤ E[C(T, rB)].   (22)

As C(T, rB) is a concave function, applying Jensen's inequality,(22)

E[C(T, rB)] ≤ C[T, E(rB)] = C(T, RB),   (23)

where RB is the Bayes risk. From equations (22) and (23),

R(T) ≤ C(T, RB).   (24)

The upper bound versus Bayes error RB is plotted in Fig. 1. The solid line represents the upper bound C(T, RB). Where r(T, RB) is not a concave function, it is shown in dotted lines in the region where it differs from C(T, RB).

[Fig. 1. Bounds on the mutualistic teacher risk. A. Upper bound on NN risk. B. Lower bound on NN risk. C. Upper bound on mutualistic teacher risk. D. Lower bound on mutualistic teacher risk.]

Lower bound
Equation (17) can be written as

r(T, rB) = (2rB - 2rB²)² = 4rB⁴ + (4rB² - 8rB³).   (25)

For 0 ≤ rB ≤ 0.5, 4rB² - 8rB³ ≥ 0. Therefore,

r(T, rB) ≥ 4rB⁴.   (26)

As 4rB⁴ is a convex function, applying Jensen's inequality,

E(4rB⁴) ≥ 4RB⁴.   (27)

Therefore, using equations (26) and (27), we have

R(T) ≡ E[r(T, rB)] ≥ E(4rB⁴) ≥ 4RB⁴.   (28)

The lower bound versus RB is plotted in Fig. 1.
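To make the bounds concrete, the short sketch below (our illustration, not part of the original analysis) tabulates the conditional risk of equation (17), the lower-bound term 4rB⁴ of equations (26)-(28), and, for comparison, the asymptotic two-class NN risk 2rB(1 - rB) of Cover and Hart; it does not attempt to construct the concave majorant C(T, RB) used for the upper bound.

```python
import numpy as np

r_b = np.linspace(0.0, 0.5, 6)                # conditional Bayes risk grid
r_teacher = (2 * r_b - 2 * r_b**2) ** 2       # equation (17)
lower = 4 * r_b**4                            # lower-bound term, equations (26)-(28)
nn_risk = 2 * r_b * (1 - r_b)                 # asymptotic two-class NN risk, for comparison
for rb, rt, lo, nn in zip(r_b, r_teacher, lower, nn_risk):
    print(f"rB={rb:.1f}  r(T,rB)={rt:.4f}  4rB^4={lo:.4f}  NN risk={nn:.4f}")
```

Even at rB = 0.3 the conditional mutualistic teacher risk is only about 0.18, well below the corresponding NN risk, which is consistent with the low bounds shown in Fig. 1.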
The mutualistic teacher makes a decision only when a mutual pair occurs, that is, only when there is very strong evidence in favour of two samples having the same label. It does not make any decisions when the evidence is not strong enough. This is somewhat similar to the reject option associated with the k - k' nearest neighbour rule studied by Hellman.(23) Another significant point is that, in every stage of the proposed procedure, about 62% of the samples (as noted by Hamming and Gilbert(16) and Clark and Evans(17)) form mutual pairs. As a result, the mutualistic teacher is forced to make decisions in every stage. The upper and lower bounds obtained appear to be quite low. Such low bounds have also been obtained by Gimlin and Ferrell(24) for k' = k in their k - k' error correction procedure.

5. RESULTS OF SIMULATION

In order to corroborate the efficacy of the proposed learning algorithm, two simulation studies were made using an IBM 360/44 computer.
Example 1
The first example is chosen so that there are 300 samples drawn from a mixture of 4 bivariate normal distributions of the following form:

p(X) = Σ_{i=1}^{4} (1/4) p(X/μi, Σi).   (29)

Here p(X/μi, Σi) represents the bivariate Gaussian distribution with mean μi and covariance matrix Σi, with the following characteristics:

Σi = [0.5  0.0; 0.0  0.5],  i = 1, 2, 3, 4,

μ1 = (3.0, 3.0), μ2 = (7.0, 3.0), μ3 = (3.0, 7.0), μ4 = (7.0, 7.0).

Three hundred samples were independently generated using a Gaussian vector generator having the above parameters. These samples, whose labels are assumed to be unknown, are shown in Fig. 2.

[Fig. 2. A set of samples drawn from a mixture density.]

The proposed algorithm was used on this data, assuming the expected maximum number of clusters to be 9. At an intermediate stage of processing, when the number of effective samples was reduced to 9, the prevailing cluster weights were recorded. The effective samples and their cluster weights are shown in Table 1. The nine effective samples are also depicted in Fig. 3. The four main classes having 68, 74, 67 and 82 samples (cluster weights) can be easily identified from Table 1. The remaining 5 classes, with 2, 3, 2, 1 and 1 samples, are evidently insignificant.

[Fig. 3. Nine effective samples.]

Having identified the number of classes in the previous stage, the simulation experiment can be stopped when the number of effective samples reduces to 4. The 4 effective samples at this stage are depicted in Fig. 4. Table 2 shows the 4 effective samples representing the desired mean vectors and their cluster weights.
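A present-day reproduction of the Example 1 data set is straightforward; the sketch below is our reconstruction under the stated parameters (the random seed and the NumPy generator are our choices), and the data it produces would then be fed to the learning routine sketched in Section 3.

```python
import numpy as np

rng = np.random.default_rng(0)                    # seed is our arbitrary choice
means = np.array([[3.0, 3.0], [7.0, 3.0], [3.0, 7.0], [7.0, 7.0]])
cov = 0.5 * np.eye(2)                             # common covariance matrix of equation (29)

labels = rng.integers(0, 4, size=300)             # equal a priori probabilities of 1/4
data = np.array([rng.multivariate_normal(means[k], cov) for k in labels])

# e.g. effective_samples, weights = mutualistic_teacher(data, 9)   # hypothetical reuse
```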
Table 1. Nine effective samples and their cluster weights (example 1)

Sl. No.   Representative or effective sample   Cluster weight
1         (1.954, 2.104)                        2
2         (3.091, 3.098)                       68
3         (2.935, 7.004)                       74
4         (6.990, 3.002)                       67
5         (1.851, 3.325)                        3
6         (7.092, 7.006)                       82
7         (2.909, 4.501)                        2
8         (3.877, 2.033)                        1
9         (8.418, 2.700)                        1

Table 2. Four effective samples and their cluster weights (example 1)

Sl. No.   Final representative sample (desired mean vector)   Cluster weight
1         (3.017, 3.104)                                      76
2         (7.011, 2.998)                                      68
3         (2.935, 7.004)                                      74
4         (7.092, 7.006)                                      82

[Fig. 4. Four effective samples.]

Example 2
For this experiment, the data set was generated from a mixture of 2 bivariate normal distributions having the following characteristics: (i) means μ1 = (3.0, 3.0) and μ2 = (7.0, 7.0); (ii) unit variances and zero covariances; (iii) equal a priori probabilities.

Samples, N in number, were randomly generated from the above distribution. These samples were then assumed to comprise the unlabeled data set. The proposed learning algorithm was used on this data, assuming the characteristics of the mixture completely unknown. The mean vectors learned for different N, averaged over 10 iterations, are shown in Table 3. The same data set was also used for learning the mean vectors by Agrawala's probabilistic teacher method. But, while implementing the probabilistic teacher scheme, the number of classes, the a priori probabilities and the variances of the density functions were assumed to be known. The estimated mean values for different N, averaged over 10 iterations, are also shown in Table 3.

6. CONCLUSIONS

The unsupervised learning of the mean vectors of the components of a mixture of normal densities, with the number of classes also unknown, is formulated in this paper as a multi-stage quasi-supervised problem incorporating a cluster approach. The quasi-supervised environment, created by the mutualistic teacher, aids the learning process. The upper and lower bounds on the mutualistic teacher risk are obtained.

Two examples were simulated. The first example provides an overview of the entire learning procedure, highlighting the important stage of identifying the number of classes. The second example shows that the parameter estimates with the mutualistic teacher method are comparable in quality to the estimates with the probabilistic teacher method. The probabilistic teacher scheme is computationally attractive and useful, but it requires knowledge of the number of classes. The mutualistic teacher method estimates the number of classes at an intermediate stage and uses this information for terminating the learning process.

The simulation studies have shown that the estimates with the mutualistic teacher method are quite accurate. These results are consistent with the low values of the bounds on the mutualistic teacher risk. In the 2 examples considered, the covariances were assumed to be zero. In the case of unequal and nonzero covariances, the use of the Euclidean distance has its own limitations, unless of course the distributions are very well separated. The use of the mean position between 2 points does not significantly change the distance relationships of these 2 points with other points during the initial stages. But during the final stages, the change in the distance relationship may become significant if cluster weights are not taken into consideration.
Table 3. Mean vectors by the mutualistic and probabilistic teacher methods (data generated using μ1 = (3.0, 3.0); μ2 = (7.0, 7.0)) (example 2)

No. of samples used for learning   Type of teacher   First component    Second component
100                                Mut. Teacher      (3.119, 3.043)     (6.986, 7.046)
100                                Prob. Teacher     (3.011, 3.040)     (7.086, 7.046)
200                                Mut. Teacher      (3.076, 3.044)     (7.035, 7.036)
200                                Prob. Teacher     (3.005, 2.986)     (6.949, 6.981)
300                                Mut. Teacher      (3.005, 3.040)     (6.974, 6.921)
300                                Prob. Teacher     (3.020, 3.005)     (6.987, 7.008)
400                                Mut. Teacher      (3.042, 3.090)     (7.051, 6.987)
400                                Prob. Teacher     (3.053, 3.011)     (6.975, 7.026)

SUMMARY

The concept of mutual nearest neighbourhood, which takes into account the mutual nearness or two-way nearness of two samples, is used in this paper for unsupervised learning of the mean vectors of the components of a mixture of normal densities when the number of classes is also unknown. The problem of unsupervised learning is formulated here as a multi-stage quasi-supervised scheme incorporating a cluster approach. The quasi-supervised environment is created by the mutualistic teacher.

The concept of mutual nearest neighbourhood is invoked to define a new similarity measure called the mutual neighbourhood value (MNV). The MNV between two samples in a set is defined as the sum of the conventional nearest neighbour (NN) ranks of these two samples, with respect to each other. Let sample Xi be the mth NN of sample Xj, and Xj be the nth NN of Xi. Then, the MNV between Xi and Xj is defined as (m + n). In a set, if Xi is the first NN of Xj, and Xj is the first NN of Xi, then the MNV between Xi and Xj is 2, and these two samples are said to constitute a "mutual pair". The mutualistic teacher assigns identical but unknown labels to the individuals of a mutual pair and, in effect, makes each mutual pair a cluster.

The heart of this learning scheme consists of determining the mutual pairs in each stage and replacing each mutual pair by its sample mean. The sample mean appears as a new or representative sample. As the individuals of each mutual pair are replaced by their sample mean, there is a reduction in the number of effective samples from stage to stage. The number of classes, if not specified, can be determined at an intermediate stage if the expected maximum number of classes is known. When the number of effective samples, at some stage, becomes equal to the expected maximum number of classes, the prevailing cluster weights can be examined to identify the main clusters and hence the number of classes. So, finally, the effective samples, when their number becomes equal to the number of classes, represent the desired mean values of the normal density functions.

The upper and lower bounds on the mutualistic teacher risk in assigning identical labels to the individuals of mutual pairs are estimated. Results of some simulation studies are presented to corroborate the efficacy of the proposed learning scheme.

Acknowledgement - The authors wish to thank Professor K. S. Prabhu, Chairman of the Department of Electrical Engineering, Indian Institute of Science, Bangalore, for providing necessary facilities and giving encouragement from time to time. Mr. Chidananda Gowda is grateful to the Government of India for awarding the Q.I.P. Fellowship.

REFERENCES

1. R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. John Wiley, New York (1973).
2. D. B. Cooper and P. W. Cooper, Nonsupervised adaptive signal detection and pattern recognition, Inf. Control 7, 416-444 (1964).
3. E. A. Patrick and J. C. Hancock, Nonsupervised sequential classification and recognition of patterns, IEEE Trans. Inf. Theory IT-12, 362-372 (1966).
4. S. C. Fralick, Learning to recognize patterns without a teacher, IEEE Trans. Inf. Theory IT-13, 57-64 (1967).
5. D. F. Stanat, Unsupervised learning of mixtures of probability functions, Pattern Recognition, L. Kanal, ed., pp. 357-389. Thompson Books, Washington, D.C. (1968).
6. E. A. Patrick and F. P. Fischer, Cluster mapping with experimental computer graphics, Symp. Comput. Process. in Commun., Polytechnic Institute of Brooklyn, 8-10 April (1969).
7. S. J. Yakowitz, Unsupervised learning and identification of finite mixtures, IEEE Trans. Inf. Theory IT-16, 330-338 (1970).
8. A. K. Agrawala, Learning with a probabilistic teacher, IEEE Trans. Inf. Theory IT-16 (4), 373-379 (1970).
9. P. K. Rajasekaran and M. D. Srinath, Unsupervised learning in nongaussian pattern recognition, Pattern Recognition 4, 401-416 (1972).
10. J. C. Bezdek and J. C. Dunn, Optimal fuzzy partitions: a heuristic for estimating the parameters in a mixture of normal distributions, IEEE Trans. Comput., 835-838 (1975).
11. K. Chidananda Gowda and G. Krishna, Agglomerative clustering using the concept of mutual nearest neighbourhood, Pattern Recognition 10 (2), 105-112 (1978).
12. K. Chidananda Gowda and G. Krishna, A cluster approach to unsupervised learning, Proc. 5th Nat. Syst. Conf., Punjab Agricultural University, Ludhiana (1978).
13. K. Chidananda Gowda and G. Krishna, Disaggregative clustering using the concept of mutual nearest neighbourhood, IEEE Trans. Syst. Man Cybernet. SMC-8 (12), 888-895 (1978).
14. G. Cottam, J. T. Curtis and B. W. Hall, Some sampling characteristics of a population of randomly dispersed individuals, Ecology 34, 741 (1953).
15. G. Cottam and J. T. Curtis, The use of distance measures in phytosociological sampling, Ecology 37, 451 (1956).
16. R. Hamming and E. Gilbert, Probability of occurrence of a constant number of isolated pairs in a random population, Univ. of Wisconsin Comput. News 32 (1954).
17. P. J. Clark and F. C. Evans, On some aspects of spatial pattern in biological populations, Science 121, 397-398 (1955).
18. J. F. Callaghan, An alternative definition for neighbourhood of a point, IEEE Trans. Comput. C-24, 1121 (1975).
19. T. M. Cover and P. E. Hart, Nearest neighbour pattern classification, IEEE Trans. Inf. Theory IT-13 (1), 21-27 (1967).
20. K. Fukunaga, Introduction to Statistical Pattern Recognition. Academic Press, New York (1972).
21. M. Loeve, Probability Theory. D. Van Nostrand (1963).
22. K. L. Chung, A Course in Probability Theory. Academic Press, New York (1974).
23. M. E. Hellman, The nearest neighbour classification rule with a reject option, IEEE Trans. Syst. Sci. Cybernet. SSC-6, 179-185 (1970).
24. D. R. Gimlin and D. R. Ferrell, A k - k' error correcting procedure for nonparametric imperfectly supervised learning, IEEE Trans. Syst. Man Cybernet. SMC-4 (3), 304-306 (1974).
About the Author - K. CHIDANANDA GOWDA was born in Chockadi, India, on 15 June 1942. He received his B.E. degree in Electrical Engineering from the University of Mysore, India, in 1964 and his M.E. degree in Electrical Engineering from the M.S. University of Baroda, Baroda, India, in 1969. From 1964 to 1966 he was with the National Institute of Engineering, Mysore, and during 1969 he was with the G.S. Technological Institute, Indore, working as Lecturer in Electrical Engineering. Since 1969, he has been with the Sri Jayachamarajendra College of Engineering, Mysore, working presently as Professor of Electrical Engineering. Between 1975 and 1978, he was on study leave, working towards his Ph.D. degree in Electrical Engineering at the Indian Institute of Science, Bangalore, India. His research interests are in the areas of pattern recognition, learning, image processing and remote sensing.

About the Author - G. KRISHNA was born in Bangalore, India, in 1933. He received his B.E. degree in Electrical Engineering from the University of Mysore, Mysore City, India, in 1953, and his Ph.D. degree from the Indian Institute of Science, Bangalore, India, in 1968. Since 1953, he has been with the Indian Institute of Science, where he is at present Associate Professor in the School of Automation. From 1967 to 1969, he was with Clarkson College of Technology, Potsdam, N.Y. His research interests are in the areas of power system analysis and control, optimization techniques and pattern recognition.