Energy based competitive learning

Chang-Dong Wang, Jian-Huang Lai
School of Information Science and Technology, Sun Yat-sen University, Guangzhou, PR China

Neurocomputing 74 (2011) 2265–2275

Article history: Received 12 August 2010; Received in revised form 1 December 2010; Accepted 7 March 2011; Available online 2 April 2011. Communicated by L. Xu.

Abstract

This paper addresses the three important issues associated with competitive learning clustering, which are auto-initialization, adaptation to clusters of different size and sparsity, and eliminating the disturbance caused by outliers. Although many competitive learning methods have been developed to deal with some of these problems, few of them can solve all three problems simultaneously. In this paper, we propose a new competitive learning clustering method termed energy based competitive learning (EBCL) to simultaneously tackle these problems. Auto-initialization is achieved by extracting samples of high energy to form a core point set, whereby connected components are obtained as initial clusters. To adapt to clusters of different size and sparsity, a novel competition mechanism, namely, size-sparsity balance of clusters (SSB), is developed to select a winning prototype. For eliminating the disturbance caused by outliers, another new competition mechanism, namely, adaptive learning rate based on samples' energy (ALR), is proposed to update the winner. Data clustering experiments on 2000 simulated datasets comprising clusters of different size and sparsity, as well as with outliers, have been performed to verify the effectiveness of the proposed method. Then we apply EBCL to automatic color image segmentation. Comparison results show that the proposed EBCL outperforms existing competitive learning algorithms.

© 2011 Elsevier B.V. All rights reserved.

Keywords: Competitive learning; Auto-initialization; Size-sparsity; Outliers; Learning rate; Energy

1. Introduction

Competitive learning has received a significant amount of attention in the past decades [1,2]. Due to its adaptive on-line learning, it has been widely applied in the fields of data clustering [3], vector quantization [4], RBF net [5], shape detection [6,7], discrete-valued source separation [8], Markov model identification [9], component analysis [10], scheduling [11], etc. Among them, clustering analysis is still one of the most active fields. Issues such as auto-initialization, partitioning clusters of different size and sparsity, and eliminating the disturbance caused by outliers are the major topics in the literature of data clustering [12–14]. In this paper, we mainly focus on competitive learning for data clustering, and develop a new competitive learning clustering method termed energy based competitive learning (EBCL) that simultaneously has the advantages of auto-initialization, adaptation to clusters of different size and sparsity, and insensitivity to outliers.

1.1. Related work

Recent development of competitive learning clustering is mainly focused on auto-initialization, including estimating the number of clusters and allocating appropriate initial prototypes.

Corresponding author. Tel.: +86 013168313819. E-mail addresses: [email protected], [email protected] (C.-D. Wang), [email protected] (J.-H. Lai). doi:10.1016/j.neucom.2011.03.013

Rival penalized competitive learning (RPCL) [3] proposed by Xu et al. is one of the most widely used competitive learning algorithms for automatic model selection. The basic idea is that not only the winning prototype is learned to adapt to the input, but also its rival (i.e., the second winner) is delearned by a smaller delearning rate, such that the redundant prototypes can be automatically eliminated. Ma and Wang [15] provided a mathematical theory for the convergence of RPCL from the perspective of cost-function, and proposed a distance-sensitive RPCL (DSRPCL). For controlling the strength of rival penalization, Cheung proposed the rival penalization controlled competitive learning (RPCCL) [16]. Apart from the rival penalization mechanism, other model selection strategies have recently emerged, such as self-splitting proposed by Zhang and Liu [17] and competitive repetition suppression by Bacciu and Starita [18]. Although these methods can automatically estimate the number of clusters, they are mostly proposed for clusters of the same size and sparsity and their performances degenerate considerably when this requirement is not satisfied. Additionally, the disturbance caused by outliers may also affect their performances. For adapting to the clusters of different size and sparsity, Xu further extended the original RPCL to the finite mixture modeling and multi-sets modeling, for instance, elliptic RPCL, which have been shown effective for clusters of complicated shapes [5,12]. NonGaussian structural RPCL is another effective approach of this type [19]. All these rival penalization mechanisms can be seen as special cases of the Bayesian Ying-Yang (BYY) harmony learning [2,19]. Very recently, another different model selection strategy, namely, entropy regularized likelihood (ERL), has been developed


by Lu and Ip for Gaussian mixture fitting based on regularization theory [20]. These methods have made great improvements to model selection for mixture models. However, relatively little attention has been paid to eliminating the disturbance caused by outliers, and experimental results demonstrate that the existence of outliers may significantly decrease the accuracies of both model selection and cluster label assignment. In real-world applications, auto-initialization, adapting to clusters of different size and sparsity, and eliminating the disturbance caused by outliers are of great importance. In this paper, we propose a new competitive learning clustering method termed energy based competitive learning (EBCL) to simultaneously address the three issues.

1.2. The proposed approach

The proposed method first computes the samples' energy as the normalized sum of the weights. Then samples with high energy are extracted to form a subset termed the core point set, which can be divided into several connected components by the marginal points, i.e., samples with low energy. These separated components are taken as the initial clusters and their means as the initial prototypes. In this way, the number of clusters is automatically estimated and the prototypes are initialized. The initial prototypes require further refinement by competitive learning, such that not only the core point set but also the marginal samples are assigned accurate cluster labels. Through defining the prototype energy by considering both the size and sparsity of the corresponding cluster, two basic observations on competitive learning are obtained:

• Observation one: the lower energy a prototype possesses, which implies that the corresponding cluster is either more sparsely distributed or of smaller size, the more samples this cluster should incorporate hereafter.
• Observation two: the higher energy a sample possesses, the more strongly it should attract the corresponding winning prototype.

Based on these two observations, two new competition mechanisms are developed, respectively. Based on observation one, the first mechanism is the size-sparsity balance of clusters (SSB), used to select a winning prototype. As shown in Fig. 1, this mechanism leads to the second advantage of EBCL, namely, adaptation to clusters of different size and sparsity. From observation two, the second mechanism is the adaptive learning rate based on samples' energy (ALR), used to update the winner, such that outliers with little or no energy will no longer affect the prototypes (Fig. 1). Therefore, the third advantage, namely, insensitivity to outliers, is achieved.

The remainder of the paper is organized as follows. Section 2 describes the proposed energy based competitive learning (EBCL) method. Experimental results are reported in Section 3. We conclude our paper in Section 4.

2. Energy based competitive learning

2.1. Problem formulation

Given a sample dataset X = \{x_1, \ldots, x_N\}, data clustering seeks to find a finite (but not necessarily preselected) number K of clusters \{C_1, \ldots, C_K\} such that some prespecified objective function J is minimized. For classical competitive learning (CCL), the number of clusters K should be preselected and the objective function is the distortion measure, given by the sum of the squared distances of each data point to its assigned prototype:

J_{CCL} = \sum_{n=1}^{N} \|x_n - w_{k(n)}\|^2,    (1)

where w_k is the prototype of cluster C_k, and k(n) is a function assigning a sample x_n to C_{k(n)} (i.e., x_n \in C_{k(n)}), computed as k(n) = \arg\min_{k \in \{1, \ldots, K\}} \|x_n - w_k\|^2. The objective J_{CCL} is based on two assumptions: that the prior probabilities of clusters are equal, i.e., P(C_k) = 1/K, \forall k = 1, \ldots, K, and that the covariance matrix \Sigma_k of each cluster is equally proportional to the identity matrix, i.e., \Sigma_k \propto \sigma^2 I [21]. Therefore, it is limited to clusters of the same size and sparsity. To adapt to clusters of different size and sparsity, in the proposed EBCL algorithm we define a prototype energy weighted squared distance (PEWSD) objective function according to observation one as follows:

J_{EBCL} = \sum_{n=1}^{N} E_{k(n)} \|x_n - w_{k(n)}\|^2,    (2)

Fig. 1. Illustration of SSB and ALR mechanisms: (a) the top-left cluster has fewer samples and is sparser, so its prototype holds lower energy than that of the bottom-right; (b) although the top-left cluster has more samples, it is more sparsely distributed, so its prototype still holds lower energy. In both cases, when a new sample comes, the top-left cluster is more likely to be selected as the winner according to the SSB mechanism. Note that the new sample in (a) has higher energy than that in (b), so the winning prototype in (a) is updated with a larger step size than that in (b) according to the ALR mechanism.


where E = [E_1, \ldots, E_K]^\top is the prototype energy vector of the prototypes \{w_1, \ldots, w_K\}, which controls the cluster label assignments according to the SSB mechanism, and the assigning function k(n) is given by the energy weighted squared distance, that is, k(n) = \arg\min_{k \in \{1, \ldots, K\}} E_k \|x_n - w_k\|^2.
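For concreteness, a short NumPy sketch of the energy weighted assignment rule and of the PEWSD objective in Eq. (2) is given below; the function names are our own illustration of the formulas, not the authors' implementation.

```python
import numpy as np

def pewsd_assign(X, W, E):
    # k(n) = argmin_k E_k * ||x_n - w_k||^2 for every sample (SSB winner rule)
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)   # (N, K) squared distances
    return np.argmin(E[None, :] * d2, axis=1)

def pewsd_objective(X, W, E):
    # J_EBCL = sum_n E_{k(n)} * ||x_n - w_{k(n)}||^2, Eq. (2)
    k = pewsd_assign(X, W, E)
    diff = X - W[k]
    return float(np.sum(E[k] * (diff ** 2).sum(axis=1)))
```

Setting every E_k to 1 recovers the classical distortion measure of Eq. (1).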


2.2. The algorithm

To seek the minimization of Eq. (2) while automatically estimating the number of clusters K and being insensitive to outliers, we develop the energy based competitive learning (EBCL) method as follows.

2.2.1. Calculating sample energy vector

A weight matrix H = [H_{ij}]_{N \times N} of the sample dataset X = \{x_1, \ldots, x_N\} is constructed first by

H_{ij} = \exp(-\|x_i - x_j\|^2) if \|x_i - x_j\|^2 \leq \varepsilon, and H_{ij} = 0 otherwise,    (3)

where \varepsilon is a parameter indicating whether x_i and x_j are "close" enough to give a positive weight. In our experiments, the best value for \varepsilon is the mean of squared distances between data points,^1 i.e., \varepsilon = (\sum_{i,j = 1, \ldots, N} \|x_i - x_j\|^2)/N^2. Taking the mean distance as the threshold has the rational meaning that it generates H with neither too many zero nor too many positive components. The weight matrix reflects the weights of the samples' distances: the shorter the distance between x_i and x_j, the larger the weight H_{ij}. Then the sample energy vector e = [e_1, \ldots, e_N]^\top is defined.

Definition 1 (sample energy vector). Given a weight matrix H, the sample energy vector e = [e_1, \ldots, e_N]^\top is defined with each component e_n being computed as

e_n = \frac{\sum_j H_{nj}}{\max_{i = 1, \ldots, N} \sum_j H_{ij}}, \quad n = 1, \ldots, N.    (4)

Fig. 2(b) illustrates the concept of samples' energy: samples near the core of clusters are of higher energy. The sample energy is analogous to the density defined in [22]; however, they differ in an important way. In [22], the density is measured simply by the number of samples within an \varepsilon-neighborhood of one sample. Ertöz et al. [23] pointed out that the density measure computed in [22] cannot handle data containing clusters of differing densities. The density-like sample energy in Eq. (4) is, however, appropriate for identifying clusters of different sparsity, since Definition 1 takes into account the correlations between all samples through the weight matrix H, which results in a global estimation of a sample's energy rather than a local count of the points within an \varepsilon-neighborhood.

^1 For efficiently computing the Euclidean distance matrix, a Matlab file is available at: http://web.mit.edu/cocosci/isomap/code/L2_distance.m
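The weight matrix of Eq. (3) and the sample energy vector of Eq. (4) can be computed directly from their definitions; the following is a minimal NumPy sketch (a helper of our own, using the paper's choice of \varepsilon as the mean squared distance rather than the referenced Matlab routine).

```python
import numpy as np

def sample_energy(X):
    # pairwise squared Euclidean distances, shape (N, N)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # threshold: mean of squared distances between data points
    eps = d2.mean()
    # weight matrix H of Eq. (3): exp(-||x_i - x_j||^2) for "close" pairs, 0 otherwise
    H = np.where(d2 <= eps, np.exp(-d2), 0.0)
    # sample energy of Eq. (4): row sums normalized by the largest row sum
    row_sums = H.sum(axis=1)
    return row_sums / row_sums.max()
```

By construction every e_n lies in (0, 1]; samples deep inside a cluster approach 1, while an isolated outlier, whose off-diagonal weights all vanish, is pushed toward the low end of the range.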

2.2.2. Auto-initializing the clusters and prototypes

A subset Y \subseteq X comprising samples with high energy is selected according to whether a sample's energy is higher than a threshold or not. Extensive experiments indicate that an appropriate threshold is the median of all samples' energy, denoted as median\{e_1, \ldots, e_N\}. Since these high energy samples are near the cores of clusters, we define Y to be the core point set and the complementary set X \setminus Y as the marginal sample set M.

Definition 2. The core point set Y is defined as the half of the samples of X with high sample energy. That is,

Y \triangleq \{x_n \mid e_n \geq median\{e_1, \ldots, e_N\}\}.    (5)

The marginal sample set M is defined to be

M \triangleq X \setminus Y = \{x_n \mid e_n < median\{e_1, \ldots, e_N\}\}.    (6)

With the definition of the core point set Y, we now construct an \varepsilon-neighborhood graph on Y and define the core-point-connectivity as follows.

Definition 3 (core-point-connectivity). Two core points p and q in Y are core-point-connected w.r.t. \varepsilon if there exists a chain of core points p_1, \ldots, p_m, p_1 = p, p_m = q such that p_{i+1} \in N_\varepsilon^Y(p_i) and p_i \in N_\varepsilon^Y(p_{i+1}), where N_\varepsilon^Y(p_i) is the \varepsilon-neighborhood of p_i restricted to the core point set Y.

Since the core point set contains no marginal sample x \in M, the samples in the core point set appear in some naturally separated groups; the average within-group distances are far smaller than the average between-group distances. As shown in our experimental analysis, the partitioning by core-point-connectivity is not strongly influenced by \varepsilon, provided that \varepsilon is neither too large nor too small. In general, a practical guideline is to select \varepsilon as the mean of squared distances between data points of the core point set Y, i.e., \varepsilon = (\sum_{x_i, x_j \in Y} \|x_i - x_j\|^2)/|Y|^2. As shown in Fig. 2(c), the core point set Y is naturally separated into K connected components \{\hat{C}_1, \ldots, \hat{C}_K\} by core-point-connectivity. These connected components \{\hat{C}_1, \ldots, \hat{C}_K\} are taken as initial clusters and their means as initial prototypes \{w_1, \ldots, w_K\}, respectively, as follows.

Proposition 1 (initial clusters and prototypes). The core point set Y is separated into K initial clusters \{\hat{C}_1, \ldots, \hat{C}_K\} w.r.t. \varepsilon, such that \forall i \neq j, \hat{C}_i \cap \hat{C}_j = \emptyset, Y = \bigcup_{k=1}^{K} \hat{C}_k, any two core points p, q \in \hat{C}_k are core-point-connected, while p \in \hat{C}_i and q \in \hat{C}_j with i \neq j are not. The initial prototype w_k is computed as w_k = (\sum_{x \in \hat{C}_k} x)/|\hat{C}_k|, \forall k = 1, \ldots, K.

In this way, EBCL achieves auto-initialization: estimating the number of clusters K and initializing their prototypes \{w_1, \ldots, w_K\}. As shown in Fig. 2(c), the initial prototypes estimated by the proposed EBCL are much more appropriate compared with random initialization.

2.2.3. Updating the prototypes

Since the initial clustering \{\hat{C}_1, \ldots, \hat{C}_K\} only considers the core point set Y, it is required to update the prototypes \{w_1, \ldots, w_K\} such that the marginal samples M can also be assigned cluster labels. To realize the SSB mechanism, the prototype energy vector E = [E_1, \ldots, E_K]^\top of \{w_1, \ldots, w_K\} is used, which is initialized as [1, \ldots, 1]^\top and updated according to the SSB mechanism, as will be shown later. The prototypes \{w_1, \ldots, w_K\} are updated by repeating the following three steps.

Step 1: Selecting the winning prototype. Randomly take x_n from X, and select the winning prototype w_{k(n)} with the least PEWSD:

k(n) = \arg\min_{k \in \{1, \ldots, K\}} E_k \|x_n - w_k\|^2.    (7)

This winner selection rule, together with the winner's energy updating rule in Eq. (9), forms the SSB mechanism, leading to the adaptation to clusters of different size and sparsity, as shown in Fig. 1.

Step 2: Updating the winner. Update the winner w_{k(n)} by

w_{k(n)} \leftarrow w_{k(n)} + \gamma (x_n - w_{k(n)}),    (8)

where \gamma denotes the adaptive learning rate based on samples' energy, e.g., \gamma = e_n/10. Here, the winner is updated with a step size proportional to the new sample's energy, as shown in Fig. 1.


Fig. 2. Original dataset, sample energy, initialization, and final clusters. (a) Five clusters are plotted in different colors, with some outliers in larger black dots. (b) Samples’ energy, each point is colored according to its sample energy. (c) EBCL achieves appropriate auto-initialization, i.e., correct cluster number and appropriate prototypes are obtained. For comparison, six random prototypes are also plotted. (d) EBCL clustering result. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Step 3: Updating the winner's prototype energy. Update the winner's energy E_{k(n)} by

E_{k(n)} \leftarrow E_{k(n)} + \exp\left(-\frac{\|x_n - w_{k(n)}\|^2}{f_{k(n)}}\right),    (9)

where f_{k(n)} is the winning frequency of prototype w_{k(n)} [24]. Let n_k denote the cumulative number of times that w_k wins; then f_{k(n)} = n_{k(n)} / \sum_{k=1}^{K} n_k. As mentioned in the winner selection, this winner's energy updating rule keeps a balance between the size and sparsity of a cluster, as shown in Fig. 1.

The iteration continues until all the prototypes converge or the number of iterations t reaches the maximum iteration number t_max. In practice, the convergence of prototypes is often measured by \sum_{k=1}^{K} \|w_k - w_k^{old}\|^2 < \epsilon, with \epsilon being some very small positive number and w_k^{old} being the old prototype before this iteration. After the iteration stops, we classify each sample x_n into cluster C_{k(n)}, i.e., x_n \in C_{k(n)}, with k(n) = \arg\min_{k \in \{1, \ldots, K\}} E_k \|x_n - w_k\|^2, \forall n = 1, \ldots, N. Therefore, the final clusters \{C_1, \ldots, C_K \mid x_n \in C_{k(n)}, \forall n = 1, \ldots, N\} are obtained. Fig. 2(d) demonstrates the clustering result on the simulated dataset of Fig. 2(a); satisfactory clustering has been generated by the proposed EBCL method.


For clarity, Algorithm 1 summarizes the proposed energy based competitive learning (EBCL) method.

Algorithm 1. Energy based competitive learning.
Input: X = \{x_1, \ldots, x_N\}, t_max, \epsilon.
1: Calculate the sample energy vector e = [e_1, \ldots, e_N]^\top.
2: Estimate the number of clusters K and initialize the prototypes \{w_1, \ldots, w_K\}.
3: Update the prototypes \{w_1, \ldots, w_K\} as follows.
4: Set t = 0.
5: repeat
6:   Randomly rearrange the order of \{x_1, \ldots, x_N\}, \{w_k^{old}\} \leftarrow \{w_k\}, t \leftarrow t + 1.
7:   for n = 1, \ldots, N do
8:     Compete to select a winning prototype w_{k(n)} by Eq. (7).
9:     Learn to update the winner w_{k(n)} with learning rate \gamma = e_n/10 by Eq. (8).
10:    Learn to update the winner's prototype energy E_{k(n)} by Eq. (9).
11:  end for
12: until \sum_{k=1}^{K} \|w_k - w_k^{old}\|^2 < \epsilon or t \geq t_max
Output: clusters \{C_1, \ldots, C_K : x_n \in C_{k(n)}, with k(n) by Eq. (7), \forall n = 1, \ldots, N\}.
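The following sketch strings Algorithm 1 together in Python/NumPy. The connected components of the \varepsilon-neighborhood graph on the core points are found with a simple breadth-first search, which is our own choice of data structure; everything else follows Eqs. (3)–(9) as read above, so this should be taken as an illustrative reading of the algorithm rather than the authors' Matlab code.

```python
import numpy as np
from collections import deque

def ebcl(X, t_max=100, tol=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    N = len(X)
    # --- sample energy, Eqs. (3)-(4) ---
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    H = np.where(d2 <= d2.mean(), np.exp(-d2), 0.0)
    e = H.sum(axis=1) / H.sum(axis=1).max()
    # --- auto-initialization, Eqs. (5)-(6) and Proposition 1 ---
    core = np.where(e >= np.median(e))[0]            # core point set Y
    Xc = X[core]
    dc = ((Xc[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=2)
    adj = dc <= dc.mean()                            # eps-neighborhood graph on Y
    comp = -np.ones(len(core), dtype=int)            # connected components via BFS
    K = 0
    for s in range(len(core)):
        if comp[s] >= 0:
            continue
        comp[s] = K
        queue = deque([s])
        while queue:
            i = queue.popleft()
            for j in np.where(adj[i] & (comp < 0))[0]:
                comp[j] = K
                queue.append(j)
        K += 1
    W = np.array([Xc[comp == k].mean(axis=0) for k in range(K)])
    # --- competitive learning, Eqs. (7)-(9) ---
    E = np.ones(K)                                   # prototype energies
    wins = np.zeros(K)                               # cumulative win counts n_k
    for _ in range(t_max):
        W_old = W.copy()
        for n in rng.permutation(N):
            dist2 = ((X[n] - W) ** 2).sum(axis=1)
            k = int(np.argmin(E * dist2))            # SSB winner selection, Eq. (7)
            W[k] += (e[n] / 10.0) * (X[n] - W[k])    # ALR update, Eq. (8)
            wins[k] += 1
            f_k = wins[k] / wins.sum()               # winning frequency
            E[k] += np.exp(-((X[n] - W[k]) ** 2).sum() / f_k)   # Eq. (9)
        if ((W - W_old) ** 2).sum() < tol:
            break
    # final assignment by the energy weighted squared distance
    d2w = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(E[None, :] * d2w, axis=1), W
```

On a dataset like that of Fig. 2(a), the number K of connected components found on the core points is the estimated cluster number, so no K needs to be preselected; the returned labels correspond to the final assignment rule stated above.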

2.3. Analysis of the two mechanisms

2.3.1. Size-sparsity balance

In order to adapt to clusters of different size and sparsity, we introduce a novel competition mechanism termed size-sparsity balance of clusters (SSB). Let \Delta E_{k(n)} = \exp(-\|x_n - w_{k(n)}\|^2 / f_{k(n)}) denote the winner's prototype energy increment. On one hand, \Delta E_{k(n)} monotonically increases w.r.t. f_{k(n)}. It means that the less frequently a prototype wins (i.e., the fewer samples that have already been assigned to it), the smaller the increment by which its energy increases. According to Eq. (7), this prototype will more strongly incorporate the next sample, as shown in Fig. 1(a). On the other hand, \Delta E_{k(n)} monotonically decreases w.r.t. \|x_n - w_{k(n)}\|^2. It means that the larger the distance between a winner and its new sample, the smaller the increment by which its energy increases; a remote sample will contribute almost no energy to the winning prototype. Therefore, although a cluster may have many samples, if it is sparsely distributed then its prototype is still of lower energy. According to Eq. (7), this prototype will again be biased towards incorporating the next sample, so sparsely distributed larger clusters will not lose their samples in competition, as shown in Fig. 1(b). In summary, the SSB mechanism is suitable for datasets comprising clusters of different size and sparsity.
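These two monotonicity arguments are easy to check numerically; the numbers below are illustrative values of our own choosing, not taken from the paper.

```python
import numpy as np

# Energy increment of Eq. (9): dE = exp(-d2 / f), with d2 = ||x_n - w_{k(n)}||^2
# and f = winning frequency of the winner.
dE = lambda d2, f: np.exp(-d2 / f)

# A rarely winning prototype (small f) gains far less energy, so by Eq. (7)
# it remains attractive for subsequent samples:
print(dE(1.0, 0.1), dE(1.0, 0.5))   # ~4.5e-05 vs ~0.14

# A remote sample (large d2) contributes almost no energy to its winner:
print(dE(9.0, 0.5), dE(0.1, 0.5))   # ~1.5e-08 vs ~0.82
```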

2.3.2. Adaptive learning rate

To avoid the disturbance caused by outliers, a practical formula \gamma = e_n/10 for the learning rate is provided. This is an empirical value obtained from a large number of experiments, although it is usually assumed that better clustering may be obtained by some specific schedule for changing the denominator like that used by the Robbins–Monro stochastic approximation procedure [21]. Let x_n denote an outlier that is far away from other data points, which indicates that there exists \delta \gg 0 such that \|x_n - x_j\|^2 > \delta, thus H_{nj} \to 0, \forall j = 1, \ldots, N. From the definition of sample energy, we have e_n \to 0. In the winner updating rule Eq. (8), the lower the energy e_n (and hence the lower \gamma = e_n/10), the less w_{k(n)} moves, so the prototypes updated in this way are insensitive to outliers.
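The limiting argument can also be checked on a toy example (synthetic data of our own, reusing the sample energy computation of Eq. (4)): the effective learning rate \gamma = e_n/10 of an isolated outlier is negligible compared with that of a core sample.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),   # one compact cluster
               [[50.0, 50.0]]])                       # plus a single far-away outlier
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
H = np.where(d2 <= d2.mean(), np.exp(-d2), 0.0)        # Eq. (3)
e = H.sum(axis=1) / H.sum(axis=1).max()                # Eq. (4)
print(e[-1] / 10.0)    # outlier learning rate: close to zero, its winner barely moves
print(e.max() / 10.0)  # highest-energy (core) sample: 0.1, a full-strength update
```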


3. Experiments

In this section, we perform experiments on simulated data clustering and color image segmentation to demonstrate the effectiveness of the proposed EBCL algorithm. In the simulated data clustering, we experimentally investigate the improvements achieved by the auto-initialization, SSB and ALR competition mechanisms. Five representative competitive learning algorithms are compared, including classical competitive learning (CCL), rival penalized competitive learning (RPCL) [3], rival penalization controlled competitive learning (RPCCL) [16], entropy regularized likelihood (ERL) learning [20] and a variant of RPCL considering Gaussian mixtures, namely elliptic RPCL [19]. The comparison results have validated the analysis of the three advantages of EBCL. In particular, it can estimate the number of clusters more effectively than the compared methods and adapt to clusters of different size and sparsity. Additionally, it has been shown that the performance of EBCL is quite insensitive to outliers.

3.1. Simulated data clustering

In the simulated data clustering, 2000 datasets were generated from bivariate K^*-component Gaussian mixtures G(K^*, N). The numbers of clusters were fixed at K^* = 3, 4, 5, 6 with 500 datasets for each. The sample sizes were fixed at 10 different values (N = 200, 400, \ldots, 2000) with 200 datasets for each. The mean vectors and covariance matrices were randomly selected under the constraint of avoiding clusters being merged, i.e., avoiding cases where the percentage of samples placed in the other cluster exceeds 5%. Each dataset X consisted of clusters of different size and sparsity. Additionally, some random outliers were appended. Four examples of datasets are shown in Fig. 3.

We set the initial cluster number at the actual number K^* for CCL. Since RPCL, RPCCL, ERL and elliptic RPCL require preselecting an initial cluster number K equal to or larger than the actual number K^*, we set K = K^* + 1 for these four methods. All the mean vectors are randomly initialized within the range [-20, 40] and all the covariance matrices for ERL and elliptic RPCL are initialized as positive diagonal random matrices with entries in the range [0, 3]. Moreover, we set an appropriate learning rate \alpha = 0.05 for CCL, RPCL, RPCCL and elliptic RPCL, a delearning rate \beta = 0.002 for RPCL and elliptic RPCL, and the regularization factor at 0.2 for ERL, as suggested in [3,16,20]. For all methods, the maximum iteration number t_max and the iteration stopping criterion \epsilon were fixed at 100 and 10^{-5}, respectively.

Two evaluation indices were used, including the average cluster number accuracy (ACNA) and the average sample assignment accuracy (ASAA), which, respectively, evaluate the performance on estimating the number of clusters and assigning samples to clusters. Let L denote the number of datasets, i.e., L = 2000. For each dataset X_l = G(K_l^*, N_l), l = 1, \ldots, L, we denote N_l as the sample number, x_{n_l} as the n_l-th sample of X_l, n_l = 1, \ldots, N_l, K_l^* and K_l as the actual and estimated numbers of clusters, respectively, and k^*(n_l) and k(n_l) as the class label and resulting cluster label of x_{n_l}, respectively. ACNA and ASAA are computed, respectively, as follows:

ACNA = \frac{\sum_{l=1}^{L} \delta(K_l^*, K_l)}{L},  ASAA = \frac{\sum_{l=1}^{L} \sum_{n_l=1}^{N_l} \delta(k^*(n_l), k(n_l))}{\sum_{l=1}^{L} N_l},    (10)

where \delta(x, y) = 1 if x = y and \delta(x, y) = 0 otherwise. Note that the cluster labels are rearranged to associate each cluster with the true class label which accounts for the largest number of data points in the cluster.
A higher score of ACNA indicates that an algorithm can more effectively estimate the number of clusters, while a higher score of ASAA indicates that an algorithm can more accurately assign the cluster labels.
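A minimal implementation of the two indices of Eq. (10), with the majority-vote label rearrangement described above, could look as follows (helper names are ours).

```python
import numpy as np

def match_labels(true_labels, pred_labels):
    # rename each predicted cluster to the true class holding most of its points
    mapped = np.empty_like(pred_labels)
    for c in np.unique(pred_labels):
        members = pred_labels == c
        values, counts = np.unique(true_labels[members], return_counts=True)
        mapped[members] = values[np.argmax(counts)]
    return mapped

def acna(true_K, estimated_K):
    # average cluster number accuracy over all datasets, Eq. (10)
    return float(np.mean(np.asarray(true_K) == np.asarray(estimated_K)))

def asaa(true_label_sets, pred_label_sets):
    # average sample assignment accuracy pooled over all datasets, Eq. (10)
    correct, total = 0, 0
    for y, y_hat in zip(true_label_sets, pred_label_sets):
        y, y_hat = np.asarray(y), np.asarray(y_hat)
        correct += int(np.sum(match_labels(y, y_hat) == y))
        total += len(y)
    return correct / total
```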


Fig. 3. Four examples of sample datasets. Different clusters are marked in distinct colors, with outliers surrounded in larger black dots: (a) X = G(K^* = 3, N = 600); (b) X = G(K^* = 4, N = 1000); (c) X = G(K^* = 5, N = 1400); and (d) X = G(K^* = 6, N = 1800). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

3.1.1. Experimental analysis on EBCL

We first investigate the influence of different \varepsilon in core-point-connectivity. Let \mu denote the mean of squared distances between data points of the core point set Y, i.e., \mu = (\sum_{x_i, x_j \in Y} \|x_i - x_j\|^2)/|Y|^2. Fig. 4 plots the performances of ACNA and ASAA as a function of the \varepsilon value. From the figure, we can see that when \varepsilon is selected from [0.7\mu, 1.3\mu], the performances of cluster number estimation (ACNA) and sample assignment (ASAA) remain relatively unchanged. This implies that when \varepsilon is neither too large nor too small, its value does not significantly influence the performance, and a practical guideline is to set \varepsilon = \mu, as used in our comparison experiments.

Then we examine, respectively, the effectiveness of the two mechanisms, namely SSB and ALR. For investigating the effectiveness of the SSB mechanism, we fix the prototype energy vector E = [E_1, \ldots, E_K] = [1, \ldots, 1], and call this variant of EBCL "EBCL without SSB". Please note that, in this variant, the winner selection rule is equal to that used in CCL, i.e., selecting the winning prototype with the smallest squared distance. Similarly, for investigating the effectiveness of the ALR mechanism, we fix the learning rate \gamma = 0.05, and call this variant of EBCL "EBCL without ALR"; note that this winner learning rule is equal to that used in CCL, RPCL and RPCCL. Furthermore, we also investigate the performance of EBCL when both the SSB and ALR mechanisms are removed, and call it "EBCL without SSB&ALR".

Table 1 lists the ASAA with and without the two mechanisms. When removing the ALR mechanism, the ASAA decreases by 0.034. However, when removing the SSB mechanism, the ASAA decreases by 0.117, which is a relatively large drop compared with removing the ALR mechanism. When both the SSB and ALR mechanisms are removed, the performance drops to 0.811, only slightly higher than the 0.790 obtained by CCL. Therefore, the conclusion can be made that both the SSB and ALR mechanisms are very important in improving the accuracy of classification, and SSB is more influential than ALR.


Fig. 4. The performances of cluster number estimation and sample assignment for different \varepsilon. Here, we use \mu to denote the mean of squared distances between data points of the core point set Y, i.e., \mu = (\sum_{x_i, x_j \in Y} \|x_i - x_j\|^2)/|Y|^2: (a) ACNA on \varepsilon and (b) ASAA on \varepsilon.

Table 1
The ASAA values with and without the two mechanisms.

                 With SSB    Without SSB
With ALR         0.967       0.850
Without ALR      0.933       0.811

3.1.2. Comparing the performance on cluster number estimation

This section compares the performances of RPCL, RPCCL, ERL, elliptic RPCL and EBCL on estimating the actual number of clusters. Since CCL does not have the ability to estimate the number of clusters, there is no result for CCL in this comparison. Fig. 5(a) plots the value of ACNA as a function of the actual number of clusters. From the figure, the accuracies of all methods on estimating the number of clusters tend to decrease slightly when the actual number of clusters increases. However, the proposed EBCL can still most effectively estimate the number of clusters, obtaining at least 0.95 ACNA, which is 0.03 higher than the second best, elliptic RPCL. Similarly, when the ratio of outliers increases from 0.00 to 0.10, the performances on estimating the number of clusters tend to decrease, as shown in Fig. 5(b). Especially for the compared counterparts, due to the lack of a strategy for handling outliers, the ACNA decreases much faster than for the proposed EBCL, while EBCL can still stably estimate the number of clusters when the outlier ratio reaches 0.10. The main reason is that, by removing marginal points of lower sample energy, the outliers of little energy are almost eliminated. Additionally, the datasets allow the clusters to be of different size and sparsity. As a consequence, it can safely be concluded that the proposed EBCL can more effectively estimate the number of clusters in situations where clusters are of different size and sparsity, as well as where outliers exist.

3.1.3. Comparing the performance on assigning cluster labels

We compare the performances of CCL, RPCL, RPCCL, ERL, elliptic RPCL and the proposed EBCL on assigning cluster labels. Fig. 6(a) displays the ASAA value as a function of the actual number of clusters. From the figure, it can be seen that the performance of CCL in terms of ASAA is the lowest in all cases, while ERL slightly outperforms RPCL and RPCCL but is not comparable with elliptic RPCL. ERL and elliptic RPCL are essentially Gaussian mixture models, which allow clusters of different size and sparsity. However, EBCL outperforms the second best, since it not only has the capability of adapting to clusters of different size and sparsity, but also can eliminate the disturbance caused by outliers. The comparison results on outliers shown in Fig. 6(b) also support the discussion above. We see from the figure that EBCL generates a relatively stable ASAA as the ratio of outliers increases, and therefore it is quite insensitive to outliers. In summary, the experimental results have validated the effectiveness of EBCL in adapting to clusters of different size and sparsity and being insensitive to outliers.

3.1.4. Further comparison

As a further comparison, Fig. 7(a) plots the ACNA obtained by the five methods. Since CCL has no ability to estimate the number of clusters, its ACNA is omitted in Fig. 7(a). We can see that EBCL achieves the best performance among the compared algorithms in terms of ACNA. That is, it can most correctly (near 100%) estimate the number of clusters (highest ACNA). The second best algorithm is elliptic RPCL. As discussed by Xu, elliptic RPCL is the best RPCL variant in estimating the cluster number, and our experimental results here also demonstrate this fact. Due to the presence of the outliers, the proposed EBCL slightly outperforms elliptic RPCL; the main reason is that in some cases elliptic RPCL could not remove the redundant cluster due to the disturbance caused by outliers.

Fig. 7(b) plots the ASAA obtained by the six methods as well as those by three variants of EBCL, namely, EBCL without SSB&ALR, EBCL without SSB and EBCL without ALR. As can be seen from the figure, the performance of CCL on sample assignment is the lowest. When comparing EBCL without SSB&ALR and CCL, EBCL without SSB&ALR slightly outperforms CCL, although neither of them considers the size-sparsity or outliers. This is because in some cases CCL suffers from the prototype under-utilization problem, leading to degenerate results, whereas EBCL without SSB&ALR does not, due to the appropriate auto-initialization. When comparing EBCL without ALR with elliptic RPCL, it can be seen that they obtain similar ASAA values, which are slightly higher than ERL. However, the ASAA value obtained by EBCL is the highest, slightly higher than the second best elliptic RPCL, and much higher than ERL and the others. The underlying reason is that, although EBCL, elliptic RPCL and ERL all consider the size-sparsity, EBCL also avoids the disturbance caused by outliers.

Fig. 7(c) plots the average computational time in seconds (Matlab code on a Core(TM)2 Duo 2.4 GHz computer with 1.5 GB memory), which indicates that EBCL is at least three times faster than the other five methods. The main reason is that, with appropriate auto-initialization, EBCL requires fewer iterations to reach convergence.


Fig. 5. Comparing the performances on estimating the number of clusters: (a) ACNA on actual number of clusters and (b) ACNA on outlier ratio.


Fig. 6. Comparing the performances on assigning cluster label: (a) ASAA on actual number of clusters and (b) ASAA on outlier ratio.


Fig. 7. Further comparison on 2000 simulated datasets: (a) ACNA; (b) ASAA; and (c) average time.


Fig. 8. Applying EBCL to segmenting color images from BSDS [25]. The first row lists some original images. The second row displays the sample energy of each pixel in gray scale: The brighter a pixel, the higher energy it possesses. The third row plots in white the pixels of high energy. The fourth row displays the final segmentation result by EBCL, with different segments being colored in different scales. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

3.2. Automatic color image segmentation

In this section, we apply the proposed EBCL algorithm to automatic color image segmentation on images from the Berkeley Segmentation Dataset (BSDS) [25]. The widely used Normalized Probabilistic Rand (NPR) index [26] is used as the evaluation criterion to compare EBCL with CCL, RPCL [3], RPCCL [16], ERL [20] and elliptic RPCL. The segmentation results reveal that the proposed method can correctly estimate the number of interesting objects in each color image and effectively segment the boundaries of the objects.

The Berkeley Segmentation Dataset contains 300 images of a wide variety of natural scenes, as well as the "ground truth" segmentations produced by humans [27]. The goal is to provide an empirical basis for research on image segmentation and boundary detection. Fig. 8 shows four original images and some related results generated by EBCL. Since the "ground truth" segmentations are available, the widely used normalized probabilistic rand (NPR) index [26] was computed and used as the evaluation criterion; a higher NPR indicates better segmentation results.

We used the 3-D vectors of color features of each pixel as the feature vectors. Since the Lab color space is designed to approximate human vision and is suitable for interpreting the real world, the coordinates in the Lab color space were used as the features [16,20]. Before applying the clustering algorithms, a smoothing operation was performed first to avoid the over-segmentation caused by local color variation. According to the statistics on the "ground truth" images provided by [27], there are fewer than 10 major objects in each image. So the initial cluster numbers for RPCL, RPCCL, ERL and elliptic RPCL were preselected as 10, as in [20], and set at 8 for CCL, which were proved to be optimal by a large number of experiments. Additionally, we set the learning rate \alpha = 0.05, the delearning rate \beta = 0.002 and the regularization factor at 0.4, respectively, as suggested in [3,16,20]. For all methods, the maximum iteration number t_max and the iteration stopping criterion \epsilon were fixed at 100 and 10^{-5}, respectively.

The means and the standard deviations of the NPR indices and the average computational time on 300 images are listed in Table 2.
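For reference, the per-pixel feature extraction described above (Lab coordinates after a light smoothing) can be set up roughly as follows; the particular smoothing width and the use of SciPy/scikit-image here are our own choices and are not prescribed by the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.color import rgb2lab

def pixel_features(rgb_image, sigma=1.0):
    # smooth each color channel to suppress local color variation (limits over-segmentation)
    img = rgb_image.astype(float)
    if img.max() > 1.0:          # scale 0-255 images to the [0, 1] range expected by rgb2lab
        img = img / 255.0
    smoothed = np.stack([gaussian_filter(img[..., c], sigma) for c in range(3)], axis=-1)
    # convert to the Lab color space and flatten to an (H*W, 3) feature matrix
    return rgb2lab(smoothed).reshape(-1, 3)
```

The resulting feature matrix can then be clustered with EBCL (or any of the compared algorithms), and the labels reshaped back to the image grid to obtain a segmentation.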

Table 2
Mean and std. dev. of NPR, and average time on 300 images from BSDS.

Algorithm          CCL     RPCL    RPCCL   ERL     Elliptic RPCL   EBCL
Mean of NPR        0.652   0.681   0.670   0.733   0.734           0.737
Std. dev. of NPR   0.175   0.182   0.173   0.140   0.168           0.132
Average time (s)   80.9    89.3    90.6    88.4    84.7            53.2

It shows that EBCL obtains the highest NPR index among the compared methods with the least computational time. The proposed EBCL also obtains the smallest standard deviation of NPR, that is, it is the most stable among the compared methods in color image segmentation. Furthermore, some original images, the segmentation results obtained by the six algorithms and the "ground truth" segmentations are shown in Fig. 9. From the figure, we see that the proposed EBCL can best estimate the number of interesting objects in each image and accurately segment the boundaries of each object. The objects are usually of different size and their color distributions may not be of the same density. Additionally, the color pixels near the boundaries can be taken as outliers since they do not belong to any object. Therefore, in the application of color image segmentation, it is often the case that objects are of different size and sparsity, and outliers may exist. The proposed method can effectively overcome these problems due to its advantages of adaptation to clusters of different size and sparsity, as well as insensitivity to outliers. The comparison results demonstrate the effectiveness of EBCL in the application of real-world color image segmentation.

4. Conclusions

In this paper, we have proposed a novel competitive learning clustering method termed energy based competitive learning (EBCL), which simultaneously has the three advantages of auto-initialization, adaptation to clusters of different size and sparsity, and eliminating the disturbance caused by outliers.


Fig. 9. Some original images, segmentation results obtained by six algorithms and "ground truth" segmentations. From top to bottom: original images [25], CCL results, RPCL results, RPCCL results, ERL results, Elliptic RPCL results, EBCL results, and "ground truth" segmentations [27]. The image IDs (#113044, #176035, #353013) are plotted as "#ID" below the last row. Each segment is painted with its mean color, and different segments of the "ground truth" are painted with different grayscales. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Auto-initialization has been achieved by extracting samples of high energy to form a core point set, whereby connected components are obtained as initial clusters. To adapt to clusters of different size and sparsity, a novel competition mechanism, namely, size-sparsity balance of clusters (SSB), has been developed to select a winning prototype. For eliminating the disturbance caused by outliers, another new competition mechanism, namely, adaptive learning rate based on samples' energy (ALR), has been proposed to update the winner. Comprehensive experiments on both simulated data clustering and color image segmentation have been performed to demonstrate the effectiveness of the proposed EBCL method.

Acknowledgments

This work was supported by the NSF-Guangdong (U0835005) and the NSFC (60803083). The authors would like to thank the associate editor and the reviewers for their comments, which were very helpful in improving the revision.


References

[1] D.E. Rumelhart, D. Zipser, Feature discovery by competitive learning, Cognitive Science 9 (1985) 75–112.
[2] L. Xu, A unified perspective and new results on RHT computing, mixture based learning, and multi-learner based problem solving, Pattern Recognition 40 (2007) 2129–2153.
[3] L. Xu, A. Krzyzak, E. Oja, Rival penalized competitive learning for clustering analysis, RBF net, and curve detection, IEEE Transactions on Neural Networks 4 (1993) 636–649.
[4] W.-J. Hwang, B.-Y. Ye, S.-C. Liao, A novel entropy-constrained competitive learning algorithm for vector quantization, Neurocomputing 25 (1999) 133–147.
[5] L. Xu, RBF nets, mixture experts, and Bayesian Ying-Yang learning, Neurocomputing 19 (1998) 223–257.
[6] Z.-Y. Liu, K.-C. Chiu, L. Xu, Strip line detection and thinning by RPCL-based local PCA, Pattern Recognition Letters 24 (2003) 2335–2344.
[7] Z.-Y. Liu, H. Qiao, L. Xu, Multisets mixture learning-based ellipse detection, Pattern Recognition 39 (2006) 731–735.
[8] Y.-M. Cheung, L. Xu, Rival penalized competitive learning based approach for discrete-valued source separation, International Journal of Neural Systems 10 (2000) 483–490.
[9] Y.-M. Cheung, L. Xu, An RPCL-based approach for Markov model identification with unknown state number, IEEE Signal Processing Letters 7 (2000) 284–287.
[10] Z.-Y. Liu, L. Xu, Topological local principal component analysis, Neurocomputing 55 (2003) 739–745.
[11] R.-M. Chen, Y.-M. Huang, Competitive neural network to solve scheduling problems, Neurocomputing 37 (2001) 177–196.
[12] L. Xu, Rival penalized competitive learning, finite mixture, and multisets clustering, in: Proceedings of the International Joint Conference on Neural Networks (IJCNN 1998), 1998.
[13] J. Ma, T. Wang, L. Xu, A gradient BYY harmony learning rule on Gaussian mixture with automated model selection, Neurocomputing 56 (2004) 481–487.
[14] J. Ma, L. Xu, Asymptotic convergence properties of the EM algorithm with respect to the overlap in the mixture, Neurocomputing 68 (2005) 105–129.
[15] J. Ma, T. Wang, A cost-function approach to rival penalized competitive learning (RPCL), IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 36 (2006) 722–737.
[16] Y.-M. Cheung, On rival penalization controlled competitive learning for clustering with automatic cluster number selection, IEEE Transactions on Knowledge and Data Engineering 17 (2005) 1583–1588.
[17] Y.-J. Zhang, Z.-Q. Liu, Self-splitting competitive learning: a new on-line clustering paradigm, IEEE Transactions on Neural Networks 13 (2002) 369–380.
[18] D. Bacciu, A. Starita, Competitive repetition suppression clustering: a biologically inspired learning model with application to robust clustering, IEEE Transactions on Neural Networks 19 (2008) 1922–1941.
[19] L. Xu, BYY harmony learning, structural RPCL, and topological self-organizing on mixture models, Neural Networks 15 (2002) 1125–1151.
[20] Z. Lu, H. Ip, Generalized competitive learning of Gaussian mixture models, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 39 (2009) 901–909.
[21] C.M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
[22] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD 1996), 1996.


[23] L. Ertöz, M. Steinbach, V. Kumar, Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data, in: SIAM International Conference on Data Mining (SDM 2003), 2003.
[24] D. DeSieno, Adding a conscience to competitive learning, in: IEEE International Conference on Neural Networks (ICNN 1988), 1988, pp. 117–124.
[25] D. Martin, C. Fowlkes, D. Tal, J. Malik, A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV), vol. 2, 2001, pp. 416–423.
[26] R. Unnikrishnan, C. Pantofaru, M. Hebert, Toward objective evaluation of image segmentation algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (2007) 929–944.
[27] C. Fowlkes, D. Martin, J. Malik, Local figure-ground cues are valid for natural images, Journal of Vision 7 (2007) 1–9.

Chang-Dong Wang received the B.S. degree in Applied Mathematics in 2008 and the M.Sc. degree in Computer Science in 2010 from Sun Yat-sen University, Guangzhou, P. R. China. He started the pursuit of the Ph.D. degree with Sun Yat-sen University in September 2010. His current research interests include machine learning, pattern recognition and computer vision, especially focusing on data clustering and its applications in computer vision. He was selected for the IEEE TCII Student Travel Award and gave a 20-min oral presentation at the 10th IEEE International Conference on Data Mining, December 14–17, 2010, Sydney, Australia. His ICDM paper titled "A Conscience On-line Learning Approach for Kernel-Based Clustering" was selected as one of the best research papers and invited to be extended for publication in the international journal Knowledge and Information Systems.

Jian-Huang Lai received his M.Sc. degree in Applied Mathematics in 1989 and his Ph.D. in Mathematics in 1999 from Sun Yat-sen University, China. He joined Sun Yat-sen University in 1989 as an Assistant Professor, where currently he is a Professor with the Department of Automation of the School of Information Science and Technology and vice dean of the School of Information Science and Technology. Dr. Lai successfully organized the International Conference on Advances in Biometric Personal Authentication 2004, which was also the Fifth Chinese Conference on Biometric Recognition (Sinobiometrics'04), Guangzhou, in December 2004. He has taken charge of more than five research projects, including NSF-Guangdong (no. U0835005), NSFC (no. 60144001, 60373082, 60675016), the Key (Keygrant) Project of the Chinese Ministry of Education (no. 105134), and NSF of Guangdong, China (no. 021766, 06023194). He has published over 80 scientific papers in international journals and conferences on image processing and pattern recognition. His current research interests are in the areas of digital image processing, pattern recognition, multimedia communication, wavelets and their applications. Prof. Lai serves as a standing member of the Image and Graphics Association of China and also serves as a standing Director of the Image and Graphics Association of Guangdong.