Expert Systems with Applications 38 (2011) 9036–9040
Short Communication
Image segmentation using PSO and PCM with Mahalanobis distance

Yong Zhang a,b,*, Dan Huang a, Min Ji a, Fuding Xie c

a College of Computer and Information Technology, Liaoning Normal University, Dalian 116081, China
b College of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
c College of Urban and Environmental Sciences, Liaoning Normal University, Dalian 116029, China

* Corresponding author at: College of Computer and Information Technology, Liaoning Normal University, No. 1, Liushu South Street, Ganjingzi District, Dalian, Liaoning Province 116081, China. E-mail address: [email protected] (Y. Zhang).
Keywords: Particle swarm optimization; Possibilistic c-means; Mahalanobis distance; Image segmentation; Clustering
Abstract

Fuzzy clustering algorithms are widely used in image segmentation. The possibilistic c-means (PCM) algorithm overcomes the relative membership problem of the fuzzy c-means (FCM) algorithm and has been shown to handle noise and outliers satisfactorily. This paper replaces the Euclidean distance with the Mahalanobis distance in the possibilistic c-means clustering algorithm, and optimizes the initial clustering centers using the particle swarm optimization method. Experimental results show that the proposed algorithm significantly improves both the quality and the efficiency of segmentation compared with the standard FCM clustering algorithm. © 2011 Elsevier Ltd. All rights reserved.
1. Introduction

Image segmentation is an important technology for image processing, and is also a fundamental step in many image, video, and computer vision applications. The goal of image segmentation is to cluster pixels into salient image regions, such as regions corresponding to individual surfaces, objects, or natural parts of objects. Most computer vision and image analysis problems require a segmentation stage in order to detect objects or divide the image into regions that can be considered homogeneous according to a given criterion, such as color, motion, or texture. Using these criteria, image segmentation can be applied in several domains, including video surveillance, medical image analysis, image retrieval and object classification (Luis, Eli, & Sreenath, 2009).

Cluster analysis is the process of classifying objects into subsets. Clustering occupies an important place in many engineering fields such as pattern recognition, image processing, system modeling, and data mining. Bezdek (1981) proposed the fuzzy c-means (FCM) clustering algorithm based on fuzzy set theory. Fuzzy clustering is widely used in image segmentation for its simplicity and applicability. Furthermore, many improved clustering methods have been proposed on the basis of FCM. Wu and Yang (2002) proposed an alternative fuzzy c-means clustering algorithm. Xing and Hu (2008) proposed an adaptive FCM-based mixture-of-experts model to improve unlabeled data classification. Kang, Min, and Luan (2009) proposed an improved fuzzy
c-means algorithm based on adaptive weighted averaging to handle noisy samples. However, FCM-type algorithms share the sensitivity to noise and outliers common to all least-squares approaches. In addition, due to the probabilistic constraint in FCM, the memberships are interpreted as degrees of sharing rather than as degrees of possibility of a point belonging to a class; a degree of typicality or possibility of belonging is better suited to fuzzy set theory. The possibilistic c-means (PCM) algorithm, proposed by Krishnapuram and Keller (1993), overcomes the relative membership problem of the fuzzy c-means (FCM) algorithm and has been shown to handle noise and outliers satisfactorily. Moreover, many algorithms are based on the Euclidean distance, which can only detect spherically structured clusters, so their accuracy on high-dimensional data is limited. To address these problems, this paper replaces the Euclidean distance with the Mahalanobis distance in the PCM algorithm.

Usually, fuzzy clustering algorithms give good results only when the initial partitions are close to the final solution. In other words, the results of fuzzy clustering depend highly on the initial state and may converge to a local optimum. Many studies have addressed this problem in clustering. For instance, Mualik and Bandyopadhyay (2000) proposed a genetic-algorithm-based method to solve the clustering problem and evaluated its performance on synthetic and real-life datasets. Ng and Wong (2002) proposed a tabu-search-based clustering algorithm to find a global solution of the fuzzy clustering problem. Niknam, Amiri, Olamaie, and Arefi (2009) presented a hybrid evolutionary algorithm based on PSO and SA to find optimal cluster centers, and also presented a cluster analysis optimization algorithm based on the combination of PSO, ACO and k-means (Niknam & Amiri, 2010).
Particle swarm optimization (PSO), proposed by Kennedy and Eberhart (1995), has been successfully applied to various optimization problems. This optimization algorithm combines social-psychology principles of socio-cognitive human agents with evolutionary computation. The PSO algorithm is motivated by the behavior of organisms. It begins by generating an initial population of random solutions. Each individual, also called a particle, is assigned a randomized velocity according to its own and its companions' flying experience, and the individuals are flown through hyperspace. The PSO algorithm can produce high-quality solutions with shorter calculation times and more stable convergence characteristics than other stochastic methods. Due to these good features, PSO has emerged as an attractive optimization tool and has been successfully applied in a variety of fields (El-Zonkoly, 2006; Chen & Zhao, 2009; Zhao, 2010).

This paper replaces the Euclidean distance with the Mahalanobis distance in the possibilistic c-means algorithm, and optimizes the initial clustering centers using the particle swarm optimization method. Experimental results in image segmentation show that the proposed algorithm is effective and advantageous.

The remainder of this paper is organized as follows. Section 2 briefly introduces related work. Section 3 proposes the image segmentation method based on particle swarm optimization and the PCM method with Mahalanobis distance. Experimental results are reported in Section 4, and Section 5 concludes the paper.

2. Related works
2.1. Particle swarm optimization

Inspired by social behavior in nature, PSO is a population-based search algorithm, similar in spirit to genetic algorithms, that is initialized with a population of random solutions called particles. Each particle flies through the n-dimensional search space at a velocity that is dynamically adjusted according to its own and its companions' historical behavior. Each particle i maintains a record of the position of its previous best performance in a vector called pbest. The nbest is another "best" value tracked by the particle swarm optimizer: the best value obtained so far by any particle in that particle's neighborhood. When a particle takes the entire population as its topological neighbors, the best value is a global best, called gbest. All particles can share information about the search space: representing a possible solution to the optimization problem, each particle moves in the direction of its own best solution and of the global best position discovered by any particle in the swarm, calculating its velocity and updating its position in each iteration. Let pbest_{i,d} denote the best previous position encountered by the ith particle, pbest_{g,d} the global best position thus far, and t the iteration counter. The current velocity of the dth dimension of the ith particle at time t is (Shi & Eberhart, 1998)

v_{i,d}(t) = w \, v_{i,d}(t-1) + c_1 \, \mathrm{rand}_1() \, (pbest_{i,d} - x_{i,d}(t-1)) + c_2 \, \mathrm{rand}_2() \, (pbest_{g,d} - x_{i,d}(t-1))    (1)

where the user-defined behavioral parameter w, called the inertia weight, controls the amount of recurrence in the particle's velocity. The stochastic variables rand_1() and rand_2() are random numbers between 0 and 1. The positive constants c_1 and c_2 are the learning factors of the stochastic acceleration terms and determine the impact of the personal best and the global best, respectively. Adding the velocity to the particle's current position moves the particle to another position in the search space. The new position of a particle is calculated using the following formula:

x_{i,d}(t) = x_{i,d}(t-1) + v_{i,d}(t)    (2)

In addition to enforcing search space boundaries after updating a particle's position, it is also customary to impose limitations on the distance a particle can move in a single step (Eberhart & Shi, 2000). This is done by bounding a particle's velocity v to the full dynamic range of the search space, so that the particle can at most move from one search space boundary to the other in one step.

2.2. Possibilistic c-means clustering

Clustering plays an important role in data analysis and interpretation, and is an important branch of unsupervised learning in statistical pattern recognition. It is based on partitioning a collection of data points into a set of clusters such that objects inside the same cluster show considerable similarity. Fuzzy clustering, based on fuzzy set theory, is widely used in pattern recognition, data mining and other fields. The most widely used prototype-based clustering method for data partitioning is probably the ISODATA or FCM algorithm (Bezdek, 1981). Given a set of l data patterns X = {x_1, x_2, ..., x_l}, the algorithm minimizes a weighted within-group sum-of-squared-error objective function J(U, V):

J(U, V) = \sum_{j=1}^{l} \sum_{i=1}^{c} u_{ij}^m d^2(x_j, v_i) = \sum_{j=1}^{l} \sum_{i=1}^{c} u_{ij}^m \|x_j - v_i\|^2    (3)

where x_j is the jth p-dimensional data vector (or pattern), v_i is the prototype of the center of cluster i, u_{ij} is the degree of membership of x_j in the ith cluster, m is a weighting exponent on each fuzzy membership, d(x_j, v_i) is a distance measure between data pattern x_j and cluster center v_i, l is the number of data patterns, and c is the number of clusters. The objective function J(U, V) is minimized via an iterative process in which the degrees of membership u_{ij} and the cluster centers v_i are updated:

v_i = \frac{\sum_{j=1}^{l} u_{ij}^m x_j}{\sum_{j=1}^{l} u_{ij}^m}, \qquad u_{ij} = \frac{1}{\sum_{k=1}^{c} (d_{ij}/d_{ik})^{2/(m-1)}}    (4)

where the u_{ij} satisfy

\sum_{i=1}^{c} u_{ij} = 1, \quad \forall j \in \{1, \ldots, l\}; \qquad u_{ij} \in [0, 1], \quad \forall i \in \{1, \ldots, c\}, \; j \in \{1, \ldots, l\}    (5)

The possibilistic c-means (PCM) algorithm, first proposed in Krishnapuram and Keller (1993) and further explored in Krishnapuram and Keller (1996) to overcome the relative membership problem of the fuzzy c-means algorithm, has been shown to handle noise and outliers satisfactorily. Its basic idea is to relax the objective function (3) by dropping the sum-to-one requirement from the constraint (5). In order to avoid the trivial solution u_{ij} = 0 for all i and j, a penalty term is added which forces the u_{ij} to be as large as possible. This is done by modifying the objective function in (3) as follows:
J(U, V) = \sum_{j=1}^{l} \sum_{i=1}^{c} u_{ij}^m \|x_j - v_i\|^2 + \sum_{i=1}^{c} \eta_i \sum_{j=1}^{l} (1 - u_{ij})^m    (6)

where \eta_i is a suitable positive number and u_{ij} \in [0, 1]. Following the recommendation in Krishnapuram and Keller (1993), \eta_i can be obtained from the average possibilistic intra-cluster distance of cluster i as

\eta_i = K \, \frac{\sum_{j=1}^{l} u_{ij}^m \|x_j - v_i\|^2}{\sum_{j=1}^{l} u_{ij}^m}    (7)
Typically, K is chosen to be 1, and we set K to the constant value 1 in this paper. The new membership update equation is:

u_{ij} = \frac{1}{1 + (\|x_j - v_i\|^2 / \eta_i)^{1/(m-1)}}    (8)
In the objective function (6), the first term demands that the distances from data points to the prototypes be as low as possible, whereas the second term forces the u_{ij} to be as large as possible, thus avoiding the trivial solution. Minimization of the objective function is an iterative optimization procedure, which can be summarized in the following steps (a code sketch follows the list):

(1) Fix the number of clusters c; fix the weighting exponent m, 1 < m < ∞; fix the iteration limit T; and choose the termination threshold ε > 0. Initialize the memberships u_{ij} (i ∈ {1, ..., c}, j ∈ {1, ..., l}) of datum x_j belonging to cluster v_i such that u_{ij} ∈ [0, 1].
(2) Set t = 1.
(3) Estimate η_i using Eq. (7).
(4) Update the cluster centers v_i with Eq. (4).
(5) Update the membership matrix u_{ij} with Eq. (8).
(6) If \max_{1 \le i \le c} \|v_i^{(t)} - v_i^{(t-1)}\| < ε or t > T, stop; otherwise set t = t + 1 and return to step (3).
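As a concrete illustration, the following is a minimal NumPy sketch of this iteration. The function name pcm_euclidean and the vectorized layout are our assumptions, not the paper's code (the paper's experiments were run in Matlab); the sketch follows Eqs. (4), (7) and (8), recomputing η after the center update.

```python
import numpy as np

def pcm_euclidean(X, U, m=2.0, T=100, eps=1e-5, K=1.0):
    """Possibilistic c-means with Euclidean distance, per Eqs. (4), (7), (8).
    X: (l, p) data matrix; U: (c, l) initial memberships in [0, 1]."""
    V = None
    for _ in range(T):
        Um = U ** m                                    # u_ij^m
        # Cluster centers (first half of Eq. (4))
        V_new = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Squared Euclidean distances ||x_j - v_i||^2, shape (c, l)
        D2 = ((X[None, :, :] - V_new[:, None, :]) ** 2).sum(axis=2)
        # Eq. (7): eta_i from the average possibilistic intra-cluster distance
        eta = K * (Um * D2).sum(axis=1) / Um.sum(axis=1)
        # Eq. (8): possibilistic membership update
        U = 1.0 / (1.0 + (D2 / eta[:, None]) ** (1.0 / (m - 1.0)))
        # Step (6): stop when the centers no longer move
        if V is not None and np.max(np.abs(V_new - V)) < eps:
            return U, V_new
        V = V_new
    return U, V
```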
3. Image segmentation method based on PSO and PCM

3.1. PCM algorithm based on Mahalanobis distance

The distance metric is a key issue in many machine learning algorithms, such as clustering (Yang & Lin, 2006; Weinberger, Blitzer, & Saul, 2006; Globerson & Roweis, 2006; Torresani & Lee, 2007). Many methods are based on the Euclidean distance metric and can only detect data classes with the same hyperspherical shape. The Euclidean distance metric assumes that each feature of a data point is equally important and independent of the others. This assumption may not hold in real applications, especially when dealing with high-dimensional data where some features may not be tightly related to the topic of interest. In this paper, we focus on a Mahalanobis distance metric in the possibilistic c-means algorithm. The Mahalanobis distance is a measure between two data points in the space defined by relevant features. Since it accounts for unequal variances as well as correlations between features, it adequately evaluates the distance by assigning different weights or importance factors to the features of data points. Only when the features are uncorrelated is the distance under a Mahalanobis metric identical to that under the Euclidean metric. In addition, geometrically, a Mahalanobis distance metric can adjust the geometric distribution of the data so that the distance between similar data points is small (Xing, Ng, Jordan, & Russell, 2003). It can thus enhance the performance of clustering or classification algorithms. However, when a training cluster is smaller than its dimension, the inverse covariance matrix becomes singular.

3.1.1. The Mahalanobis distance in the feature space

Let X be an l × n input matrix containing l random observations x_i ∈ R^n, i = 1, ..., l. The squared Mahalanobis distance d_M from a sample x_i to the population X is defined as follows:

d_M^2(x_i, X) = (x_i - v)^T \Sigma^{-1} (x_i - v)    (9)

where v is the mean vector of all samples and \Sigma is the covariance matrix, calculated as

\Sigma = \frac{1}{l} \sum_{j=1}^{l} (x_j - v)(x_j - v)^T    (10)

Originally, the Mahalanobis distance is defined as a dissimilarity measure between two random vectors of the same distribution with covariance matrix \Sigma. If the covariance matrix is the identity matrix, the Mahalanobis distance reduces to the Euclidean distance. The Mahalanobis distance can be applied directly to modeling problems as a replacement for the Euclidean distance. When using the Mahalanobis distance, however, the estimated covariance matrix \Sigma may be singular. The singularity of the covariance matrix \Sigma can be handled as follows. According to matrix theory, any positive semi-definite matrix of rank r can be decomposed as \Sigma = A^T G A, where G is an r × r diagonal matrix composed of the nonzero eigenvalues of \Sigma, and A is an r × n matrix composed of the eigenvectors corresponding to those eigenvalues. Obviously, G is non-singular. According to this decomposition, the pseudo-inverse of the covariance matrix \Sigma can be calculated via the inverse of G: \Sigma^+ = A^T G^{-1} A.
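For illustration, here is a short NumPy sketch of Eqs. (9) and (10) with the pseudo-inverse handling just described. The function name mahalanobis_sq is hypothetical; np.linalg.pinv computes the same Σ⁺ (via a singular value decomposition, which for a symmetric positive semi-definite matrix coincides with the eigendecomposition above).

```python
import numpy as np

def mahalanobis_sq(X):
    """Squared Mahalanobis distance of every sample to the population X.
    X: (l, n) observation matrix; returns a length-l vector, per Eq. (9)."""
    v = X.mean(axis=0)                       # mean vector v of all samples
    Xc = X - v
    Sigma = (Xc.T @ Xc) / X.shape[0]         # covariance matrix, Eq. (10)
    # Pseudo-inverse covers the singular case: for Sigma = A^T G A with
    # G the diagonal matrix of nonzero eigenvalues, this yields
    # Sigma^+ = A^T G^{-1} A.
    Sigma_pinv = np.linalg.pinv(Sigma)
    # (x_i - v)^T Sigma^+ (x_i - v) for every row at once
    return np.einsum('ij,jk,ik->i', Xc, Sigma_pinv, Xc)
```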
3.1.2. PCM algorithm with Mahalanobis distance

The PCM algorithm based on the Mahalanobis distance minimizes the following objective function:

\min J(U, V, \Sigma) = \sum_{i=1}^{c} \sum_{j=1}^{l} u_{ij}^m D_{ij}^2 + \sum_{i=1}^{c} \eta_i \sum_{j=1}^{l} (1 - u_{ij})^m    (11)
where D_{ij}^2 = (x_j - v_i)^T \Sigma^{-1} (x_j - v_i) is the squared Mahalanobis distance. A data set X = {x_1, x_2, ..., x_l} consists of l data points divided into c fuzzy subsets that are characterized by representatives or prototypes v = {v_1, v_2, ..., v_c}; u_{ij} represents the membership of the data point x_j with respect to the ith cluster, and the parameter m is the weighting exponent which determines the fuzziness of the clusters. The formulation of the optimization problem using the Lagrange multiplier method is as follows:
L = \sum_{i=1}^{c} \sum_{j=1}^{l} u_{ij}^m (x_j - v_i)^T \Sigma^{-1} (x_j - v_i) + \sum_{i=1}^{c} \eta_i \sum_{j=1}^{l} (1 - u_{ij})^m + \sum_{j=1}^{l} \alpha_j \left( 1 - \sum_{i=1}^{c} u_{ij} \right)    (12)
The optimization problem is implemented by iteratively updating clustering centers, covariance matrices and membership degrees according to Eqs. (13)–(16) until the stopping criterion is satisfied (Gustafson & Kessel, 1979).
v_i = \frac{\sum_{j=1}^{l} u_{ij}^m x_j}{\sum_{j=1}^{l} u_{ij}^m}    (13)

F_i = \frac{\sum_{j=1}^{l} u_{ij}^m (x_j - v_i)(x_j - v_i)^T}{\sum_{j=1}^{l} u_{ij}^m}    (14)

D_{ij}^2 = (x_j - v_i)^T \left[ \rho_i \det(F_i) \right]^{1/n} F_i^{-1} (x_j - v_i)    (15)

u_{ij} = \frac{1}{1 + \sum_{k=1}^{c} \left[ (D_{ij}^2 / D_{kj}^2) / \eta_i \right]^{1/(m-1)}}    (16)

In Eq. (16), the value of \eta_i is updated by Eq. (7).
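A hedged NumPy sketch of one sweep of Eqs. (13)–(16) follows. The name mahalanobis_pcm_step, the fixed ρ_i = 1 and passing η as an argument are our assumptions; F_i is assumed well-conditioned enough for the determinant term to be meaningful, with pinv guarding the inverse as in Section 3.1.1.

```python
import numpy as np

def mahalanobis_pcm_step(X, U, eta, m=2.0, rho=1.0):
    """One sweep of Eqs. (13)-(16). X: (l, n) data; U: (c, l) memberships;
    eta: length-c vector of eta_i values (Eq. (7))."""
    l, n = X.shape
    Um = U ** m
    # Eq. (13): cluster centers
    V = (Um @ X) / Um.sum(axis=1, keepdims=True)
    D2 = np.empty_like(U)
    for i in range(V.shape[0]):
        Xc = X - V[i]
        # Eq. (14): fuzzy covariance matrix F_i
        F = (Um[i, :, None] * Xc).T @ Xc / Um[i].sum()
        # Eq. (15): volume-normalized Mahalanobis distance
        M = (rho * np.linalg.det(F)) ** (1.0 / n) * np.linalg.pinv(F)
        D2[i] = np.einsum('ij,jk,ik->i', Xc, M, Xc)
    # Eq. (16): membership update; entry [i, k, j] = (D2_ij / D2_kj) / eta_i
    inner = (D2[:, None, :] / D2[None, :, :]) / eta[:, None, None]
    U_new = 1.0 / (1.0 + (inner ** (1.0 / (m - 1.0))).sum(axis=1))
    return U_new, V, D2
```

The returned D2 lets the caller refresh η via Eq. (7) between sweeps, as the note above prescribes.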
3.2. PSO-based parameter optimization

3.2.1. Particle representation

To implement our proposed approach, this paper optimizes the initial clustering centers. Thus, we select the clustering centers as the particles in the particle swarm optimization algorithm. In Section 2.2, a vector v = {v_1, v_2, ..., v_c} depicts the clustering centers; each v_i is the ith clustering center and is represented by a d-dimensional real-valued vector. Thus, v is a particle, and many particles form a population. It is assumed here that a set of N particles forms a population [\hat{v}_1, \hat{v}_2, \ldots, \hat{v}_N]^T.

3.2.2. Fitness definition

Since the PSO algorithm depends only on the objective function to guide the search, the objective must be defined before the PSO algorithm is initialized. In the possibilistic c-means clustering algorithm, objective-function optimization is the criterion used to design a fitness function. Thus, the clustering objective of PCM with Mahalanobis distance is chosen as the objective function in this study, defined by
Fitness = 1 \Big/ \left( 1 + \sum_{i=1}^{c} \sum_{j=1}^{l} u_{ij}^m D_{ij}^2 + \sum_{i=1}^{c} \eta_i \sum_{j=1}^{l} (1 - u_{ij})^m \right)    (17)
To solve a maximization problem with the objective function Fitness, the individual best pbest_i(n) of the ith particle at the nth iteration is determined such that Fitness(pbest_i(n)) ≥ Fitness(pbest_i(s)) for s ≤ n. The global best is the best position among all the individual best positions achieved so far. At the nth iteration, the global best gbest(n) is determined such that Fitness(gbest(n)) ≥ Fitness(pbest_i(n)) for i = 1, ..., N.
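The following Python sketch shows how such a fitness drives the PSO loop of Section 2.1. The function pso_optimize and its default parameters are illustrative assumptions; the fitness callable is expected to evaluate Eq. (17) on a flattened set of candidate centers.

```python
import numpy as np

def pso_optimize(fitness, N=50, dim=2, bounds=(0.0, 1.0),
                 w=0.5, c1=0.5, c2=0.5, iters=1000):
    """Maximize `fitness` with the PSO updates of Eqs. (1) and (2)."""
    lo, hi = bounds
    rng = np.random.default_rng()
    x = rng.uniform(lo, hi, size=(N, dim))            # particle positions
    v = np.zeros((N, dim))                            # particle velocities
    pbest = x.copy()
    pbest_fit = np.array([fitness(p) for p in x])
    gbest = pbest[np.argmax(pbest_fit)].copy()
    vmax = hi - lo                                    # one-step velocity clamp
    for _ in range(iters):
        r1 = rng.random((N, dim))
        r2 = rng.random((N, dim))
        # Eq. (1): inertia term plus personal-best and global-best pulls
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -vmax, vmax)
        # Eq. (2): position update, kept inside the search space
        x = np.clip(x + v, lo, hi)
        fit = np.array([fitness(p) for p in x])
        improved = fit > pbest_fit
        pbest[improved] = x[improved]
        pbest_fit[improved] = fit[improved]
        gbest = pbest[np.argmax(pbest_fit)].copy()
    return gbest
```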
3.3. Algorithm description

The image segmentation algorithm based on PSO and the PCM algorithm with Mahalanobis distance is executed in the following steps (a driver sketch follows the list).

Step 1: Parameter optimization based on PSO
(1) Initialization. Fix the number of clusters c; fix the weighting exponent m, 1 < m < ∞; choose the number of particles N; initialize the position and velocity of each particle; fix the inertia weight w, the learning factors c_1 and c_2, and the number of iterations iter_max.
(2) For t = 1 to iter_max do:
    For each particle do:
        Calculate the squared Mahalanobis distance and the membership matrix U using Eqs. (15) and (16), respectively.
        Calculate the fitness of each particle using Eq. (17).
        Find the individual best pbest_i for each particle and the global best gbest.
        Update the velocity and the position of each particle using Eqs. (1) and (2), respectively.
    End for
End for
(3) Find the best particle, labeled v_best = {v_{b1}, v_{b2}, ..., v_{bc}}, and obtain the membership matrix U_best using Eq. (16).

Step 2: Possibilistic c-means clustering based on Mahalanobis distance using the optimized clustering centers
(1) Initialization. Fix the iteration limit T and the termination threshold ε > 0. Initialize the membership matrix U^{(0)} = U_best and the clustering centers v^{(0)} = {v_1^{(0)}, v_2^{(0)}, ..., v_c^{(0)}} = v_best.
(2) Set t = 1.
(3) Calculate the cluster centers v^{(t)} using Eq. (13).
(4) Calculate the covariance matrices F^{(t)} using Eq. (14).
(5) Calculate the squared Mahalanobis distances (D^2)^{(t)} using Eq. (15).
(6) Update the membership matrix u_{ij}^{(t)} with Eq. (16).
(7) If \max_{1 \le i \le c} \|v_i^{(t)} - v_i^{(t-1)}\| < ε or t > T, stop; otherwise set t = t + 1 and return to step (3).

Step 3: Image segmentation using the results of Step 2.
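For orientation only, here is a hypothetical driver wiring the three steps together, reusing the pso_optimize and mahalanobis_pcm_step sketches above. The name segment_image is ours; the fitness below uses a Euclidean surrogate for D²_ij to keep the sketch short, whereas the paper's Step 1 scores particles with Eqs. (15) and (16), and a grayscale image would need a trailing channel axis added first.

```python
import numpy as np

def segment_image(image, c=3, m=2.0, sweeps=100):
    """Step 1: PSO initializes centers; Step 2: Mahalanobis PCM refines;
    Step 3: each pixel gets its highest-membership cluster label."""
    X = image.reshape(-1, image.shape[-1]).astype(float)  # pixels as rows
    l, n = X.shape

    def memberships(V):
        # Euclidean surrogate of Eqs. (7)-(8) to score candidate centers
        D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        eta = D2.mean(axis=1)
        U = 1.0 / (1.0 + (D2 / eta[:, None]) ** (1.0 / (m - 1.0)))
        return U, D2, eta

    def fitness(p):                                       # Eq. (17)
        U, D2, eta = memberships(p.reshape(c, n))
        J = (U ** m * D2).sum() + (eta * ((1 - U) ** m).sum(axis=1)).sum()
        return 1.0 / (1.0 + J)

    # Step 1: PSO search over flattened center sets
    best = pso_optimize(fitness, N=50, dim=c * n, bounds=(X.min(), X.max()))
    U, _, eta = memberships(best.reshape(c, n))
    # Step 2: Mahalanobis PCM sweeps, eta refreshed by Eq. (7) each sweep
    for _ in range(sweeps):
        U, V, D2 = mahalanobis_pcm_step(X, U, eta, m=m)
        Um = U ** m
        eta = (Um * D2).sum(axis=1) / Um.sum(axis=1)
    # Step 3: label map for segmentation
    return U.argmax(axis=0).reshape(image.shape[:-1])
```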
4. Experimental results

This section uses three images to verify the efficiency of the proposed method. We evaluated the proposed method on three images and compared it with the PCM algorithm. Experiments were carried out in Matlab 7.0. The initial parameters of the PCM algorithm with Mahalanobis distance were the weighting exponent m = 2, the iteration limit T = 100, the number of clusters c = 3, and the termination threshold ε = 10^{-5}. In addition, the related parameters of the PSO algorithm were the population size N = 50, the inertia weight w = 0.5, the learning factors c_1 = c_2 = 0.5, and the number of iterations iter_max = 1000. Experimental results obtained with the standard FCM-based method and the proposed method are shown in Figs. 1–3.
Fig. 1. Experimental comparison: lena (a) original image; (b) segmentation image using standard FCM; (c) segmentation image using the proposed method.
Fig. 2. Experimental comparison: penna (a) original image; (b) segmentation image using standard FCM; (c) segmentation image using the proposed method.
Fig. 3. Experimental comparison (a) original image; (b) segmentation image using standard FCM; (c) segmentation image using the proposed method.
It is obvious from these comparisons that the proposed segmentation method has a better segmentation effect than the standard FCM method.

5. Conclusions

In this paper, an image segmentation method based on PSO and the PCM algorithm with Mahalanobis distance is presented. The proposed method uses possibilistic c-means to overcome the relative membership problem of the fuzzy c-means algorithm in image segmentation. We first replace the Euclidean distance with the Mahalanobis distance in the possibilistic c-means algorithm, and then optimize the initial clustering centers using the particle swarm optimization method. Experimental results in image segmentation show that the proposed algorithm is effective and advantageous.

Acknowledgments

This work is supported by the Liaoning Doctoral Research Foundation of China (Grant No. 20081079) and the Dalian Science and Technology Plan Foundation of China (Grant No. 2010J21DW019).

References

Bezdek, J. C. (1981). Pattern recognition with fuzzy objective function algorithms. New York: Plenum.
Chen, D. B., & Zhao, C. X. (2009). Data-driven fuzzy clustering based on maximum entropy principle and PSO. Expert Systems with Applications, 36, 625–633.
Eberhart, R., & Shi, Y. (2000). Comparing inertia weights and constriction factors in particle swarm optimization. In Proceedings of the 2000 congress on evolutionary computation (CEC). San Diego, CA, USA.
El-Zonkoly, A. M. (2006). Optimal tuning of power systems stabilizers and AVR gains using particle swarm optimization. Expert Systems with Applications, 31, 551–557.
Globerson, A., & Roweis, S. (2006). Metric learning by collapsing classes. In Advances in NIPS (pp. 451–458). Cambridge, MA, USA: MIT Press.
Gustafson, E., & Kessel, W. (1979). Fuzzy clustering with a fuzzy covariance matrix. In Proceedings of the IEEE conference on decision and control (pp. 761–766).
Kang, J. Y., Min, L. Q., Luan, Q. X., et al. (2009). Novel modified fuzzy c-means algorithm with applications. Digital Signal Processing, 19(2), 309–319.
Kennedy, J., & Eberhart, R. C. (1995). Particle swarm optimization. In Proceedings of the IEEE international conference on neural networks (pp. 1942–1948). Piscataway, NJ.
Krishnapuram, R., & Keller, J. M. (1993). A possibilistic approach to clustering. IEEE Transactions on Fuzzy Systems, 1(2), 98–110.
Krishnapuram, R., & Keller, J. M. (1996). The possibilistic c-means algorithm: Insights and recommendations. IEEE Transactions on Fuzzy Systems, 4(3), 385–393.
Luis, G. U., Eli, S., Sreenath, R. V., et al. (2009). Automatic image segmentation by dynamic region growth and multiresolution merging. IEEE Transactions on Image Processing, 18(10), 2275–2288.
Mualik, U., & Bandyopadhyay, S. (2000). Genetic algorithm-based clustering technique. Pattern Recognition, 33, 1455–1465.
Ng, M. K., & Wong, J. C. (2002). Clustering categorical data sets using tabu search techniques. Pattern Recognition, 35(12), 2783–2790.
Niknam, T., & Amiri, B. (2010). An efficient hybrid approach based on PSO, ACO and k-means for cluster analysis. Applied Soft Computing, 10, 183–197.
Niknam, T., Amiri, B., Olamaie, J., & Arefi, A. (2009). An efficient hybrid evolutionary optimization algorithm based on PSO and SA for clustering. Journal of Zhejiang University Science A, 10(4), 512–519.
Shi, Y., & Eberhart, R. (1998). A modified particle swarm optimizer. In Proceedings of the 1998 IEEE international conference on evolutionary computation. Anchorage, AK, USA.
Torresani, L., & Lee, K. C. (2007). Large margin component analysis. In Advances in NIPS (pp. 505–512). Cambridge, MA, USA: MIT Press.
Weinberger, K., Blitzer, J., & Saul, L. (2006). Distance metric learning for large margin nearest neighbor classification. In Advances in NIPS (pp. 1473–1480). Cambridge, MA, USA: MIT Press.
Wu, K. L., & Yang, M. S. (2002). An alternative fuzzy c-means clustering algorithm. Pattern Recognition, 35, 2267–2278.
Xing, H. J., & Hu, B. G. (2008). An adaptive fuzzy c-means clustering-based mixtures of experts model for unlabeled data classification. Neurocomputing, 71, 1008–1021.
Xing, E. P., Ng, A. Y., Jordan, M. I., & Russell, S. (2003). Distance metric learning, with application to clustering with side-information. In Advances in NIPS (pp. 505–512). Cambridge, MA, USA: MIT Press.
Yang, L., & Lin, R. (2006). Distance metric learning: A comprehensive survey. Technical Report, Michigan State University.
Zhao, X. C. (2010). A perturbed particle swarm algorithm for numerical optimization. Applied Soft Computing, 10, 119–124.