Neurocomputing 175 (2016) 500–514


Robust fuzzy clustering using nonsymmetric Student's t finite mixture model for MR image segmentation

Hongqing Zhu, Xu Pan
School of Information Science & Engineering, East China University of Science and Technology, Shanghai 200237, China

Article info

Article history: Received 26 May 2015; Received in revised form 25 September 2015; Accepted 26 October 2015; Available online 9 November 2015. Communicated by Zhaohong Deng.

Abstract

Accurate tissue segmentation from magnetic resonance (MR) images is an essential step in clinical practice. In this paper, we introduce a robust fuzzy clustering scheme for finite mixture model fitting, which exploits the merits of the mixture of the nonsymmetric Student's t-distribution and mean template to reduce the sensitivity of the segmentation results with respect to noise. This approach utilizes a fuzzy objective function regularized by the Kullback–Leibler (KL) divergence term and sets the dissimilarity function as the negative log-likelihood of the nonsymmetric Student's t-distribution and mean template. The advantage of this fuzzy clustering scheme is that the spatial relationships among neighbouring pixels are taken into account with the help of the mean template, so that the proposed method is more robust to noise than several other existing fuzzy c-means (FCM)-based algorithms. Another advantage is that the application of the nonsymmetric Student's t-distribution mixture model allows the proposed model to fit different shapes of observed data. Experiments using synthetic and real MR images show that the proposed model has considerably better segmentation accuracy and robustness against noise compared with several well-known finite mixture models. © 2015 Elsevier B.V. All rights reserved.

Keywords: Segmentation; Student's t mixture model; Nonsymmetric distribution; KL information; Mean template; Fuzzy c-means

1. Introduction

Accurate segmentation of magnetic resonance (MR) images according to the relevant anatomical structure plays a vital role in computer-aided diagnosis [1–4]. However, segmentation may be considerably difficult because MR images usually suffer from many artefacts, including noise, bias field, and the partial volume effect. Therefore, the search for efficient MR image segmentation methods remains an active research field. In recent years, statistical approaches, especially fuzzy c-means (FCM)-type fuzzy clustering [5,6] and neural network methods [7], have been successful in MR image segmentation, because FCM-type algorithms have additional flexibility that allows a pixel to belong to multiple classes with varying degrees of membership, and they preserve more information from the original image than the hard c-means clustering technique. FCM gives good segmentation results on noise-free images, but its accuracy on noisy images is insufficient [8], mainly because each pixel is treated as a separate unit and the spatial information in the image space is not considered, which makes FCM very sensitive to noise and imaging artefacts. Because MR images are mostly noisy and have low contrast and inhomogeneity, to make FCM more

Corresponding author. Tel./fax: +86 21 64253145. E-mail address: [email protected] (H. Zhu).

http://dx.doi.org/10.1016/j.neucom.2015.10.087. 0925-2312/© 2015 Elsevier B.V. All rights reserved.

robust to noise and outliers, several modified FCM algorithms have been investigated. For example, Liao and Lin [9] developed a spatially constrained fast FCM clustering algorithm to improve computational efficiency. Other methods modify the objective function so that a pixel's label is influenced by its neighbourhood labels [10], which is a very effective way of improving robustness against noise. Still other methods, such as that of Zhang et al. [11], extend FCM by using Student's t-distribution as the distance function rather than the traditional Euclidean distance. Yu and Yang [12] proposed a generalized fuzzy clustering regularization (GFCR) model and studied its theoretical properties; GFCR subsumes most variations of FCM with a constraint on the membership function. Nefti et al. [13] presented a merging method based on clustering fuzzy sets in the parameter space of the membership function. Recently, fuzzy clustering with multiview data has become a hot topic in data mining and machine learning. In [14], Jiang et al. proposed a multiple weighted view fuzzy clustering algorithm, WV-Co-FCM, which automatically identifies the importance of each view, weights each view, and carries out weighted multiview fuzzy clustering. Another widely used statistical method is the finite mixture model (FMM), which provides a mathematically grounded approach to the statistical modelling of a wide variety of random phenomena. Among these model-based techniques, the Gaussian mixture model (GMM) has been the most widely used particular case of FMM [15,16]. However, the main drawback of GMM is its sensitivity to outliers. To


solve this problem, the Student's t mixture model (SMM) [17,18] was proposed; the Student's t probability distribution function has longer tails and one more parameter than the Gaussian distribution. However, the relationships among neighbouring pixels are not taken into account, so GMM and SMM are sensitive to changes in pixel intensity. To address this, the Markov random field (MRF) [19–21] and mean template [11,22] techniques have been applied to impose spatial smoothness constraints on the image pixels. Yang et al. [1] embedded an MRF into the conventional level set energy function to segment gliomas in brain MR images. Han et al. [23] integrated an adaptive MRF model and coupled level-set information into the prior term and presented a clinical data-driven approach for segmenting the bladder wall. In [24], Feng and Chen described a modified FCM in which a prior spatial constraint (called the refusable level in their paper) was introduced into FCM through MRF theory, so that spatial information was encoded through the mutual influence of neighbouring sites. Chatzis and Varvarigou [25] proposed a novel fuzzy clustering-type treatment of the hidden MRF model; their method utilized a fuzzy objective function regularized by Kullback–Leibler (KL) divergence information, offering a significant enhancement for image segmentation. However, in MRF-based algorithms the log-likelihood function is too complex for the EM algorithm to be used directly, which causes the computational cost to rise sharply. Another limitation of the FMM approach is that the Gaussian and Student's t-distributions are both symmetric and single-peaked, whereas in real applications the intensity distribution of each label type does not exhibit an exactly regular shape. To overcome this problem, Vadaparthi et al. [5] presented a skew-symmetric mixture model to improve the final segmentation accuracy. Wang et al. [26] defined a local Gaussian distribution fitting energy function that makes it possible to distinguish regions with similar intensity means but different variances. Nguyen et al. [27,28] also proposed nonsymmetric mixture models based on nonsymmetric Gaussian or Student's t-distributions; such nonsymmetric mixture models help to fit different shapes of observed data. Recently, another family of finite mixture models based on FCM has been successfully applied to image segmentation [29–32]; for example, Chatzis and Varvarigou [29] extended the FCM by

providing a fuzzy clustering-type treatment of finite Student's t mixture models. In [30], the authors provided a general way of combining the Gaussian distribution with partial membership and showed that FCM can provide a fuzzy clustering-type methodology for training any type of FMM. As an example, they presented the so-called KL-FCM methods, in which GMMs are trained under the fuzzy clustering principle [31,32]; a KL divergence term was introduced into the objective function, which has proven quite effective for image segmentation. However, one of the main disadvantages of KL-FCM is that it does not consider the spatial relationships of neighbouring pixels and thus lacks sufficient robustness to noise. Moreover, in these methods the probability distribution of each label type is symmetric. Considering the above, we present a clustering approach that incorporates a nonsymmetric Student's t-distribution mixture model and a mean template into FCM (FNSK) to segment MR images. The novel framework utilizes a variant of the FCM scheme, regularized by KL information, while simultaneously considering the spatial information between neighbouring pixels. The distance function is measured by a multivariate nonsymmetric Student's t-distribution instead of the Euclidean distance of the traditional FCM. The advantage of the proposed framework is that it allows one to effectively incorporate information about the nonsymmetric distributions of real objects into the fuzzy clustering training. Our method differs from [31]: our Student's t-distribution is heavy-tailed and more robust to outliers, while the method in [31] is sensitive to outliers and may exhibit excessive sensitivity to small numbers of data points [27]. In addition, in many real applications the intensity distribution of each label type is not symmetric, so models based on the symmetric Gaussian distribution [31] perform slightly worse in these nonsymmetric situations. Although the proposed method follows the idea of using FCM and a KL divergence term suggested in [32], the mean template used in this study makes the proposed model more robust to noise. Unlike the approach in [32], the nonsymmetric Student's t-distribution used in our method gives the proposed FNSK the flexibility to fit different shapes of observed data. All of these factors make the new model an accurate technique for MR image segmentation. The proposed approach has been applied to segmenting simulated and

Table 1. Description of symbols.

x_i — the i-th pixel of an image
N — the number of pixels in an image
Ω_j — the j-th label in an image
C — the number of labels in an image
N_i — the neighbourhood pixels of the i-th pixel
Φ(x_i|Θ_j) — the Student's t-distribution
μ_j — the mean of Φ(x_i|Θ_j)
Σ_j — the covariance of Φ(x_i|Θ_j)
π_j — the prior probability of a pixel belonging to Ω_j
u_ij — the posterior probability of pixel x_i belonging to Ω_j
f(x_i|Θ) — the density function at an observation x_i
J — the objective function
λ — the degree of fuzziness
q — the size of the neighbourhood window
d_ij — the dissimilarity function of the i-th pixel with respect to the j-th label
v_jl — the degrees of freedom of the Student's t-distribution
η_jl — the weighting factor of the different Student's t-distributions
k_j — the number of Student's t-distributions
ω_m — the weighting function of the mean template
L_mi — the Euclidean distance between pixels m and i
R_i — the normalizing factor of the mean template


Fig. 1. Plot of the 1-D distributions (zero mean and unit variance): (a) Student's t-distribution with ν = 0.1, 1, and 100; (b) Gaussian distribution.

clinical MR images, and the results are compared with other models based on FCM and FMM schemes. The remainder of this paper is organized as follows. Section 2 briefly reviews related work. Section 3 gives a full explanation of the proposed model. Section 4 presents the parameter estimation. In Section 5, experimental results are provided to verify the accuracy of the proposed method. Finally, the concluding section summarizes the paper.

2. Related works

In this section, we start with a brief review of the SMM and then present the basic notions of FCM. A description of the symbols used in the mathematical model is given in Table 1.

2.1. Student's t mixture model

Let x_i, i = 1, 2, …, N, with dimension D, denote the observation at the i-th pixel of an image. The purpose is to classify the N pixels of an image into C labels. If the labels are denoted by (Ω_1, Ω_2, …, Ω_C), finite mixture models assume that the density function f(x_i|Θ) at each pixel x_i can be expressed as

f(x_i \mid \Theta) = \sum_{j=1}^{C} \pi_j \, \Phi(x_i \mid \Theta_j),   (1)

where π_j is the prior probability of the pixel x_i belonging to label Ω_j and satisfies the constraints

0 \le \pi_j \le 1 \quad \text{and} \quad \sum_{j=1}^{C} \pi_j = 1,   (2)

and Φ(x_i|Θ_j) is the j-th mixture component distribution. If Φ(x_i|Θ_j) is a Student's t-distribution, the probability density function with mean μ_j, covariance Σ_j, and degrees of freedom v_j > 0 is defined by

\Phi(x_i \mid \Theta_j) \equiv S(x_i \mid \mu_j, \Sigma_j, v_j) = \frac{\Gamma\!\left(\frac{v_j + D}{2}\right) |\Sigma_j|^{-1/2}}{(\pi v_j)^{D/2} \, \Gamma\!\left(\frac{v_j}{2}\right) \left[1 + v_j^{-1} \, \delta(x_i; \mu_j, \Sigma_j)\right]^{(v_j + D)/2}},   (3)

where δ(x_i; μ_j, Σ_j) = (x_i − μ_j)^T Σ_j^{−1} (x_i − μ_j) is the squared Mahalanobis distance between x_i and μ_j with respect to covariance Σ_j, and Γ(·) is the Gamma function

\Gamma(s) = \int_0^{\infty} t^{s-1} e^{-t} \, dt.   (4)
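For illustration, the density (3) can be evaluated numerically; the sketch below (function name `student_t_pdf` is our own, using SciPy's log-Gamma for stability) works entirely in log-space before exponentiating:

```python
import numpy as np
from scipy.special import gammaln

def student_t_pdf(x, mu, sigma, v):
    """Multivariate Student's t density S(x | mu, Sigma, v) of Eq. (3)."""
    x, mu = np.asarray(x, float), np.asarray(mu, float)
    D = mu.size
    diff = x - mu
    # squared Mahalanobis distance delta(x; mu, Sigma)
    delta = diff @ np.linalg.solve(sigma, diff)
    log_pdf = (gammaln((v + D) / 2.0) - gammaln(v / 2.0)
               - 0.5 * np.linalg.slogdet(sigma)[1]
               - 0.5 * D * np.log(np.pi * v)
               - 0.5 * (v + D) * np.log1p(delta / v))
    return np.exp(log_pdf)
```

For D = 1 and v = 1 this reduces to the Cauchy density (value 1/π at the mean), and as v → ∞ it approaches the Gaussian, matching the behaviour shown in Fig. 1.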

Fig. 1(a) illustrates Student's t-distribution with different degrees of freedom v for the same mean μ and covariance Σ. Compared with the Gaussian distribution (Fig. 1(b)), the overall shapes are very similar; however, Student's t-distribution provides a longer-tailed alternative to the Gaussian distribution. As the number of degrees of freedom v grows, Student's t-distribution tends to the Gaussian distribution. Hence, SMM-based probabilistic generative models are more stable with respect to outliers residing within the fitted data sets than GMM-based models.

2.2. Fuzzy clustering algorithm

The FCM method classifies an image by grouping similar data points in the feature space into clusters. If X = {x_1, x_2, …, x_N} denotes a set of N pixels to be partitioned into C clusters, the FCM clustering algorithm minimizes the objective function

J_\alpha \triangleq \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{\alpha} \, d_{ij},   (5)

where d_ij is the dissimilarity function of the i-th data point with respect to the j-th cluster. The parameter α ≥ 1 is the weighting exponent on each fuzzy membership u_ij and determines the amount of fuzziness of the clustering algorithm, and u_ij denotes the membership of pixel x_i in class j, which satisfies the constraints

0 \le u_{ij} \le 1 \quad \text{and} \quad \sum_{j=1}^{C} u_{ij} = 1.   (6)

The objective function in (5) can be easily minimized using the Lagrange multiplier method.
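Minimizing (5) under (6) with squared Euclidean d_ij gives the well-known alternating updates u_ij = d_ij^{-1/(α-1)} / Σ_h d_ih^{-1/(α-1)} and weighted-mean centres; a minimal NumPy sketch (our own naming and initialization) is:

```python
import numpy as np

def fcm(X, C, alpha=2.0, iters=50, seed=0):
    """Standard FCM minimising Eq. (5) with squared Euclidean d_ij;
    memberships satisfy the constraints of Eq. (6)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], C))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = U ** alpha
        # cluster centres: membership-weighted means of the data
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # d_ij: squared Euclidean distance of every point to every centre
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1) + 1e-12
        # u_ij = d_ij^{-1/(alpha-1)} / sum_h d_ih^{-1/(alpha-1)}
        inv = (1.0 / d) ** (1.0 / (alpha - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centers
```

This baseline makes the sensitivity discussed next concrete: each u_ij depends only on the distance of pixel i itself to the centres, with no neighbourhood term.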

3. Proposed methods

As indicated by the objective function (5), each pixel x_i is treated as an independent sample; thus, FCM does not take into account the spatial correlation between neighbouring pixels in the optimization process. Moreover, the membership function u_ij in (5) is determined by the common Euclidean distance formula, so higher membership is assigned to points whose intensities are close to the cluster centre, which increases the sensitivity of the membership function to noise. We therefore need to define a suitable dissimilarity function d_ij that takes into account the hypothesized


probabilistic nature and the properties of the clusters being sought. Motivated by the above considerations, this paper minimizes the following new objective function, obtained by introducing the KL divergence term together with the weighted mean template:

J \triangleq \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij} \sum_{m \in N_i} \frac{\omega_m}{R_i} \, d_{mj} + \lambda \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij} \log \frac{u_{ij}}{\pi_j},   (7)

where N_i is the neighbourhood of the i-th pixel, including the i-th pixel itself, e.g. 3 × 3 or 5 × 5. Unless otherwise specified, a 3 × 3 square window is used in our model. Comparing (5) and (7), the first term on the right-hand side of (7) is obtained by incorporating a weighted mean template into the standard FCM (5). We choose the weighted mean template because applying it to the dissimilarity calculation can suppress the noise at the central pixel of the window. In our technique, fuzzification is achieved by a regularization technique in which a KL divergence term is introduced into FCM: the second term of (7) is the KL divergence term, which works as the fuzzifier, and the parameter λ is the degree of fuzziness of the fuzzy membership. Under this setting, the proposed objective function is an FCM regularized by KL information, and λ can also be called a regularization factor. To incorporate local spatial information, the weighting function ω_m should be inversely proportional to the Euclidean distance L_mi between pixel m and pixel i. In this paper, ω_m is chosen as

\omega_m = \frac{1}{(2\pi\delta^2)^{1/2}} \exp\!\left(-\frac{L_{mi}^2}{2\delta^2}\right) \quad \text{with} \quad \delta = \frac{q-1}{4},   (8)

where q is the size of the neighbourhood window. The normalizing factor R_i in (7) is

R_i = \sum_{m \in N_i} \omega_m.   (9)
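Eqs. (8)–(9) amount to a small isotropic Gaussian kernel over the window; a sketch (helper name is our own) that returns the ω_m weights and R_i for a q × q window centred on pixel i:

```python
import numpy as np

def mean_template_weights(q):
    """Gaussian weights omega_m of Eq. (8) over a q x q window, and the
    normalizing factor R_i of Eq. (9). L_mi is the Euclidean distance of
    each window position m from the central pixel i."""
    delta = (q - 1) / 4.0
    r = np.arange(q) - q // 2
    L2 = r[:, None] ** 2 + r[None, :] ** 2      # squared distances L_mi^2
    omega = np.exp(-L2 / (2.0 * delta ** 2)) / np.sqrt(2.0 * np.pi * delta ** 2)
    return omega, omega.sum()
```

The normalized weights ω_m/R_i then sum to one over the window, so the template computes a weighted average of the neighbourhood dissimilarities d_mj in (7).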

The choice of the dissimilarity function d_ij has been widely considered by various researchers. In this paper, we conduct the FCM treatment of the nonsymmetric Student's t finite mixture model. The nonsymmetric distribution Φ(x_i|Θ_j) used as the component of our mixture model is defined as

\Phi(x_i \mid \Theta_j) = \sum_{l=1}^{k_j} \eta_{jl} \, S(x_i \mid \mu_{jl}, \Sigma_{jl}, v_{jl}),   (10)

where the parameter η_jl is the weighting factor of the different Student's t-distributions and satisfies the constraints

\eta_{jl} \ge 0 \quad \text{and} \quad \sum_{l=1}^{k_j} \eta_{jl} = 1.   (11)

The estimates of the weighting factors η_jl at the new step are given in the following section. Combining (1) and (10), the density function f(x_i|Θ) at a pixel x_i in (1) is rewritten as

f(x_i \mid \Theta) = \sum_{j=1}^{C} \pi_j \sum_{l=1}^{k_j} \eta_{jl} \, S(x_i \mid \mu_{jl}, \Sigma_{jl}, v_{jl}).   (12)

The definition in (10) rests on the fact that a nonsymmetric distribution can be approximated by a combination of multiple Student's t-distributions. In this study, we select the dissimilarity function as the negative log-likelihood of the nonsymmetric Student's t-distribution:

d_{ij} = -\log\!\left[\pi_j \sum_{l=1}^{k_j} \eta_{jl} \, S(x_i \mid \mu_{jl}, \Sigma_{jl}, v_{jl})\right],   (13)

where k_j is the number of Student's t-distributions used to model the class label Ω_j, π_j is the prior probability of the j-th cluster, and S(x_i|μ_jl, Σ_jl, v_jl) is the Student's t-distribution with mean μ_jl, covariance Σ_jl, and degrees of freedom v_jl:

S(x_i \mid \mu_{jl}, \Sigma_{jl}, v_{jl}) = \frac{\Gamma\!\left(\frac{v_{jl} + D}{2}\right) |\Sigma_{jl}|^{-1/2}}{(\pi v_{jl})^{D/2} \, \Gamma\!\left(\frac{v_{jl}}{2}\right) \left[1 + v_{jl}^{-1} (x_i - \mu_{jl})^T \Sigma_{jl}^{-1} (x_i - \mu_{jl})\right]^{(v_{jl} + D)/2}}.   (14)

Because the η_jl in (11) are positive and satisfy the constraint \sum_{l=1}^{k_j} \eta_{jl} = 1, in light of the Jensen inequality [33] we have \log\left(\sum_{l=1}^{k_j} \eta_{jl} s_l\right) \ge \sum_{l=1}^{k_j} \eta_{jl} \log s_l. We define the variable y_ijl as

y_{ijl} = \frac{\eta_{jl} \, S(x_i \mid \mu_{jl}, \Sigma_{jl}, v_{jl})}{\sum_{h=1}^{k_j} \eta_{jh} \, S(x_i \mid \mu_{jh}, \Sigma_{jh}, v_{jh})}.   (15)

Thus, the values y_ijl are positive and satisfy \sum_{l=1}^{k_j} y_{ijl} = 1. Similarly, according to the Jensen inequality, \log\left(\sum_{l=1}^{k_j} y_{ijl} \varsigma_l\right) \ge \sum_{l=1}^{k_j} y_{ijl} \log \varsigma_l. Thus, we can bound the dissimilarity function (13) as follows:

d_{ij} = -\log \pi_j - \log\!\left[\sum_{l=1}^{k_j} \eta_{jl} \, S(x_i \mid \mu_{jl}, \Sigma_{jl}, v_{jl})\right] \le -\left\{\log \pi_j + \sum_{l=1}^{k_j} y_{ijl} \left[\log \eta_{jl} + \log S(x_i \mid \mu_{jl}, \Sigma_{jl}, v_{jl})\right]\right\}.   (16)

4. Parameter learning and computational complexity

4.1. Parameter learning

In this subsection, we are interested in the optimization of the objective function (7) with respect to the parameter set Θ = {μ_jl, u_ij, π_j, y_ijl, η_jl, Σ_jl}. To estimate the parameters, we first minimize the objective function J_u over u_ij under the constraints u_ij ≥ 0 and \sum_{j=1}^{C} u_{ij} = 1 for all i = 1, …, N. Using a Lagrange multiplier β for each pixel to enforce the constraint, we obtain

J_u = J + \beta\left(1 - \sum_{j=1}^{C} u_{ij}\right).   (17)

To calculate the values u_ij at iteration (k+1), we set the derivative of (17) with respect to u_ij to zero (∂J_u/∂u_ij = 0):

\frac{\partial J_u}{\partial u_{ij}} = \sum_{m \in N_i} \frac{\omega_m}{R_i} d_{mj} + \lambda\left(\log \frac{u_{ij}}{\pi_j} + 1\right) - \beta = 0,   (18)

which implies

u_{ij} = \pi_j \exp\!\left(\left(\beta - \sum_{m \in N_i} \frac{\omega_m}{R_i} d_{mj}\right)\Big/\lambda - 1\right).   (19)

Because u_ij satisfies \sum_{j=1}^{C} u_{ij} = 1, this means

\sum_{h=1}^{C} \pi_h \exp\!\left(\left(\beta - \sum_{m \in N_i} \frac{\omega_m}{R_i} d_{mh}\right)\Big/\lambda - 1\right) = 1,   (20)

from which we derive

\exp(\beta/\lambda) = 1\Big/\sum_{h=1}^{C} \pi_h \exp\!\left(-\sum_{m \in N_i} \frac{\omega_m}{R_i} d_{mh}\Big/\lambda - 1\right).   (21)


Combining expressions (19) and (21) yields the following new estimator of u_ij:

u_{ij}^{(k+1)} = \frac{\pi_j^{(k)} \exp\!\left(-\frac{1}{\lambda} \sum_{m \in N_i} \frac{\omega_m}{R_i} d_{mj}^{(k)} - 1\right)}{\sum_{h=1}^{C} \pi_h^{(k)} \exp\!\left(-\frac{1}{\lambda} \sum_{m \in N_i} \frac{\omega_m}{R_i} d_{mh}^{(k)} - 1\right)}.   (22)

Next, we consider the estimation of the prior distribution π_j. Using a Lagrange multiplier γ, we form

J_\pi = J + \gamma\left(1 - \sum_{j=1}^{C} \pi_j\right),   (23)

and set the partial derivative of the objective function with respect to π_j to zero (∂J_π/∂π_j = 0):

\frac{\partial J_\pi}{\partial \pi_j} = -\sum_{i=1}^{N} \frac{u_{ij}}{\pi_j} - \lambda \sum_{i=1}^{N} \frac{u_{ij}}{\pi_j} - \gamma = 0.   (24)

As a conclusion, we obtain

\pi_j = -(\lambda + 1)\left(\sum_{i=1}^{N} u_{ij}\right)\Big/\gamma.   (25)

Because the prior distribution π_j is subject to the constraints π_j ≥ 0 and \sum_{j=1}^{C} \pi_j = 1, applying the constraint to Eq. (25) gives

\sum_{h=1}^{C}\left(-(\lambda + 1) \sum_{i=1}^{N} u_{ih}\Big/\gamma\right) = 1,   (26)

which yields

\gamma = -(\lambda + 1) \sum_{h=1}^{C} \sum_{i=1}^{N} u_{ih}.   (27)

Combining (25) with (27), and using \sum_{h=1}^{C} u_{ih} = 1, finally gives the estimator of π_j:

\pi_j^{(k+1)} = \frac{1}{N} \sum_{i=1}^{N} u_{ij}^{(k)}.   (28)

Similarly, continuing in this way, we obtain the estimate of the weighting factor η_jl by using a Lagrange multiplier ξ:

J_\eta = J + \xi\left(1 - \sum_{l=1}^{k_j} \eta_{jl}\right).   (29)

To optimize the parameter η_jl, we consider the derivative of (29) with respect to η_jl:

\frac{\partial J_\eta}{\partial \eta_{jl}} = -\sum_{i=1}^{N} u_{ij} \sum_{m \in N_i} \frac{\omega_m}{R_i} \frac{y_{mjl}}{\eta_{jl}} - \xi = 0.   (30)

It is easy to obtain

\eta_{jl} = -\left(\sum_{i=1}^{N} u_{ij} \sum_{m \in N_i} \frac{\omega_m}{R_i} y_{mjl}\right)\Big/\xi.   (31)

In addition, the weighting factor η_jl should satisfy the constraints η_jl ≥ 0 and \sum_{l=1}^{k_j} \eta_{jl} = 1; thus,

\sum_{h=1}^{k_j}\left(-\sum_{i=1}^{N} u_{ij} \sum_{m \in N_i} \frac{\omega_m}{R_i} y_{mjh}\Big/\xi\right) = 1,   (32)

which implies the following relation:

\xi = -\sum_{h=1}^{k_j} \sum_{i=1}^{N} u_{ij} \sum_{m \in N_i} \frac{\omega_m}{R_i} y_{mjh}.   (33)

According to (31) and (33), one can obtain the new estimator of η_jl:

\eta_{jl}^{(k+1)} = \frac{\sum_{i=1}^{N} u_{ij}^{(k)} \sum_{m \in N_i} \frac{\omega_m}{R_i} y_{mjl}^{(k)}}{\sum_{h=1}^{k_j} \sum_{i=1}^{N} u_{ij}^{(k)} \sum_{m \in N_i} \frac{\omega_m}{R_i} y_{mjh}^{(k)}}.   (34)

To derive the expression for the covariance Σ_jl, we perform the minimization of J over it. The solution of ∂J/∂Σ_jl = 0 yields, at iteration step (k+1),

\Sigma_{jl}^{(k+1)} = \frac{\sum_{i=1}^{N} u_{ij}^{(k)} \sum_{m \in N_i} \frac{\omega_m}{R_i} y_{mjl}^{(k)} t_{mjl}^{(k)} \left(x_m - \mu_{jl}^{(k)}\right)\left(x_m - \mu_{jl}^{(k)}\right)^T}{\sum_{i=1}^{N} u_{ij}^{(k)} \sum_{m \in N_i} \frac{\omega_m}{R_i} y_{mjl}^{(k)}},   (35)

where

t_{mjl} = \frac{v_{jl} + D}{v_{jl} + (x_m - \mu_{jl})^T \Sigma_{jl}^{-1} (x_m - \mu_{jl})}.   (36)

To compute the means μ_jl, we study the derivative of J with respect to μ_jl and set the result to zero, ∂J/∂μ_jl = 0. The means μ_jl^{(k+1)} are then estimated by

\mu_{jl}^{(k+1)} = \frac{\sum_{i=1}^{N} u_{ij}^{(k)} \sum_{m \in N_i} \frac{\omega_m}{R_i} y_{mjl}^{(k)} t_{mjl}^{(k)} x_m}{\sum_{i=1}^{N} u_{ij}^{(k)} \sum_{m \in N_i} \frac{\omega_m}{R_i} y_{mjl}^{(k)} t_{mjl}^{(k)}}.   (37)

The next step is to update the value y_ijl by considering the derivative of J with respect to y_ijl and letting ∂J/∂y_ijl = 0. We find that y_ijl at the current step is

y_{ijl}^{(k+1)} = \frac{\eta_{jl}^{(k)} S(x_i \mid \mu_{jl}, \Sigma_{jl}, v_{jl})}{\sum_{h=1}^{k_j} \eta_{jh}^{(k)} S(x_i \mid \mu_{jh}, \Sigma_{jh}, v_{jh})}.   (38)

Following Liu and Rubin [34], there is no closed-form solution for maximizing the log-likelihood under the Student's t-distribution. The Student's t-distribution can be represented as a Gaussian distribution with scaled precision, where the scalar is a Gamma-distributed latent variable:

S(x_i \mid \mu_{jl}, \Sigma_{jl}, v_{jl}) \sim N(x_i \mid \mu_{jl}, \Sigma_{jl}/t_{ijl}) \, g(t_{ijl} \mid v_{jl}/2, v_{jl}/2),   (39)

where N(x_i|μ_jl, Σ_jl/t_ijl) is the Gaussian distribution

N(x_i \mid \mu_{jl}, \Sigma_{jl}/t_{ijl}) = \frac{1}{(2\pi)^{D/2} |\Sigma_{jl}/t_{ijl}|^{1/2}} \exp\!\left(-\frac{1}{2}(x_i - \mu_{jl})^T \left(\frac{\Sigma_{jl}}{t_{ijl}}\right)^{-1} (x_i - \mu_{jl})\right),   (40)

and the Gamma distribution is defined by

g(t_{ijl} \mid v_{jl}/2, v_{jl}/2) = \frac{1}{\Gamma(v_{jl}/2)} \left(\frac{v_{jl}}{2}\right)^{v_{jl}/2} t_{ijl}^{(v_{jl}/2) - 1} \exp\!\left(-\frac{v_{jl} t_{ijl}}{2}\right).   (41)

From (14) and (39), we set the derivative of the objective function (7) with respect to the degrees of freedom v_jl to zero, that is,

\frac{\partial}{\partial v_{jl}} \left\{ \sum_{i=1}^{N} u_{ij} \sum_{m \in N_i} \frac{\omega_m}{R_i} \sum_{l=1}^{k_j} y_{mjl} \left(\log \eta_{jl} + \log\!\left[N(x_m \mid \mu_{jl}, \Sigma_{jl}/t_{mjl}) \, g(t_{mjl} \mid v_{jl}/2, v_{jl}/2)\right]\right) \right\} = 0.   (42)
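For illustration, the closed-form updates (28), (34), (35), and (37) can be vectorized given the E-step quantities. The sketch below (names and array shapes are our own conventions) deliberately uses a degenerate 1 × 1 window, so the template sum over N_i collapses to the pixel itself; the full method would replace the plain arrays with their template-weighted neighbourhood averages:

```python
import numpy as np

def m_step(X, u, y, t):
    """Closed-form M-step of Eqs. (28), (34), (35), (37) for a 1x1 window.
    X: (N, D) pixels; u: (N, C) memberships; y, t: (N, C, K) latent
    component weights and precision scalars."""
    pi = u.mean(axis=0)                                      # Eq. (28)
    uy = u[:, :, None] * y                                   # u_ij * y_ijl
    eta = uy.sum(0) / uy.sum(0).sum(axis=1, keepdims=True)   # Eq. (34)
    uyt = uy * t                                             # u_ij * y_ijl * t_ijl
    mu = np.einsum('icl,id->cld', uyt, X) / uyt.sum(0)[..., None]      # Eq. (37)
    diff = X[:, None, None, :] - mu[None]                    # (N, C, K, D)
    sigma = (np.einsum('icl,icld,icle->clde', uyt, diff, diff)
             / uy.sum(0)[..., None, None])                   # Eq. (35)
    return pi, eta, mu, sigma
```

Note the denominators differ as in the paper: the covariance (35) is normalized by Σ u·y, while the mean (37) is normalized by Σ u·y·t.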


Fig. 2. Synthetic data clustering to test the influence of parameter λ: contours of clusters obtained using the FNSK. (a) λ = 0.1; (b) λ = 0.2; (c) λ = 0.4; (d) λ = 0.6; (e) λ = 0.8; and (f) λ = 1.

Because the expected complete-data log-likelihood involves, for each pixel,

\sum_{m \in N_i} \frac{\omega_m}{R_i} \sum_{l=1}^{k_j} y_{mjl} \Big[ -\frac{D}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma_{jl}| + \frac{D}{2} E(\log t_{mjl}) - \frac{1}{2}(x_m - \mu_{jl})^T \Sigma_{jl}^{-1} (x_m - \mu_{jl}) E(t_{mjl}) + \frac{v_{jl}}{2}\log\frac{v_{jl}}{2} + \left(\frac{v_{jl}}{2} - 1\right) E(\log t_{mjl}) - \frac{v_{jl}}{2} E(t_{mjl}) - \log\Gamma\!\left(\frac{v_{jl}}{2}\right)\Big],   (43)

we need the expectations of t_mjl and log t_mjl. Using the conclusions described in [35], we obtain the relation

E(\log t_{mjl}) = \log t_{mjl} - \log\!\left(\frac{v_{jl} + D}{2}\right) + \psi\!\left(\frac{v_{jl} + D}{2}\right)   (44)

and

E(t_{mjl}) = t_{mjl} = \frac{v_{jl} + D}{v_{jl} + (x_m - \mu_{jl})^T \Sigma_{jl}^{-1} (x_m - \mu_{jl})}.   (45)

Substituting (44) and (45) into (43) and using (42), one can derive the degrees of freedom v_jl for each component, computed as the solution of the equation

-\psi\!\left(\frac{v_{jl}^{(k+1)}}{2}\right) + \log\!\left(\frac{v_{jl}^{(k+1)}}{2}\right) + \psi\!\left(\frac{v_{jl}^{(k)} + D}{2}\right) - \log\!\left(\frac{v_{jl}^{(k)} + D}{2}\right) + 1 + \frac{\sum_{i=1}^{N} u_{ij}^{(k)} \sum_{m \in N_i} \frac{\omega_m}{R_i} y_{mjl}^{(k)} \left\{\log t_{mjl}^{(k)} - t_{mjl}^{(k)}\right\}}{\sum_{i=1}^{N} u_{ij}^{(k)} \sum_{m \in N_i} \frac{\omega_m}{R_i} y_{mjl}^{(k)}} = 0,   (46)

where

\psi(s) = \frac{\partial \Gamma(s)/\partial s}{\Gamma(s)}   (47)

stands for the digamma function. The solution of (46) does not exist in closed form. A closed-form approximation of this equation has been devised heuristically by Shoham [36]; alternatively, the equation can be solved numerically, e.g. with the Matlab function fzero(function, parameters). This completes the derivation of the updates that minimize the objective function J. The procedure of our algorithm, based on FCM and the nonsymmetric Student's t-distribution, can be summarized as follows.

Algorithm:
Step 1: Initialize the parameters: the means μ_jl, covariances Σ_jl, weighting factors η_jl, degrees of freedom v_jl, and prior probabilities π_j.
Step 2 (E-step): Calculate the posterior probability u_ij using (22); evaluate y_ijl using (38); and update the weighting factor η_jl using (34).
Step 3 (M-step): Update the means μ_jl using (37); evaluate the covariance Σ_jl using (35); compute the degrees of freedom v_jl using (46); and estimate the prior probability π_j using (28).
Step 4: Check for convergence of the parameter values. If the convergence of (7) is satisfied, terminate the iteration; otherwise, set k = k + 1 and go to Step 2.

4.2. Computational complexity analysis

In this subsection, we study the computational complexity of the proposed method. The complexity of each step depends on four variables: (1) the total number of pixels N; (2) the number of clusters C; (3) the number of neighbours N_w; and (4) the number of nonsymmetric Student's t-distributions h, less than or equal to k_j. Since the algorithm runs iteratively to obtain the segmentation results, we analyse the complexity of each step and take the maximum step complexity as the complexity of the algorithm. In the E-step of the FNSK, we compute (22), (38), and (34), with complexity O(C N_w), O(C k_j), and O(C N_w N), respectively. In the M-step, the computation of (37), (35), (46), and (28) has complexity O(C N_w N), O(C N_w N), O(C N_w N), and O(C N), respectively. Supposing FNSK converges after t iterations, the total complexity is O(t k_j C N_w N).

5. Experimental results

In this section, three experiments are conducted to evaluate the effectiveness of the proposed method, and the results are compared with two popular finite mixture models, namely GMM and SMM, and two other state-of-the-art models, ACAP [22] and NSMM [27], using synthetic and real MR images. The initial values of the means μ_jl and variances Σ_jl are set by the K-means algorithm, and the degrees of freedom v_jl in SMM, NSMM, and FNSK are assigned a value of 1. Since the prior π_j should satisfy the constraints π_j ≥ 0 and \sum_{j=1}^{C} \pi_j = 1, it is initialized based on the number of classes C; for example, for C = 4, the prior π_j is set to [0.25, 0.35, 0.4, 0.2] in our experiment. Similarly, the initial values of η_jl in our method are subject to the constraint (11). We present the results and discuss the qualitative and quantitative evaluation of the obtained segmentations.

5.1. Synthetic data clustering

We begin with experiments using two-dimensional points to verify the performance of the proposed method. We utilize a five-component bivariate Gaussian distribution to generate the simulated data, comprising 2000 normalized points, with 400 points from each component. The means of the Gaussian distributions are μ_1 = (5.1291, 7.0924)^T, μ_2 = (4.6048, 1.1597)^T, μ_3 = (3.5040, 0.7808)^T, μ_4 = (0.9505, 3.6925)^T, and μ_5 = (4.3367, 0.3363)^T, and the covariances are Σ_1 = ⋯ = Σ_5 = diag(1.5, 1.5). The data are then scaled to the interval [0, 1]. First, the value of the degree-of-fuzziness parameter λ in the proposed clustering method is chosen using a performance-based criterion; a detailed theoretical framework for its derivation was proposed in [37]. In this paper, we fit a three-component FNSK to these simulated data points. To ensure a fair comparison, we repeat this clustering procedure 10 times with a different initialization each time and compute the mean performance of each model. The results obtained for different values of the degree of fuzziness λ are illustrated in Fig. 2, and Table 2 lists the clustering error rates of the FNSK for different λ. For all tested λ values, FNSK is likely to classify the data points successfully; however, we observe that values of λ in the interval [0.1, 0.6] yield the best results. Therefore, when a clustering application of the proposed model is intended, we recommend that practitioners experiment with values around λ = 0.5.

Next, we studied the behaviour of our method with noise drawn from a bivariate uniform distribution, adding 40 or 100 outlier points to the above data set. The GMM, SMM, NSMM, ACAP, and proposed method were then tested on these noisy data. The simulated data points, along with the contours of the clusters obtained by the evaluated algorithms, are illustrated in Fig. 3. As we observe, GMM is rather sensitive to these outliers. The ACAP method significantly improves the result and reduces the effect of the outliers compared with GMM. In addition, Student's t-distribution is heavier-tailed than the Gaussian distribution; hence, FMMs based on the longer-tailed Student's t-distribution, such as SMM and NSMM, respond considerably more robustly to the outliers than GMM. The proposed method is the most robust against noise, with the highest accuracy. We believe this is because the nonsymmetric Student's t-distribution and mean template are used in the proposed model, so that spatial information is taken into consideration. To objectively evaluate and compare the performance of the proposed method, Table 3 summarizes the obtained clustering error rates for the considered simulated data points. The proposed approach offers enhanced data classification capabilities compared with its competitors, yielding improved clustering performance for the same computational demands. Additionally, owing to the use of neighbourhood information, the ACAP method also performs considerably better than SMM and NSMM.
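The simulated data set of this subsection can be reproduced approximately as follows (the random generator, seed, and the signs of the printed means are assumptions — the extraction may have dropped minus signs from the published values):

```python
import numpy as np

def make_synthetic(n_per=400, n_outliers=40, seed=0):
    """Five bivariate Gaussian components (covariance 1.5 I each), min-max
    scaled to [0, 1], plus uniformly distributed outlier points."""
    rng = np.random.default_rng(seed)
    means = np.array([[5.1291, 7.0924], [4.6048, 1.1597], [3.5040, 0.7808],
                      [0.9505, 3.6925], [4.3367, 0.3363]])
    X = np.vstack([rng.multivariate_normal(m, 1.5 * np.eye(2), n_per)
                   for m in means])
    X = (X - X.min(0)) / (X.max(0) - X.min(0))    # scale to [0, 1]
    outliers = rng.uniform(0.0, 1.0, size=(n_outliers, 2))
    return np.vstack([X, outliers])
```

With n_outliers = 40 or 100 this mirrors the two noisy configurations evaluated in Fig. 3.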

Table 2
Data clustering: average error rates of FNSK with various λ (%).

λ            0.1   0.2   0.4   0.6   0.8   1.0
Error rate   0     0     0     0.2   0.3   0.5

H. Zhu, X. Pan / Neurocomputing 175 (2016) 500–514


Fig. 3. Experiment on noisy data clustering. The first column shows the case of 40 uniform-noise points and the second column the case of 100. From top to bottom: GMM, SMM, NSMM, ACAP, and FNSK (λ = 0.5).


5.2. Segmentation of a synthetic image

The experiments use the open-access, freely available BrainWeb database [38], shown in Fig. 4, to investigate the robustness of the proposed model to noise. These MR images, of size 362 × 434, are first corrupted by Salt & Pepper noise with a density of 5%, and segmentation is then performed on the corrupted images. The iteration number of all methods is set to 15, at which most cases have converged. Fig. 4 visualizes the segmentation results: GMM, SMM, and NSMM cannot completely eliminate the effect of noise on the final segmentations, and for GMM and NSMM small portions of pixels are misclassified, whereas ACAP and FNSK show visibly better delineation of regions and successfully segment the noisy images. To objectively evaluate and compare the segmentation performance of the methods under different noise environments, the experiment employs the misclassification ratio (MCR) [39] to assess segmentation accuracy; Salt & Pepper noise with different densities and Gaussian noise with different variances are also considered. The comparative evaluations are listed in Tables 4, 5, and 6, which demonstrate that GMM, SMM, and NSMM cannot obtain good segmentation results. We believe this is because GMM and SMM do not take spatial pixel information into consideration. The segmentation results of ACAP are better than those of GMM and SMM, while the performance of the proposed method is the most prominent, with the lowest MCR values and a higher degree of robustness with respect to noise.
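The MCR used here is simply the fraction of misclassified pixels. A minimal sketch (illustrative only; the best-permutation label matching is our assumption, since segment labels produced by a clustering method are arbitrary):

```python
import numpy as np
from itertools import permutations

def mcr(ground_truth, segmented, n_classes):
    """Misclassification ratio (in percent): fraction of pixels whose
    label disagrees with the ground truth after the best one-to-one
    relabelling of the segmentation (segment labels are arbitrary)."""
    gt = np.asarray(ground_truth).ravel()
    seg = np.asarray(segmented).ravel()
    best = max(int(np.sum(np.array(perm)[seg] == gt))
               for perm in permutations(range(n_classes)))
    return 100.0 * (1.0 - best / gt.size)

# Toy example: one wrong pixel out of nine.
gt  = np.array([[0, 0, 1], [0, 1, 1], [2, 2, 2]])
seg = np.array([[0, 0, 1], [0, 1, 0], [2, 2, 2]])
print(round(mcr(gt, seg, 3), 1))  # 11.1
```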

Table 3
Data clustering: average error rates (%).

Noise (points)   GMM     SMM    NSMM   ACAP   FNSK
40               6.64    5.42   3.35   1.12   0.31
100              13.18   8.37   4.68   3.13   0.52

Table 4
MCR of simulated brain MR image (the first row of Fig. 4) segmentation using different methods (%).

           Gaussian noise (variance)    Salt & Pepper noise (%)
Methods    0.01    0.03    0.05         5%      8%      10%
GMM        26.22   28.37   36.02        21.52   23.78   24.65
SMM        24.17   27.44   35.56        20.82   21.35   22.61
NSMM       22.25   26.61   34.04        19.51   20.17   22.33
ACAP       17.74   18.73   20.68        14.48   15.02   15.98
FNSK       15.58   16.63   18.04        12.67   13.12   13.86

Table 5
MCR of simulated brain MR image (the second row of Fig. 4) segmentation using different methods (%).

           Gaussian noise (variance)    Salt & Pepper noise (%)
Methods    0.01    0.03    0.05         5%      8%      10%
GMM        24.52   28.71   36.62        21.80   22.92   23.73
SMM        22.27   27.89   34.67        19.64   20.19   21.54
NSMM       21.58   26.61   34.02        18.37   19.63   20.42
ACAP       16.52   18.05   19.44        14.21   15.05   15.88
FNSK       15.36   17.74   18.51        12.44   12.36   13.18

Table 6
MCR of simulated brain MR image (the third row of Fig. 4) segmentation using different methods (%).

           Gaussian noise (variance)    Salt & Pepper noise (%)
Methods    0.01    0.03    0.05         5%      8%      10%
GMM        25.27   27.45   36.46        21.61   22.51   23.84
SMM        23.04   26.64   34.12        20.71   21.02   21.95
NSMM       21.39   24.33   34.64        18.18   20.56   21.08
ACAP       17.21   18.84   20.16        14.03   15.12   15.56
FNSK       15.49   17.73   18.56        12.84   12.89   13.63

Fig. 4. Segmentation of simulated brain MR images. From left to right: original image, Salt & Pepper noise version (5%), GMM, SMM, NSMM, ACAP, and FNSK.
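The two noise models used in Tables 4–6 can be simulated as follows (an illustrative sketch following the usual conventions of Matlab's imnoise for [0, 1] images; not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(1)

def salt_and_pepper(img, density):
    """Corrupt a [0, 1] image: a 'density' fraction of pixels is set to 0 or 1."""
    out = img.copy()
    mask = rng.random(img.shape) < density
    out[mask] = rng.integers(0, 2, size=int(mask.sum())).astype(float)
    return out

def gaussian_noise(img, variance):
    """Add zero-mean Gaussian noise with the given variance, clipped to [0, 1]."""
    return np.clip(img + rng.normal(0.0, np.sqrt(variance), img.shape), 0.0, 1.0)

img = rng.random((362, 434))              # same size as the BrainWeb slices
noisy_sp = salt_and_pepper(img, 0.05)     # 5% density, as in Fig. 4
noisy_g  = gaussian_noise(img, 0.01)      # variance 0.01, as in Tables 4-6
```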

5.3. Segmentation of medical imaging data

The third experiment investigates the characteristics of the proposed approach on two publicly available medical image databases. The first test data set [40] consists of four knee MR images of 350 × 350 pixels, shown in the first row of Table 7; the segmentation results of the different methods are also illustrated in that table. The number of classes K is set to three by visual inspection. As seen, the segmentation accuracy of GMM, SMM, and NSMM along the object boundaries is modestly poor. Comparing these results with those obtained using ACAP and the proposed method, we conclude that taking the spatial relationship among neighbouring pixels into account considerably improves performance. To further assess performance, quantitative evaluations are carried out by calculating the probabilistic rand (PR) index [41], which measures the similarity between two partitions.

Table 7
Knee tissue segmentation results via different methods. (Image grid: for Knee1–Knee4, the original MR image and the segmentations produced by GMM, SMM, NSMM, ACAP, and FNSK.)

Table 8
Comparison of different methods for clinical MR images (PR index).

Images   GMM      SMM      NSMM     ACAP     FNSK
Knee1    0.8217   0.8233   0.8351   0.8406   0.8584
Knee2    0.7937   0.8164   0.8236   0.8374   0.8506
Knee3    0.7915   0.7970   0.7994   0.8203   0.8487
Knee4    0.7852   0.7933   0.7982   0.8147   0.8322
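With a single manual ground truth, the PR index [41] reduces to the classical Rand index, i.e. the fraction of pixel pairs on which the two partitions agree. A sketch under that assumption (illustrative, not the authors' code):

```python
import numpy as np
from math import comb

def rand_index(labels_a, labels_b):
    """Rand index between two partitions of the same pixel set, computed
    from the contingency table: fraction of pixel pairs on which the
    partitions agree (together in both, or separated in both); in [0, 1]."""
    a = np.asarray(labels_a).ravel()
    b = np.asarray(labels_b).ravel()
    table = np.zeros((int(a.max()) + 1, int(b.max()) + 1), dtype=np.int64)
    np.add.at(table, (a, b), 1)
    sum_ij = sum(comb(int(x), 2) for x in table.ravel())
    sum_i = sum(comb(int(x), 2) for x in table.sum(axis=1))
    sum_j = sum(comb(int(x), 2) for x in table.sum(axis=0))
    total = comb(a.size, 2)
    # Agreeing pairs = total + 2*sum_ij - sum_i - sum_j (standard identity).
    return (total + 2 * sum_ij - sum_i - sum_j) / total

print(rand_index([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0
```

Because the score is pair-based, it is invariant to relabelling: swapping the two cluster labels of one partition leaves the index unchanged.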

The value of the PR index ranges from zero to one, and the higher the value, the better the segmentation result. Table 8 lists the PR indices: compared with GMM, SMM, and NSMM, ACAP and the proposed method present the highest PR values, indicating that our method obtains the best segmentation results. For a deeper investigation of the performance of the current method, we repeat this experiment on another public database, the Whole Brain Atlas [42]. The second row of Table 9 lists four original MR-T2 images, of size 256 × 256, from an 81-year-old woman in excellent health who participated in research on normal aging. The corresponding segmented images, along with the original images, are also presented in Table 9, and similar conclusions can be drawn: the segmentation results of standard GMM and SMM are not satisfactory, with speckles in the segmented regions, while NSMM obtains better results than GMM and SMM but is still worse than ACAP and FNSK.

Table 9
Skull tissue segmentation results via different methods. (Image grid: for Skull1–Skull4, the original MR image and the segmentations produced by GMM, SMM, NSMM, ACAP, and FNSK.)

Fig. 5. Comparison of the MR images at various noise levels (PR index). The panels plot segmentation accuracy against Salt & Pepper noise density (0–10%) for GMM, SMM, NSMM, ACAP, and FNSK. (a) Skull1; (b) Skull2; (c) Skull3; (d) Skull4.

Fig. 6. Computation time (in seconds) of GMM, SMM, NSMM, ACAP, and FNSK for each skull tissue image (Skull1–Skull4).

It is noted that ACAP and FNSK preserve more details and contours. Fig. 5 reports the comparative PR-index values of the different algorithms at various Salt & Pepper noise levels; the results in Fig. 5 and Table 9 further confirm that the proposed algorithm produces more promising segmentations than GMM, SMM, NSMM, and ACAP. Fig. 6 shows the computation time of each of the aforementioned methods. All methods are initialized with the same initial condition (e.g., v = 1 for SMM, NSMM, and FNSK) and run on a PC (3.40-GHz Intel(R) Core(TM) i7-2600 processor with 4 GB RAM) until convergence, using Matlab in the Windows environment. The standard GMM and SMM take the least time for segmentation, GMM being the fastest, while our method and ACAP are slower than the others, probably because of the application of the mean template.

In the last experiment, two real MR images from the Internet Brain Segmentation Repository (ISBR07, 256 × 256 [43]) are used to test the effectiveness of our method. The objective is to segment the MR images into two tissue labels: gray matter (GM) and white matter (WM). The ground truth of the original image is shown on the right-hand side of the second row of Table 10. For a fair comparison, we also evaluate two typical spatially constrained models, FMM [30] and HMRF-FCM [25]. To evaluate the stability of each approach, we repeat the segmentation procedure 20 times with a different initialization each time and compute the average PR index and standard deviation of each model. The segmentations of these images obtained by GMM, SMM, NSMM, ACAP, FMM, HMRF-FCM, and FNSK, with the number of segments set to 3, are presented in Table 10. In this example, the proposed method converges after 15 iterations, as shown in Fig. 7. Table 11 contains the quantitative results (the average PR index) under the 5% Salt & Pepper noise condition, together with the standard deviations of all methods. As can be seen from Table 11, the effect of noise on the final segmentations of GMM, SMM, FMM, and NSMM is high. The segmentation accuracy of GMM is relatively poor; SMM, FMM, and NSMM slightly improve the segmentations, and the performance of FMM and NSMM is so close that it is difficult to say which is better. HMRF-FCM reduces the effect of noise significantly, which may be attributed to its use of hidden Markov random field models, which provide a powerful way to introduce information about the mutual influences among the image pixels into the segmentation procedure [25].

Table 10
Segmentation results of the ISBR07 MR images via different methods. (Image grid: for MRI1 and MRI2, the original MR image, the ground truth, and the segmentations produced by GMM, SMM, NSMM, FMM, ACAP, HMRF-FCM, and FNSK.)

Fig. 7. Minimization of the objective function of the proposed algorithm for the final experiment (objective value versus iteration number). (a) MRI1; (b) MRI2.

It is worth pointing out that ACAP and FNSK appear to preserve more image details and contours, which may be attributed to the use of the mean template. The proposed method, moreover, classifies these images best and is the most robust to this noise, achieving the highest PR index. The standard deviations reported in Table 11 indicate that ACAP, HMRF-FCM, and FNSK are more stable than the other methods, as they have the lowest standard deviations.

Table 11
Comparison of different methods for the ISBR07 MR image data set (average PR index and standard deviation over 20 runs).

                   GMM      SMM      NSMM     ACAP     FMM      HMRF-FCM   FNSK
PR index, MRI1     0.7862   0.8089   0.8174   0.8206   0.8126   0.8324     0.8439
PR index, MRI2     0.7867   0.8101   0.8142   0.8227   0.8104   0.8421     0.8561
Std. dev., MRI1    0.0313   0.0326   0.0263   0.0184   0.0256   0.0183     0.0178
Std. dev., MRI2    0.0283   0.0306   0.0250   0.0173   0.0272   0.0170     0.0169

Table 12
ISBR BrainWeb: segmentation accuracy for the 20 test images according to DSC (mean ± standard deviation).

Methods     WM              GM
GMM         81.55 ± 0.48*   82.17 ± 0.45*
SMM         83.15 ± 0.39*   82.66 ± 0.41*
NSMM        84.92 ± 0.41*   82.78 ± 0.37*
ACAP        85.57 ± 0.36*   84.92 ± 0.35*
FMM         84.68 ± 0.45*   84.47 ± 0.38*
HMRF-FCM    86.63 ± 0.31*   85.13 ± 0.32*
FNSK        87.27 ± 0.22    86.01 ± 0.35

* Values significantly different from FNSK (paired t-test at the 5% significance level).

Fig. 8. The average Dice similarity coefficient (%) for white matter and gray matter across GMM, SMM, NSMM, ACAP, FMM, HMRF-FCM, and FNSK. (a) MRI1; (b) MRI2.

An effective validation mechanism is an important part of any segmentation approach. Another popular overlap-based metric is the Dice similarity coefficient [44], defined as follows:

DSC = 2|R_M ∩ R_A| / (|R_M| + |R_A|)    (48)

where R_M denotes the manually segmented region and R_A is the region obtained from the algorithm output. Expressed as a percentage, DSC takes values between 0 and 100, with 100 indicating perfect segmentation accuracy and 0 denoting no overlap at all. The average Dice similarity coefficients and standard deviations for this experiment are illustrated in Fig. 8, from which it can be noted that ACAP and HMRF-FCM yield more accurate segmentations than GMM, SMM, and NSMM. We attribute this to their neighbourhood-information prior terms, which significantly improve the results compared with counterparts that do not use such prior information. Another interesting observation is that FMM and GMM give consistent and comparable results for GM segmentation, but their performance degrades for WM. Finally, Fig. 8 shows that both our WM and GM segmentation results are consistently stable and highly accurate. The segmentation obtained by FNSK is compared with the other methods by a paired t-test of their Dice overlap measures against the ground-truth segmentations. Table 12 lists the average Dice overlap measures per tissue class with standard deviations, showing that the proposed approach performs better than current state-of-the-art methods.
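A sketch of how Eq. (48) and the paired t-statistic could be computed (illustrative only; the per-image Dice scores fed to the t-test below are made-up placeholder values, not the paper's results):

```python
import numpy as np

def dice(mask_manual, mask_auto):
    """Dice similarity coefficient of Eq. (48), in percent:
    100 * 2*|R_M & R_A| / (|R_M| + |R_A|) for binary masks."""
    rm = np.asarray(mask_manual, dtype=bool)
    ra = np.asarray(mask_auto, dtype=bool)
    return 200.0 * np.logical_and(rm, ra).sum() / (rm.sum() + ra.sum())

# Toy binary masks: overlap of 2 pixels, region sizes 3 and 3.
m = np.array([1, 1, 1, 0, 0], dtype=bool)
a = np.array([0, 1, 1, 1, 0], dtype=bool)
print(round(dice(m, a), 1))  # 66.7

# Paired t-statistic on per-image Dice scores of two methods
# (the four values below are hypothetical, for illustration only).
dsc_fnsk  = np.array([87.1, 87.4, 87.0, 87.5])
dsc_other = np.array([85.2, 85.9, 85.4, 85.8])
diff = dsc_fnsk - dsc_other
t_stat = diff.mean() / (diff.std(ddof=1) / np.sqrt(diff.size))
# Compare |t_stat| with the critical t value (n-1 degrees of freedom, 5% level).
```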

6. Conclusions

In this paper, we proposed an effective fuzzy clustering method that considers the relationships among neighbouring pixels by incorporating the nonsymmetric Student's t-distribution and the mean template into the fuzzy clustering procedure. The method uses a variant of the fuzzy clustering principle regularized by KL information. By setting the dissimilarity function of the fuzzy clustering principle to the negative log-likelihood of the nonsymmetric Student's t-distribution finite mixture model, the current method is flexible enough to fit different shapes of observed data in both noise-free and noisy cases. In addition, through the mean template, the conditional probability of this method differs for each pixel, depending on the pixel's neighbours and their corresponding parameters; this strategy makes the proposed model more robust to noise. The model parameters are estimated using the EM algorithm. Experimental results on synthetic and real MR images demonstrated that the proposed method significantly outperforms other related methods in terms of segmentation accuracy and robustness.
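For clarity, the closed-form membership update that KL-regularized fuzzy objectives of this kind [25,32] yield can be sketched as follows (a generic illustration with made-up dissimilarities standing in for the negative log-likelihoods of the nonsymmetric Student's t components; not the authors' exact update):

```python
import numpy as np

def kl_fuzzy_memberships(neg_loglik, priors, lam):
    """Membership update minimizing the KL-regularized fuzzy objective
    J = sum_ik u_ik * d_ik + lam * sum_ik u_ik * log(u_ik / pi_k)
    subject to sum_k u_ik = 1, whose minimizer is
    u_ik proportional to pi_k * exp(-d_ik / lam).
    neg_loglik: (N, K) array of dissimilarities d_ik = -log p_k(x_i)."""
    logits = np.log(priors)[None, :] - neg_loglik / lam
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    u = np.exp(logits)
    return u / u.sum(axis=1, keepdims=True)

# Tiny example with made-up dissimilarities: rows sum to one, and a
# lower dissimilarity yields a higher membership.
d = np.array([[0.5, 2.0], [3.0, 0.2]])
pi = np.array([0.5, 0.5])
u = kl_fuzzy_memberships(d, pi, lam=0.5)
```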

Acknowledgements The authors would like to thank the anonymous reviewers and the associate editor for their insightful comments that significantly improved the quality of this paper. This work was supported by the National Natural Science Foundation of China under Grant number 61371150.

References

[1] X. Yang, X. Gao, D. Tao, X. Li, J. Li, An efficient MRF embedded level set method for image segmentation, IEEE Trans. Image Process. 24 (1) (2015) 9–21.
[2] D. Mahapatra, J.M. Buhmann, Prostate MRI segmentation using learned semantic knowledge and graph cuts, IEEE Trans. Biomed. Eng. 61 (3) (2014) 756–764.
[3] A. Ribbens, J. Hermans, F. Maes, D. Vandermeulen, P. Suetens, Unsupervised segmentation, clustering, and groupwise registration of heterogeneous populations of brain MR images, IEEE Trans. Med. Imaging 33 (2) (2014) 201–224.
[4] W. Qiu, J. Yuan, E. Ukwatta, Y. Sun, M. Rajchl, A. Fenster, Prostate segmentation: an efficient convex optimization approach with axial symmetry using 3D TRUS and MR images, IEEE Trans. Med. Imaging 33 (4) (2014) 947–960.
[5] N. Vadaparthi, S. Yarramalle, S.V. Penumatsa, P.S.R. Murthy, Segmentation of brain MR images based on finite skew Gaussian mixture model with fuzzy c-means clustering and EM algorithm, Int. J. Comput. Appl. 28 (10) (2011) 18–26.
[6] Z.X. Ji, J.Y. Liu, G. Cao, Q.S. Sun, Q. Chen, Robust spatially constrained fuzzy c-means algorithm for brain MR image segmentation, Pattern Recognit. 47 (2014) 2454–2466.
[7] S. Shen, W. Sandham, M. Granat, A. Sterr, MRI fuzzy segmentation of brain tissue using neighborhood attraction with neural-network optimization, IEEE Trans. Inf. Technol. Biomed. 9 (2005) 459–467.
[8] M.A. Balafar, A.R. Ramli, M.I. Saripan, R. Mahmud, S. Mashohor, Medical image segmentation using fuzzy c-mean (FCM), learning vector quantization (LVQ) and user interaction, Commun. Comput. Inf. Sci. 15 (2008) 177–184.
[9] L. Liao, T.-S. Lin, A fast spatial constrained fuzzy kernel clustering algorithm for MRI brain image segmentation, in: Proceedings of the 2007 International Conference on Wavelet Analysis and Pattern Recognition, 2007, pp. 82–87.
[10] D.-Q. Zhang, S.-C. Chen, A novel kernelized fuzzy c-means algorithm with application in medical image segmentation, Artif. Intell. Med. 32 (1) (2004) 37–50.
[11] H. Zhang, Q.M.J. Wu, T.M. Nguyen, A robust fuzzy algorithm based on Student's t-distribution and mean template for image segmentation application, IEEE Signal Process. Lett. 20 (2) (2013) 117–120.
[12] J. Yu, M.S. Yang, A generalized fuzzy clustering regularization model with optimality tests and model complexity analysis, IEEE Trans. Fuzzy Syst. 15 (2007) 904–915.
[13] S. Nefti, M. Oussalah, U. Kaymak, A survey of the Markov random field method for image segmentation, IEEE Trans. Fuzzy Syst. 16 (2008) 145–161.
[14] Y. Jiang, F.-L. Chung, S. Wang, Collaborative fuzzy clustering from multiple weighted views, IEEE Trans. Cybern. 45 (2015) 688–701.
[15] Z. Ji, Y. Xia, Q. Sun, S.X.Q. Chen, D.D. Feng, Fuzzy local Gaussian mixture model for brain MR image segmentation, IEEE Trans. Inf. Technol. Biomed. 16 (2012) 339–347.
[16] Y.-M. Cha, J.-H. Han, High-accuracy retinal layer segmentation for optical coherence tomography using tracking kernels based on Gaussian mixture model, IEEE J. Sel. Top. Quantum Electron. 22 (2).
[17] D. Peel, G. McLachlan, Robust mixture modeling using the t-distribution, Stat. Comput. 10 (2000) 335–344.
[18] G. Sfikas, C. Nikou, N. Galatsanos, Robust image segmentation with mixtures of Student's t-distributions, in: Proceedings of the 2007 IEEE International Conference on Image Processing, IEEE, 2007, pp. 273–276.
[19] T.M. Nguyen, Q.M.J. Wu, Fast and robust spatially constrained Gaussian mixture model for image segmentation, IEEE Trans. Circuits Syst. Video Technol. 23 (2013) 621–635.
[20] S. Kadoury, H. Labelle, N. Paragios, Spine segmentation in medical images using manifold embeddings and higher-order MRFs, IEEE Trans. Med. Imaging 32 (7) (2013) 1227–1238.
[21] X.-C. Li, S.-A. Zhu, A survey of the Markov random field method for image segmentation, J. Image Graph. 12 (2007) 789–798.
[22] H. Zhang, Q.M.J. Wu, M.N. Thanh, Incorporating mean template into finite mixture model for image segmentation, IEEE Trans. Neural Netw. Learn. Syst. 24 (2013) 328–335.
[23] H. Han, L. Li, C. Duan, H. Zhang, Y. Zhao, Z. Liang, A unified EM approach to bladder wall segmentation with coupled level-set constraints, Med. Image Anal. 17 (2013) 1192–1205.
[24] Y.Q. Feng, W.F. Chen, MR brain image segmentation using fuzzy clustering with spatial constraints based on Markov random field theory, Med. Imaging Augment. Real. (2004) 188–195.
[25] S.P. Chatzis, T.A. Varvarigou, A fuzzy clustering approach toward hidden Markov random field models for enhanced spatially constrained image segmentation, IEEE Trans. Fuzzy Syst. 16 (2008) 1351–1361.
[26] L. Wang, L. He, A. Mishra, C.M. Li, Active contours driven by local Gaussian distribution fitting energy, Signal Process. 89 (2009) 2435–2447.
[27] T.M. Nguyen, Q.M.J. Wu, H. Zhang, A nonsymmetric mixture model for unsupervised image segmentation, IEEE Trans. Cybern. 43 (2013) 751–765.
[28] T.M. Nguyen, Q.M.J. Wu, H. Zhang, Asymmetric mixture model with simultaneous feature selection and model detection, IEEE Trans. Neural Netw. Learn. Syst. 26 (2) (2014) 400–408.
[29] S. Chatzis, T. Varvarigou, Robust fuzzy clustering using mixtures of Student's t-distributions, Pattern Recognit. Lett. 29 (2008) 1901–1905.
[30] S.P. Chatzis, G. Tsechpenakis, A possibilistic clustering approach toward generative mixture models, Pattern Recognit. 45 (2012) 1819–1825.
[31] S.P. Chatzis, A fuzzy c-means-type algorithm for clustering of data with mixed numeric and categorical attributes employing a probabilistic dissimilarity functional, Expert Syst. Appl. 38 (2011) 8684–8689.
[32] S. Chatzis, A method for training finite mixture models under a fuzzy clustering principle, Fuzzy Sets Syst. 161 (2010) 3000–3013.
[33] W. Rudin, Real and Complex Analysis, McGraw-Hill, New York, 1987.
[34] C. Liu, D.B. Rubin, ML estimation of the t-distribution using EM and its extensions, ECM and ECME, Stat. Sin. 5 (1) (1995) 19–39.
[35] D. Peel, G. McLachlan, Robust mixture modeling using the t-distribution, Stat. Comput. 10 (2000) 335–344.
[36] S. Shoham, Robust clustering by deterministic agglomeration EM of mixtures of multivariate t distributions, Pattern Recognit. 35 (2002) 1127–1142.
[37] J.C. Bezdek, N.R. Pal, J. Keller, R. Krisnapuram, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, Kluwer Academic Publishers, Norwell, MA, USA, 1999.
[38] 〈http://brainweb.bic.mni.mcgill.ca/brainweb/〉.
[39] Y. Zhang, M. Brady, S. Smith, Segmentation of brain MR images through a hidden Markov random field model and the expectation maximization algorithm, IEEE Trans. Med. Imaging 20 (2001) 45–57.
[40] 〈http://www.mr-tip.com/serv1.php?type〉.
[41] R. Unnikrishnan, C. Pantofaru, M. Hebert, A measure for objective evaluation of image segmentation algorithms, Comput. Vis. Pattern Recognit. 24 (2005) 1–34.
[42] 〈http://www.med.harvard.edu/aanlib/home.html〉.
[43] 〈http://www.nitrc.org/projects/ibsr/〉.
[44] L.R. Dice, Measures of the amount of ecologic association between species, Ecology 26 (1945) 297–302.

Hongqing Zhu received the Ph.D. degree from Shanghai Jiao Tong University, Shanghai, China, in 2000. From 2003 to 2005, she was a Post-Doctoral Fellow with the Department of Biology and Medical Engineering, Southeast University, Nanjing, China. She is currently a Professor with the East China University of Science and Technology, Shanghai. Her current research interests include signal processing, image reconstruction, image segmentation, image compression, and pattern recognition. She is a member of IEEE and IEICE.

Xu Pan is currently pursuing the Ph.D. degree with the Department of Electronics and Communication Engineering, East China University of Science and Technology, Shanghai, China, where he received the B.S. degree from the School of Information Science and Engineering, in 2013. His research interests are pattern recognition and computer vision.