
Robust generative asymmetric GMM for brain MR image segmentation

Zexuan Ji a,∗, Yong Xia b,c,∗, Yuhui Zheng d

a School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China
b Shaanxi Key Lab of Speech and Image Information Processing (SAIIP), School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
c Centre for Multidisciplinary Convergence Computing (CMCC), School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
d School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, 210044, China

∗ Corresponding authors. E-mail addresses: [email protected] (Z. Ji), [email protected] (Y. Xia).

Article info

Article history: Received 8 December 2016 Revised 4 August 2017 Accepted 21 August 2017

Keywords: Gaussian mixture model Asymmetric distribution Expectation-maximization algorithm Brain MR image segmentation Markov random fields Intensity inhomogeneity

Abstract

Background and objectives: Accurate segmentation of brain tissues from magnetic resonance (MR) images based on unsupervised statistical models such as the Gaussian mixture model (GMM) has been widely studied during the last decades. However, most GMM-based segmentation methods suffer from limited accuracy due to the influence of noise and intensity inhomogeneity in brain MR images. To further improve the accuracy of brain MR image segmentation, this paper presents a Robust Generative Asymmetric GMM (RGAGMM) for simultaneous brain MR image segmentation and intensity inhomogeneity correction.

Method: First, we develop an asymmetric distribution to fit the data shapes, and thus construct a spatially constrained asymmetric model. Then, we incorporate two pseudo-likelihood quantities and bias field estimation into the model's log-likelihood, aiming to exploit the within-cluster and between-cluster neighboring priors and to alleviate the impact of intensity inhomogeneity, respectively. Finally, an expectation-maximization (EM) algorithm is derived to iteratively maximize an approximation of the data log-likelihood function, so that the intensity inhomogeneity in the image is overcome and the brain MR image is segmented simultaneously.

Results: To demonstrate the performance of the proposed algorithm, we first applied it to a synthetic brain MR image to show the intermediate illustrations and the estimated distribution of the proposed algorithm. The next group of experiments was carried out on clinical 3T brain MR images, which contain quite serious intensity inhomogeneity and noise. We then quantitatively compared our algorithm to state-of-the-art segmentation approaches using the Dice coefficient (DC) on benchmark images obtained from IBSR and BrainWeb with different levels of noise and intensity inhomogeneity. The comparison results on various brain MR images demonstrate the superior performance of the proposed algorithm in dealing with noise and intensity inhomogeneity.

Conclusion: In this paper, the RGAGMM algorithm is proposed, which can simply and efficiently incorporate spatial constraints into an EM framework to simultaneously segment brain MR images and estimate the intensity inhomogeneity. The proposed algorithm is flexible enough to fit the data shapes, can simultaneously overcome the influence of noise and intensity inhomogeneity, and hence improves segmentation accuracy by over 5% compared with several state-of-the-art algorithms.

© 2017 Elsevier B.V. All rights reserved.

1. Introduction

As one of the classical problems in image processing, image segmentation has been extensively studied; it can be treated as a classification problem [1–4] for the target image. Magnetic resonance (MR) imaging has been widely used in medical diagnosis because of its multi-spectral, high-contrast and high-spatial-resolution characteristics. Automated segmentation of brain tissues





into the gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF) is an essential step in quantitative brain image analysis. However, MR images generally suffer from intensity inhomogeneity [5] and contain a smoothly varying bias field, which makes the intensities of the same tissue vary across pixel/voxel locations. Hence, algorithms for simultaneous bias field estimation and brain MR image segmentation have been widely developed during the last decades [6–11]. Among them, statistical model-based algorithms [12–22] are among the most popular. The Gaussian mixture model (GMM) [12] has been widely utilized in image segmentation because it is simple and easy to



implement, and the expectation-maximization (EM) algorithm [23] is generally utilized to efficiently estimate the involved parameters. Wells et al. [13] proposed an adaptive segmentation framework based on the EM algorithm to simultaneously correct intensity inhomogeneity and segment brain MR images. Leemput et al. [14] furthered this work by utilizing a digital brain atlas to provide the prior probability maps of brain tissues. However, due to the Gaussian assumption on the pixels of each tissue type and the lack of spatial information, the GMM has limited flexibility to fit the shape of the data and is sensitive to noise. To address these drawbacks, mixture models with Markov random fields (MRF) or hidden MRF (HMRF) have been frequently employed; they can be generally categorized into two groups. In the first group [24–26], the prior distribution is calculated on a pixel-by-pixel basis, depending on each pixel's label and its neighboring pixels [27]. The other group of mixture models utilizes MRFs to model the joint prior distribution of pixel labels [27–29]. Diplaros et al. [27] introduced a pseudo-likelihood quantity to incorporate spatial smoothness constraints into the model, and thus proposed a generative GMM. Nikou et al. [28] proposed a spatial constraint which can adaptively select the spatial directions. To improve the efficiency of MRF-EM based algorithms, Nguyen and Wu [29] proposed a fast and robust spatially constrained GMM by introducing a spatial factor into the prior distribution. Although these algorithms can reduce the impact of noise, most MRF- or HMRF-based algorithms are still not robust enough with respect to different types and levels of noise, and have high computational complexity.

In many practical applications, it is not sufficient to fit different shapes of observed data by utilizing only one statistical distribution for each component of a mixture model. Recently, mixtures of mixture models and asymmetric mixture models have been widely studied. Zhang et al. [30] modified the GMM by constructing the conditional probability of each pixel from the probabilities of the pixels in its immediate neighborhood. Browne et al. [31] combined a multivariate Gaussian distribution and a multivariate uniform distribution as the component density, which allows for bursts of probability, locally higher tails, or both [31]. Nguyen and coworkers [22,32,33] proposed various bounded asymmetric mixture models (AMM) by modeling each component of a mixture model with a multivariate bounded Gaussian/Student's-t distribution. However, AMMs are still sensitive to noise.

In this paper, we propose a novel robust generative asymmetric GMM for brain MR image segmentation. We first modify the asymmetric distribution by incorporating spatial information into the asymmetric mixture model. Then we extend the work by Diplaros et al. [27] and similarly introduce two pseudo-likelihood quantities to couple the neighboring priors of the within-cluster and between-cluster distributions based on the Kullback-Leibler (KL) divergence [34]. Meanwhile, an asymmetric weight factor is constructed to further reflect the damping extent of the neighbors. Next, we integrate the bias field estimation model into the log-likelihood function to estimate and correct the intensity inhomogeneity. Finally, an EM algorithm is derived to estimate the parameters of the proposed model by iteratively maximizing an approximation of the data log-likelihood function.
The proposed algorithm has been evaluated against state-of-the-art segmentation algorithms on various brain MR images. Partial pilot data were presented in our previous work, which proposed a spatially constrained asymmetric Gaussian mixture for image segmentation [35]. All the notations and symbols used in this paper are summarized in Table 1.

2. Background

Let an image with N pixels be denoted by X = {x_i, i = 1, 2, …, N}, where x_i ∈ R^D is the observation at the ith pixel. Suppose there are

Fig. 1. Probabilistic graphical models [31].

K target regions in the image. The density function of a finite mixture model is given by

f(x_i | \Pi, \Theta) = \sum_{k=1}^{K} \pi_k \, p(x_i | \Omega_k),     (1)

where \Omega_k is the class label of the kth region, and \Pi = {\pi_k}, k = 1, 2, …, K, are the mixture parameters, which satisfy the constraints 0 ≤ \pi_k ≤ 1 and \sum_{k=1}^{K} \pi_k = 1. In the GMM [12], p(x_i | \Omega_k) is the Gaussian distribution \Phi(x_i | \theta_k), \theta_k = {\mu_k, \Sigma_k}:

\Phi(x_i | \theta_k) = \frac{1}{(2\pi)^{D/2} |\Sigma_k|^{1/2}} \exp\left( -\frac{1}{2} (x_i - \mu_k)^{T} \Sigma_k^{-1} (x_i - \mu_k) \right),

where \mu_k is the mean, \Sigma_k is the covariance matrix, |\Sigma_k| is the determinant of \Sigma_k, and \theta_k = {\mu_k, \Sigma_k} is the assembly of statistical parameters. Fig. 1(a) [10] shows the generative model for the GMM, where z_i is the hidden random variable that indicates the class membership of each pixel x_i. In the AMM [22,32,33], an asymmetric distribution p(x_i | \Omega_k) is defined to model the label \Omega_k as

p(x_i | \Omega_k) = \sum_{l=1}^{L} \psi_{kl} \, p(x_i | \Omega_{kl}),

where L is the number of multivariate distributions p(x_i | \Omega_{kl}) and \psi_{kl} is the within-cluster prior probability, which satisfies the constraints 0 ≤ \psi_{kl} ≤ 1 and \sum_{l=1}^{L} \psi_{kl} = 1. Fig. 1(b) shows the probabilistic graphical model for the AMM. To overcome the influence of noise, spatial information is generally incorporated by utilizing an MRF distribution among the prior values,

p(\Pi) \propto \exp\{ -U(\Pi) \},

where U(\Pi) is the energy function that smooths the priors and is generally utilized to incorporate spatial correlations into the segmentation process. Depending on the type of the energy U(\Pi), different models can be obtained [23]. Based on Bayes' rule, the posterior probability density function can be written as

p(\Pi, \Theta | X) \propto p(X | \Pi, \Theta) \, p(\Pi).

The graphical model for MRF-GMM algorithms is shown in Fig. 1(c).

3. Theory

Motivated by the BAMM algorithm [32], we construct a novel asymmetric Gaussian mixture model to fit the data shapes and make use of spatial information, where the density function f(x_i | \Pi, \Psi, \Theta) for each pixel x_i is defined as

f(x_i | \Pi, \Psi, \Theta) = \sum_{k=1}^{K} \pi_{ik} \left[ \sum_{l=1}^{L} \psi_{ikl} \, \Phi(x_i | \mu_{kl}, \Sigma_{kl}) \right],     (2)

where \Pi = {\pi_{ik}} is the set of between-cluster prior distributions, which characterizes the prior probability of pixel x_i belonging to \Omega_k and satisfies the constraints 0 ≤ \pi_{ik} ≤ 1 and \sum_{k=1}^{K} \pi_{ik} = 1; \Psi = {\psi_{ikl}} is the set of within-cluster prior distributions, which represents the prior probability of pixel x_i belonging to \Omega_{kl} and satisfies the constraints 0 ≤ \psi_{ikl} ≤ 1 and \sum_{l=1}^{L} \psi_{ikl} = 1; \Theta denotes the assembly of model parameters; and \Phi(x_i | \theta_{kl}), with \theta_{kl} = {\mu_{kl}, \Sigma_{kl}}, is a Gaussian distribution.
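To make the constructions in Eqs. (1) and (2) concrete, the following is a minimal Python/NumPy sketch (ours, not the authors' code) that evaluates the pixel-wise asymmetric mixture density for one-dimensional intensities (D = 1); the array names, shapes, and toy values are illustrative assumptions rather than anything specified in the paper.

```python
import numpy as np
from scipy.stats import norm

def asymmetric_mixture_density(x, pi_ik, psi_ikl, mu_kl, sigma_kl):
    """Evaluate Eq. (2): f(x_i) = sum_k pi_ik * sum_l psi_ikl * N(x_i; mu_kl, sigma_kl).

    x        : (N,)      pixel intensities (D = 1)
    pi_ik    : (N, K)    pixel-wise between-cluster priors, rows sum to 1
    psi_ikl  : (N, K, L) pixel-wise within-cluster priors, last axis sums to 1
    mu_kl    : (K, L)    sub-component means
    sigma_kl : (K, L)    sub-component standard deviations
    """
    # Gaussian densities for every pixel and every (k, l) sub-component: (N, K, L)
    phi = norm.pdf(x[:, None, None], loc=mu_kl[None], scale=sigma_kl[None])
    # Within-cluster mixture p(x_i | Omega_k): (N, K)
    p_xk = np.sum(psi_ikl * phi, axis=2)
    # Between-cluster mixture f(x_i): (N,)
    return np.sum(pi_ik * p_xk, axis=1)

# Toy usage: N = 5 pixels, K = 3 tissue classes, L = 2 sub-components per class.
rng = np.random.default_rng(0)
N, K, L = 5, 3, 2
x = rng.uniform(0, 255, size=N)
pi_ik = np.full((N, K), 1.0 / K)
psi_ikl = np.full((N, K, L), 1.0 / L)
mu_kl = np.array([[40.0, 60.0], [110.0, 130.0], [190.0, 210.0]])
sigma_kl = np.full((K, L), 15.0)
print(asymmetric_mixture_density(x, pi_ik, psi_ikl, mu_kl, sigma_kl))
```

Setting L = 1 and identical priors for every pixel reduces this sketch to the classical GMM density of Eq. (1).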



Table 1
Summary of all notations and symbols used in this paper.

Notation   Explanation
D          The dimension of each pixel
N          The number of pixels in the image
X          The target image
K          The number of clusters/components
L          The number of multivariate distributions
Ω          The set of class labels
Π          The set of between-cluster prior distributions
Ψ          The set of within-cluster prior distributions
Φ          Gaussian distribution
μ          The mean value of the Gaussian distribution
Σ          The covariance matrix of the Gaussian distribution
Θ          The assembly of model parameters
B          The bias field coefficient
J          The true image without corruptions
M          The number of polynomials
G          The set of orthogonal polynomial basis
P          The degree of those polynomials
f(•)       The density function
p(•)       The probability function
U(•)       The energy function in the MRF
D(•)       The KL divergence function
H(•)       The entropy function
x          The observation of the pixel
δ          The pixel set of the neighborhood
λ          The weight for neighboring pixels
s          The auxiliary set of distributions
q          The auxiliary class distributions
z          The between-cluster posterior probabilities
y          The within-cluster posterior probabilities
n          The additive noise
θ          The parameter assembly for each distribution
i          The ith pixel in the image
k          The kth cluster
l          The lth multivariate distribution
j          The jth pixel in the neighborhood
α          Parameter of the MRF priors
β          Parameter of the MRF priors
m          Auxiliary indicators

Compared with the density functions of the GMM in Eq. (1) and the AMM, the major difference of the proposed density function in Eq. (2) lies in the construction of the prior distributions. In the proposed algorithm, we extend the between-cluster prior distributions from \pi_k in the GMM and AMM to \pi_{ik}, and extend the within-cluster prior distributions from \psi_{kl} in the AMM to \psi_{ikl}. This extension means that the prior distributions vary for each pixel in each cluster. Therefore, we can easily exploit spatial information through the proposed prior distributions by utilizing MRFs. By assuming that the observations x_i are statistically independent of each other, the joint conditional density over the whole observed data set can be written as

p(X | \Pi, \Psi, \Theta) = \prod_{i=1}^{N} f(x_i | \Pi, \Psi, \Theta).

Based on Bayes' rule, the posterior probability density function is

p(\Pi, \Psi, \Theta | X) \propto p(X | \Pi, \Psi, \Theta) \, p(\Pi) \, p(\Psi).

The main innovation of the proposed algorithm is the introduction of spatially varying, pixel-by-pixel component weights and within-cluster weights. Utilizing a strategy similar to the work by Diplaros et al. [27], the field constraints are directly enforced over the parameters of the label priors, resulting in a smoother objective function (Section 3.1). Because the above operation would lead to an explosion in the number of parameters, coupling asymmetric weight factors need to be introduced to prevent over-segmentation of the model and to encourage spatial continuity for both the component and within-component probabilities. The pixel-wise probabilities are spatially coupled by linking them to a convex combination of the probabilities of the neighboring pixels, which is enforced via the cross-entropy between the pixel-wise weights and the combined weights from the neighborhood (Section 3.2). Next, considering the intensity inhomogeneity in brain MR images, the bias field estimation model proposed by Li and coworkers [36,37] is introduced into the log-likelihood function to approximate and correct the intensity inhomogeneity at the pixel-by-pixel level by using a linear combination of orthogonal polynomials (Section 3.3). Finally, the EM algorithm is derived to estimate the parameters of the proposed model by iteratively maximizing the approximation of the data log-likelihood function (Section 3.4).

3.1. Model approximation

To further simplify the scheme for incorporating spatial constraints into an EM framework, as in [27,38], we model the joint density of the priors by employing the Besag approximation

p(\Pi) \approx \prod_{i=1}^{N} p(\pi_i \mid \pi_{\delta_i}), \qquad p(\Psi) \approx \prod_{i=1}^{N} p(\psi_i \mid \psi_{\delta_i}),

where \delta_i is the neighborhood of pixel i, and \pi_{\delta_i} and \psi_{\delta_i} are the neighboring prior probabilities of the between-cluster and within-cluster priors, respectively. Both prior probabilities are defined as

\pi_{\delta_i} = \sum_{j \in \delta_i, j \neq i} \lambda_{ij} \pi_j, \qquad \psi_{\delta_i} = \sum_{j \in \delta_i, j \neq i} \lambda_{ij} \psi_j,

where \lambda_{ij} is a positive weight for each neighboring pixel j and satisfies \sum_{j \in \delta_i, j \neq i} \lambda_{ij} = 1. Similar to the work by Diplaros et al. [27], we approximate the log-model of the conditional densities p(\pi_i | \pi_{\delta_i}) and p(\psi_i | \psi_{\delta_i}) in the following form (ignoring constants):

\log p(\pi_i \mid \pi_{\delta_i}) = -\alpha \left[ D(\pi_i \,\|\, \pi_{\delta_i}) + H(\pi_i) \right],
\log p(\psi_i \mid \psi_{\delta_i}) = -\beta \left[ D(\psi_i \,\|\, \psi_{\delta_i}) + H(\psi_i) \right],     (3)

where D(\pi_i \| \pi_{\delta_i}) is the KL divergence for the between-cluster priors, which becomes zero when \pi_i = \pi_{\delta_i}; D(\psi_i \| \psi_{\delta_i}) is the KL divergence for the within-cluster priors, which becomes zero when \psi_i = \psi_{\delta_i}; H(\pi_i) is the entropy of the between-cluster priors \pi_i; and H(\psi_i) is the entropy of the within-cluster priors \psi_i. We then similarly utilize two auxiliary sets of distributions s_i^{(1)} and s_i^{(2)} to approximate the log-conditional probabilities p(\pi_i | \pi_{\delta_i}) and p(\psi_i | \psi_{\delta_i}) [24] as follows:

\log p(\pi_i \mid \pi_{\delta_i}, s_i^{(1)}) \approx -\alpha \left[ D(s_i^{(1)} \,\|\, \pi_i) + D(s_i^{(1)} \,\|\, \pi_{\delta_i}) + H(s_i^{(1)}) \right],
\log p(\psi_i \mid \psi_{\delta_i}, s_i^{(2)}) \approx -\beta \left[ D(s_i^{(2)} \,\|\, \psi_i) + D(s_i^{(2)} \,\|\, \psi_{\delta_i}) + H(s_i^{(2)}) \right].     (4)
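As an illustration of the neighborhood coupling used in Eqs. (3) and (4), the following hedged Python sketch (ours, not the authors' implementation) computes the convex combination \pi_{\delta_i} over a 3 × 3 neighborhood and evaluates the per-pixel KL-plus-entropy penalty of Eq. (3). Uniform weights \lambda_{ij} = 1/8 are assumed here only for brevity; in the paper the weights come from Eq. (6).

```python
import numpy as np

def neighborhood_prior(pi, lam=None):
    """Compute pi_delta_i = sum_{j in 3x3 neighborhood, j != i} lambda_ij * pi_j.

    pi  : (H, W, K) pixel-wise between-cluster priors.
    lam : optional (H, W, 8) weights for the 8 neighbors; uniform (1/8) if None.
    """
    H, W, K = pi.shape
    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    if lam is None:
        lam = np.full((H, W, len(offsets)), 1.0 / len(offsets))
    pi_delta = np.zeros_like(pi)
    padded = np.pad(pi, ((1, 1), (1, 1), (0, 0)), mode="edge")
    for n, (dy, dx) in enumerate(offsets):
        # Prior of the neighbor located at offset (dy, dx) from every pixel.
        shifted = padded[1 + dy:1 + dy + H, 1 + dx:1 + dx + W, :]
        pi_delta += lam[:, :, n:n + 1] * shifted
    return pi_delta

def prior_penalty(pi, pi_delta, alpha=0.25, eps=1e-12):
    """Per-pixel value of -alpha * [ D(pi || pi_delta) + H(pi) ] from Eq. (3)."""
    kl = np.sum(pi * (np.log(pi + eps) - np.log(pi_delta + eps)), axis=-1)
    ent = -np.sum(pi * np.log(pi + eps), axis=-1)
    return -alpha * (kl + ent)
```

The same helper applies unchanged to the within-cluster priors \psi by flattening the (k, l) axes into one axis.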



Note that the above approximations are equal to Eq. (3) when s_i^{(1)} = \pi_i and s_i^{(2)} = \psi_i. Eq. (4) introduces the spatial auxiliaries by lower bounding the prior conditional densities. In [27,39,40], the constraints are incorporated by lower bounding the posterior distributions with penalty terms based on a KL distance. Motivated by this spirit, we introduce additional penalty terms based on the posterior distributions as follows:

-\frac{1}{4}\left[ D(q_i^{(1)} \,\|\, z_i) + D(q_i^{(1)} \,\|\, z_{\delta_i}) + H(q_i^{(1)}) \right] - \frac{1}{4}\left[ D(q_i^{(2)} \,\|\, y_i) + D(q_i^{(2)} \,\|\, y_{\delta_i}) + H(q_i^{(2)}) \right],

where the coefficient 1/4 is used to make the M-step tractable, q_i^{(1)} and q_i^{(2)} are two arbitrary class distributions for pixel i, and the posterior probabilities over z_i and y_i can be described as

z_{ik} \equiv \frac{\pi_{ik}\, p(x_i \mid \Omega_k)}{\sum_{m=1}^{K} \pi_{im}\, p(x_i \mid \Omega_m)} = \frac{\pi_{ik} \sum_{l=1}^{L} \psi_{ikl}\, \Phi(x_i \mid \mu_{kl}, \Sigma_{kl})}{\sum_{m=1}^{K} \left[ \pi_{im} \sum_{l=1}^{L} \psi_{iml}\, \Phi(x_i \mid \mu_{ml}, \Sigma_{ml}) \right]},
y_{ikl} \equiv \frac{\psi_{ikl}\, p(x_i \mid \Omega_{kl})}{\sum_{m=1}^{L} \psi_{ikm}\, p(x_i \mid \Omega_{km})} = \frac{\psi_{ikl}\, \Phi(x_i \mid \mu_{kl}, \Sigma_{kl})}{\sum_{m=1}^{L} \psi_{ikm}\, \Phi(x_i \mid \mu_{km}, \Sigma_{km})}.     (5)

For writing convenience, let p(x_i \mid \Omega_k) = \sum_{l=1}^{L} \psi_{ikl}\, p(x_i \mid \Omega_{kl}) and p(x_i \mid \Omega_{kl}) = \Phi(x_i \mid \mu_{kl}, \Sigma_{kl}). The values z_{ik} and y_{ikl} always satisfy the conditions \sum_{k=1}^{K} z_{ik} = 1 and \sum_{l=1}^{L} y_{ikl} = 1, respectively. Consequently, the probabilistic graphical model of the proposed algorithm is described in Fig. 1(d). Putting all terms together, the log-likelihood of the proposed model can be written as

L(\Pi, \Psi, \Theta \mid X) = \sum_{i=1}^{N} \Bigg\{ \log \left( \sum_{k=1}^{K} \pi_{ik}\, p(x_i \mid \Omega_k) \right)
  - \frac{1}{4}\left[ D(q_i^{(1)} \,\|\, z_i) + D(q_i^{(1)} \,\|\, z_{\delta_i}) + H(q_i^{(1)}) \right]
  - \frac{1}{4}\left[ D(q_i^{(2)} \,\|\, y_i) + D(q_i^{(2)} \,\|\, y_{\delta_i}) + H(q_i^{(2)}) \right]
  - \alpha \left[ D(s_i^{(1)} \,\|\, \pi_i) + D(s_i^{(1)} \,\|\, \pi_{\delta_i}) + H(s_i^{(1)}) \right]
  - \beta \left[ D(s_i^{(2)} \,\|\, \psi_i) + D(s_i^{(2)} \,\|\, \psi_{\delta_i}) + H(s_i^{(2)}) \right] \Bigg\}.

3.2. Construction of weight factor

To further reduce over-smoothed segmentations, in this paper we propose a new asymmetric weight factor that incorporates both the spatial information and the pixel intensity information. Since the weight factors \lambda_{ij} are positive and satisfy the constraint \sum_{j \in \delta_i, j \neq i} \lambda_{ij} = 1, the construction of \lambda reflects the strategy of spatial incorporation and the ability of noise robustness. A straightforward construction for the weight factor is to use a symmetric filter, such as the modified (i.e., with the center coefficient set to zero) average filter or pillbox filter [27]. However, without considering any pixel intensity information, the above-mentioned weight factors would lead the corresponding algorithms to over-smoothed segmentations. Therefore, it is a better choice to construct an asymmetric weight factor that incorporates both the spatial information and the pixel intensity information, like the weight factors in [30,41,42]. Different from our previous work [43], the weight factor is obtained by setting a zero coefficient for the neighborhood center and nonnegative coefficients elsewhere. Specifically, let R be the radius of the neighborhood \delta_i of the ith pixel. The novel weight factor can be calculated as

\lambda_{ij} = \frac{\exp\left( -\sum_{m \in \delta_i, m \neq i,j} |x_j - x_m| \,/\, \big(N_{\delta_i} (N_{\delta_i} - 1)\big) \right)}{\sum_{h \in \delta_i} \exp\left( -\sum_{m \in \delta_i, m \neq i,j} |x_h - x_m| \,/\, \big(N_{\delta_i} (N_{\delta_i} - 1)\big) \right)},     (6)

where N_{\delta_i} is the number of pixels in the neighborhood \delta_i. It is clear that the factor \lambda_{ij} can be determined without setting any artificial parameters. The proposed weight factor reflects the damping extent of the neighbors with the spatial distance from the other neighborhood pixels [43].
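A small Python sketch of how the weight factor of Eq. (6) could be evaluated for the 3 × 3 neighborhood of a single pixel is given below. This is our own illustration rather than the authors' code; in particular, the normalisation is taken over the non-center neighbors so that the constraint \sum_{j \neq i} \lambda_{ij} = 1 holds, which is one reading of Eq. (6).

```python
import numpy as np

def weight_factors(patch):
    """Asymmetric weights lambda_ij of Eq. (6) for a 3 x 3 intensity patch.

    patch : (3, 3) array centered on pixel i; returns a (3, 3) array of weights
    with the center entry set to 0 and the remaining entries summing to 1.
    """
    n_delta = patch.size                       # N_delta_i = 9 pixels in the window
    center = 1 * 3 + 1                         # flat index of the center pixel i
    flat = patch.ravel()
    scores = np.zeros(n_delta)
    for j in range(n_delta):
        if j == center:
            continue                           # zero coefficient for the center
        # Sum of |x_j - x_m| over neighbors m != i, j, normalised by N(N - 1).
        mask = np.ones(n_delta, dtype=bool)
        mask[j] = False
        mask[center] = False
        s = np.abs(flat[j] - flat[mask]).sum() / (n_delta * (n_delta - 1))
        scores[j] = np.exp(-s)
    lam = scores / scores.sum()                # weights sum to 1 over j != i
    return lam.reshape(3, 3)

# A noisy outlier (90) in an otherwise homogeneous patch receives a small weight.
print(weight_factors(np.array([[10., 12., 11.], [10., 11., 90.], [12., 11., 10.]])))
```

Because a neighbor whose intensity deviates strongly from the rest of the window gets a small exponential score, the coupling automatically damps the influence of noisy neighbors, which is the intended behaviour of the asymmetric factor.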

3.3. Estimation of intensity inhomogeneity

To estimate the bias field in the image and overcome the impact of intensity inhomogeneity, Li and coworkers proposed a bias field estimation algorithm [36,37] that utilizes a multiplicative component of the observed image to model the intensity inhomogeneity in a brain MR image:

X = B \cdot J + n,

where X is the observed image, J is the true image, B is the bias field, and n is the additive noise. To estimate the bias field B from the brain MR image X, we utilize the bias field estimation method in [36,37] by assuming that the bias field B = {b_i, i = 1, 2, …, N} can be approximated at the pixel-by-pixel level by a linear combination of orthogonal polynomials:

b_i = \sum_{k=1}^{M} \omega_k \, g_k(i) = w^{T} G(i),

where M = (P + 1)(P + 2)/2 is the number of polynomials g_k(i), w^T G(i) gives a linear combination of a set of orthogonal polynomial basis functions G(i), w = {\omega_k \in \mathbb{R}, k = 1, …, M} are the real-valued combination coefficients, and P is the degree of those polynomials [36,37]. Therefore, the asymmetric distribution p(x_i | \Omega_k) can be rewritten as

p(x_i \mid \Omega_k) = \sum_{l=1}^{L} \psi_{ikl} \frac{1}{(2\pi)^{D/2} |\Sigma_{kl}|^{1/2}} \exp\left( -\frac{1}{2} (x_i - b_i \mu_{kl})^{T} \Sigma_{kl}^{-1} (x_i - b_i \mu_{kl}) \right).

Finally, the penalized log-likelihood of the objective function for brain image segmentation can be defined as

L(\Pi, \Psi, \Theta, B \mid X) = \sum_{i=1}^{N} \Bigg\{ \log \left( \sum_{k=1}^{K} \pi_{ik}\, p(x_i \mid \Omega_k) \right)
  - \frac{1}{4}\left[ D(q_i^{(1)} \,\|\, p_i^{(1)}) + D(q_i^{(1)} \,\|\, p_{\delta_i}^{(1)}) + H(q_i^{(1)}) \right]
  - \frac{1}{4}\left[ D(q_i^{(2)} \,\|\, p_i^{(2)}) + D(q_i^{(2)} \,\|\, p_{\delta_i}^{(2)}) + H(q_i^{(2)}) \right]
  - \alpha \left[ D(s_i^{(1)} \,\|\, \pi_i) + D(s_i^{(1)} \,\|\, \pi_{\delta_i}) + H(s_i^{(1)}) \right]
  - \beta \left[ D(s_i^{(2)} \,\|\, \psi_i) + D(s_i^{(2)} \,\|\, \psi_{\delta_i}) + H(s_i^{(2)}) \right] \Bigg\}.     (7)
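To illustrate the parametric bias field model b_i = w^T G(i) used above, the following Python sketch builds a low-degree polynomial basis over normalised pixel coordinates and evaluates a smooth multiplicative field. It is a simplified assumption of ours: the paper follows [36,37] and uses an orthogonalised basis, whereas this sketch simply stacks raw monomials.

```python
import numpy as np

def polynomial_basis(height, width, degree=2):
    """Build a smooth polynomial basis G(i) over normalised pixel coordinates.

    Returns an array of shape (M, height*width) with M = (degree+1)(degree+2)/2
    basis images, here the monomials x^a * y^b with a + b <= degree.
    """
    ys, xs = np.meshgrid(np.linspace(-1, 1, height),
                         np.linspace(-1, 1, width), indexing="ij")
    basis = [(xs ** a) * (ys ** b)
             for a in range(degree + 1)
             for b in range(degree + 1 - a)]
    return np.stack([g.ravel() for g in basis], axis=0)

def bias_field(w, G, height, width):
    """Evaluate b_i = w^T G(i) for every pixel and reshape to the image grid."""
    return (w @ G).reshape(height, width)

H, W = 64, 64
G = polynomial_basis(H, W, degree=2)   # M = 6 basis functions for P = 2
w = np.array([1.0, 0.05, 0.02, -0.03, 0.01, 0.04])
b = bias_field(w, G, H, W)             # smoothly varying multiplicative field
print(G.shape, b.shape)
```

Because the field is constrained to this low-dimensional smooth subspace, estimating the bias reduces to estimating the M coefficients w, which is what Eqs. (13) and (14) below do.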


3.4. Parameter estimation

The energy L(\Pi, \Psi, \Theta, B \mid X) in Eq. (7) can be maximized by adopting the EM algorithm. In the E-step, we maximize L(\Pi, \Psi, \Theta, B \mid X) over s and q by fixing \Pi, \Psi, \Theta, and B. Taking s_i^{(1)} as an example, by fixing \Pi, \Psi, \Theta, and B, we can optimize over s_i^{(1)} for each pixel i. The terms involving s_i^{(1)} in the energy function L(\Pi, \Psi, \Theta, B \mid X) are

-\sum_{k=1}^{K} s_{ik}^{(1)} \log s_{ik}^{(1)} + \sum_{k=1}^{K} s_{ik}^{(1)} \log \pi_{ik} - \sum_{k=1}^{K} s_{ik}^{(1)} \log s_{ik}^{(1)} + \sum_{k=1}^{K} s_{ik}^{(1)} \log \pi_{\delta_i k} + \sum_{k=1}^{K} s_{ik}^{(1)} \log s_{ik}^{(1)}
  = -\sum_{k=1}^{K} s_{ik}^{(1)} \log s_{ik}^{(1)} + \sum_{k=1}^{K} s_{ik}^{(1)} \log \left( \pi_{ik}\, \pi_{\delta_i k} \right).

Finally, we can get

s_{ik}^{(1)} \propto \pi_{ik}\, \pi_{\delta_i k}, \qquad q_{ik}^{(1)} \propto z_{ik}\, z_{\delta_i k}, \qquad s_{ikl}^{(2)} \propto \psi_{ikl}\, \psi_{\delta_i kl}, \qquad q_{ikl}^{(2)} \propto y_{ikl}\, y_{\delta_i kl}.     (8)
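A minimal Python snippet (ours, illustrative only) shows the normalisation implied by the proportionality in Eq. (8): the auxiliary distribution is the element-wise product of a pixel's own probabilities and its neighborhood-averaged probabilities, renormalised over the class axis.

```python
import numpy as np

def update_s1(pi_ik, pi_delta_ik):
    """E-step update of Eq. (8): s_ik^(1) proportional to pi_ik * pi_delta_ik,
    renormalised over k so that each row is a valid distribution."""
    s = pi_ik * pi_delta_ik
    return s / np.sum(s, axis=1, keepdims=True)

# The same pattern gives q^(1) from z, and s^(2)/q^(2) from psi/y along the l axis.
pi = np.array([[0.2, 0.5, 0.3]])
pi_delta = np.array([[0.4, 0.4, 0.2]])
print(update_s1(pi, pi_delta))   # [[0.235..., 0.588..., 0.176...]]
```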

In the M-step, by fixing s^{(1)}, s^{(2)}, q^{(1)} and q^{(2)}, we can maximize L(\Pi, \Psi, \Theta, B \mid X) over \Pi, \Psi, \Theta, and B. The terms involving \Pi, \Psi, \Theta, and B are

\sum_{i=1}^{N} \Bigg\{ \log \left( \sum_{k=1}^{K} \pi_{ik}\, p(x_i \mid \Omega_k) \right)
  - \frac{1}{4}\Bigg[ D(q_i^{(1)} \,\|\, p_i^{(1)}) + \sum_{j \in \delta_i, j \neq i} D(q_j^{(1)} \,\|\, p_{\delta_j}^{(1)}) \Bigg]
  - \frac{1}{4}\Bigg[ D(q_i^{(2)} \,\|\, p_i^{(2)}) + \sum_{j \in \delta_i, j \neq i} D(q_j^{(2)} \,\|\, p_{\delta_j}^{(2)}) \Bigg]
  - \alpha \Bigg[ D(s_i^{(1)} \,\|\, \pi_i) + \sum_{j \in \delta_i, j \neq i} D(s_j^{(1)} \,\|\, \pi_{\delta_j}) \Bigg]
  - \beta \Bigg[ D(s_i^{(2)} \,\|\, \psi_i) + \sum_{j \in \delta_i, j \neq i} D(s_j^{(2)} \,\|\, \psi_{\delta_j}) \Bigg] \Bigg\}.

Finally, we can obtain

\pi_{ik} = \frac{1}{1 + 4\alpha} \left[ \frac{1}{4}\left( q_{ik}^{(1)} + q_{\delta_i k}^{(1)} \right) + \alpha \left( s_{ik}^{(1)} + s_{\delta_i k}^{(1)} \right) \right],     (9)

\psi_{ikl} = \frac{1}{1 + 4\beta} \left[ \frac{1}{4}\left( q_{ikl}^{(2)} + q_{\delta_i kl}^{(2)} \right) + \beta \left( s_{ikl}^{(2)} + s_{\delta_i kl}^{(2)} \right) \right],     (10)

\mu_{kl} = \frac{\sum_{i=1}^{N} \left( q_{ik}^{(1)} + q_{\delta_i k}^{(1)} \right)\left( q_{ikl}^{(2)} + q_{\delta_i kl}^{(2)} \right) x_i b_i}{\sum_{i=1}^{N} \left( q_{ik}^{(1)} + q_{\delta_i k}^{(1)} \right)\left( q_{ikl}^{(2)} + q_{\delta_i kl}^{(2)} \right) b_i^2},     (11)

\Sigma_{kl} = \frac{\sum_{i=1}^{N} \left( q_{ik}^{(1)} + q_{\delta_i k}^{(1)} \right)\left( q_{ikl}^{(2)} + q_{\delta_i kl}^{(2)} \right) (x_i - b_i \mu_{kl})(x_i - b_i \mu_{kl})^{T}}{\sum_{i=1}^{N} \left( q_{ik}^{(1)} + q_{\delta_i k}^{(1)} \right)\left( q_{ikl}^{(2)} + q_{\delta_i kl}^{(2)} \right)},     (12)

w^{(t+1)} = A^{-1} B,     (13)

where

A = \sum_{i=1}^{N} G(i) G(i)^{T} J_1(i), \quad B = \sum_{i=1}^{N} G(i) J_2(i), \quad J_1(i) = \sum_{k=1}^{K} \sum_{l=1}^{L} z_{ik} y_{ikl}\, \mu_{kl}^{T} \Sigma_{kl}^{-1} \mu_{kl}, \quad J_2(i) = \sum_{k=1}^{K} \sum_{l=1}^{L} z_{ik} y_{ikl}\, x_i^{T} \Sigma_{kl}^{-1} \mu_{kl}.     (14)

Note that the M × M matrix A has been proved to be nonsingular [36,37]. Please refer to Appendix 1 for the detailed derivation of the updating functions. Consequently, the proposed algorithm can be summarized in Algorithm 1.

4. Results

4.1. Materials and methods

In this paper, we utilized two benchmark datasets to evaluate the proposed algorithm: the synthetic brain MR image database BrainWeb [44] (available at http://www.bic.mni.mcgill.ca/brainweb) and the real brain MR image database IBSR v2.0 [45] (available at http://www.cma.mgh.harvard.edu/ibsr). It should be noted that all the test images used in this paper are gray-level images, and we directly utilized the intensity value of each pixel as the feature, which means that the dimension of the pixel feature is D = 1. The segmentation accuracy for each brain tissue type is quantitatively evaluated by using the Dice coefficient (DC) [5], which is the ratio between the intersection and union of the segmented volume S1 and the ground truth volume S2:

DC(S1, S2) = 2|S1 ∩ S2| / (|S1| + |S2|).

The value of DC ranges from 0 to 1, with a higher value representing a more accurate segmentation result.

We evaluated the proposed brain MR image segmentation algorithm and compared it to eight state-of-the-art algorithms. The algorithms used in our comparative experiments can be categorized into two groups. The spatially constrained generative model (SCGM) [27], a fast and robust spatially constrained GMM (FRSCGMM) [29], a Bayesian bounded asymmetric mixture model (BAMM) [32] and a bounded generalized GMM (BGGMM) [46] have a proven ability to minimize the impact of noise, whereas the automated model based on the EM algorithm (AM-EM) [14], coherent local intensity clustering (CLIC) [9], the adaptive scale fuzzy local GMM (AS-FLGMM) [47] and multiplicative intrinsic component optimization (MICO) [37] can simultaneously correct the bias field and segment brain MR images.

Unless otherwise specified, the parameters of the proposed algorithm are set as follows. The size of the window for prior factor construction is 3 × 3. The number of multivariate Gaussian distributions is set to L = 3. The parameters α and β of the MRF priors are set as α = β = 0.25. For a fair comparison, all these algorithms adopted the same initialization generated by the k-means algorithm. All algorithms were implemented on the Matlab (Version 7.8) platform and were tested on a PC (Intel Core i7-4790 CPU, 3.60 GHz, 16 GB RAM, and 64-bit Windows 8).
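As a concrete illustration of the DC metric defined above, the following short Python function (ours, not part of the paper) computes the Dice coefficient for one tissue label from a predicted label map and a ground-truth label map.

```python
import numpy as np

def dice_coefficient(seg, gt, label):
    """Dice coefficient DC(S1, S2) = 2|S1 ∩ S2| / (|S1| + |S2|) for one tissue label."""
    s1 = (seg == label)
    s2 = (gt == label)
    denom = s1.sum() + s2.sum()
    return 2.0 * np.logical_and(s1, s2).sum() / denom if denom > 0 else 1.0

seg = np.array([[1, 1, 2], [0, 2, 2], [0, 1, 2]])
gt = np.array([[1, 1, 2], [0, 1, 2], [0, 1, 2]])
print(dice_coefficient(seg, gt, label=1))   # 2*3 / (3+4) = 6/7 ≈ 0.857
```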

4.2. Performance of the proposed algorithm

We first applied the proposed algorithm to the 80th slice of the synthetic brain MR image with 7% noise and 100% bias field selected from BrainWeb. The to-be-segmented image, intermediate results and final output of the proposed algorithm are shown in Fig. 2. The estimated bias field and bias-corrected image are shown in the left part of the intermediate illustrations. The corresponding between-cluster posterior probabilities {z_ik, i = 1, 2, …, N, k = 1, 2, 3} and within-cluster posterior probabilities {y_ikl, i = 1, 2, …, N, k = 1, 2, 3, l = 1, 2, 3} are depicted in the middle and right parts of the intermediate illustrations, respectively. It is clear that the within-cluster posterior probabilities y_ikl can provide different over-segmentations and under-segmentations for each brain tissue type.



Fig. 2. Illustrations of the proposed algorithm.

Fig. 3. Left: the histogram of the original noisy image in Fig. 2. Middle: the histogram of bias corrected image (blue region) and the density function of the proposed algorithm (red line). Right: ground truth distributions for each tissue and the estimated distributions of each cluster. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Therefore, the proposed algorithm can further improve the accuracy of posterior probability estimation by combining these posterior probabilities with the automatically computed between-cluster prior probabilities. Comparing the segmentation with the ground truth, we can observe that the segmentation result of our algorithm is consistent with the ground truth, which means that the proposed algorithm can effectively deal with the noise and intensity inhomogeneity, and hence produces an accurate result.

We plotted the histograms of the observed brain MR image, the bias-field-corrected image and each brain tissue type in the ground truth in Fig. 3. To demonstrate the proposed algorithm's ability to estimate the pixel value distribution, we overlaid the estimated distribution, highlighted in red, on the histogram of the bias-corrected image, as shown in the middle column of Fig. 3. It reveals that the proposed algorithm can obtain a satisfying estimation of the pixel distribution on the bias-corrected image.

We then performed the second experiment on three clinical 3T brain MR images with skulls. Fig. 4 shows the test images, estimated bias fields, bias-corrected images and segmentation results from left to right, respectively. It is evident that the proposed algorithm can achieve satisfying performance even without skull stripping.

4.3. Comparison on BrainWeb images

To demonstrate its robustness to noise, the proposed algorithm was compared to the BAMM, BGGMM, SCGM and FRSCGMM algorithms, each of which has a proven ability to overcome the impact of noise in brain MR images. T1-weighted synthetic brain MR images with noise levels ranging from 3% to 9% were

selected from BrainWeb as test cases. Fig. 5 shows the 85th axial, sagittal and coronal slices of the MR study with 9% noise, the corresponding ground truth, and the segmentation results obtained by the five algorithms. For a better visual comparison, a rectangular region highlighted by a red box in each image is enlarged and displayed below the image. Both BAMM and BGGMM are relatively sensitive to the noise due to the lack of spatial constraints. Both SCGM and FRSCGMM, though constructing spatial constraints based on the prior and posterior probabilities, utilize only a symmetric smoothing filter over the whole image, and hence may produce over-smoothed segmentation results, especially in the CSF region. Compared to these four methods, the proposed algorithm produces better segmentation results that are visually more similar to the ground truth.

Next, we applied each of the five algorithms to brain MR images with different levels of noise. For each noise level, we employed 20 axial, 20 sagittal and 20 coronal slices as test images (from the 81st slice to the 100th slice). The means and standard deviations of the DC values that characterize the accuracy of segmenting each tissue type are depicted in Fig. 6. It reveals that the results produced by the proposed algorithm have higher averages and lower standard deviations of DC values than those produced by the other methods. This experiment demonstrates that the proposed algorithm produces the most accurate segmentation and that its performance is the most robust. It is worth mentioning that the DC values of CSF segmentation produced by BAMM and BGGMM are slightly higher than those of the other algorithms on images with less than



Fig. 4. Segmentation results on three 3T-weighted brain MR images with skull. Column 1: original images, Column 2: estimated bias fields, Column 3: bias-corrected images, Column 4: segmentation results of the proposed algorithm.

Fig. 5. The 85th axial (top), sagittal (middle) and coronal (bottom) slice of a BrainWeb study with 9% noise (1st column), the segmentation results obtained by using (2nd column) the BAMM, (3rd column) BGGMM, (4th column) SCGM, (5th column) FRSCGMM and (6th column) proposed algorithm, and (7th column) the segmentation ground truth. The picture below each image is the enlarged view of the rectangles region marked by a red box. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

7% noise. The reason may lie in the fact that incorporating spatial information into the segmentation process is double-edged. On one side, it can improve the accuracy of segmenting noise-corrupted images, whereas on the other side it may also blur texture details in an image. The BAMM and BGGMM algorithms do not use spatial information, and hence are more

accurate on images with light noise, but less accurate on images with heavy noise. On the contrary, due to the use of spatial information, SCGM and FRSCGMM perform better on images with heavy noise, but worse on images with light noise. Compared to them, the proposed algorithm reaches a better trade-off between



Fig. 6. Mean and standard deviation of DC values of (left) GM, (middle) WM, and (right) CSF obtained by applying five algorithms to brain MR images with different noise levels.

Fig. 7. Three clinical (1st column) brain MR images and the segmentation results obtained by applying (2nd column) the AM-EM, (3rd column) CLIC, (4th column) ASFLGMM, (5th column) MICO and (6th column) proposed algorithm, and (7th column) the ground truth.

resisting noise and keeping texture details, and therefore achieves the highest segmentation accuracy on most test images.

To demonstrate its improved performance in simultaneous bias field correction and segmentation, the proposed algorithm was evaluated against the AM-EM, CLIC, AS-FLGMM and MICO methods. Fig. 7 shows three clinical brain MR images, which contain obvious intensity inhomogeneity and low contrast, and the corresponding segmentation results. Although it utilizes a brain atlas as the prior probability map of WM, GM and CSF, the AM-EM algorithm remains sensitive to initialization, which is usually obtained by k-means clustering of the pixels. Both the CLIC and the AS-FLGMM algorithms attempt to overcome the impact of intensity inhomogeneity by modelling the intensity information within each local region. However, these two algorithms are not robust to parameter settings and may lead to mis-segmentation, especially in boundary regions. The MICO algorithm can approximate the bias field at the pixel-by-pixel level by using a linear combination of orthogonal polynomials, but has a poor ability to distinguish the

regions with low contrast. Compared to these algorithms, the proposed algorithm can better deal with the intensity inhomogeneity and low contrast, and is capable of producing more accurate segmentation results on clinical images.

These five algorithms were also applied to 20 brain MR images selected from BrainWeb (from the 81st slice to the 100th slice), which contain 5% noise and intensity inhomogeneity ranging from 60% to 100%. The means and standard deviations of the DC values for the segmentation of GM, WM and CSF are illustrated in Fig. 8. This quantitative evaluation again demonstrates the superior performance of the proposed algorithm in dealing with intensity inhomogeneity.

4.4. Comparison on IBSR images

In this experiment, we applied the proposed algorithm and the other eight segmentation algorithms to the IBSR v2.0 dataset, which contains 18 T1-weighted clinical brain MR volumes.



Fig. 8. Mean and standard deviation of DC values for the segmentation of (left) GM, (middle) WM and (right) CSF obtained by applying comparison algorithms to brain MR images with different levels of intensity inhomogeneity.

Fig. 9. 3D slice view of clinical brain MR study “IBSR07”, corresponding ground truth and segmentation results produced by the proposed algorithm and eight other approaches.


Table 2
Accuracy of the proposed algorithm and eight other approaches on 18 IBSR studies, measured in terms of the mean, standard deviation (STD) and p-value of the t-test of the DCs.

Method      GM: Mean   STD      p-value    WM: Mean   STD      p-value    CSF: Mean   STD      p-value
BAMM        0.8014     0.0455   4.0e−08    0.8644     0.0313   4.6e−03    0.5738      0.0952   1.5e−08
BGGMM       0.8037     0.0401   4.9e−11    0.8617     0.0300   1.8e−03    0.5826      0.0843   1.7e−09
SCGM        0.8077     0.0531   1.5e−06    0.8470     0.0239   1.7e−09    0.5917      0.1051   4.9e−07
FRSCGMM     0.8340     0.0373   1.6e−06    0.7986     0.0224   2.2e−08    0.6198      0.0975   2.4e−06
AM-EM       0.8043     0.0637   7.7e−05    0.7740     0.0541   1.3e−07    0.6151      0.1259   4.7e−04
CLIC        0.7999     0.0360   9.7e−11    0.8645     0.0291   2.8e−02    0.5726      0.0951   6.2e−09
AS-FLGMM    0.7849     0.0395   3.3e−07    0.8721     0.0265   6.1e−02    0.5270      0.1204   1.2e−07
MICO        0.7846     0.0323   8.7e−11    0.8134     0.0411   5.4e−07    0.6226      0.0770   5.1e−08
Proposed    0.8800     0.0356   –          0.8754     0.0233   –          0.7342      0.0363   –
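The p-values in Table 2 come from t-tests over the per-study DC values on the 18 IBSR volumes. The snippet below is a hedged illustration of how such a paired comparison could be computed with SciPy; the arrays dc_proposed and dc_baseline are hypothetical stand-ins for the 18 per-study DC scores of two methods, and the exact test configuration used by the authors is not specified in this excerpt.

```python
import numpy as np
from scipy import stats

# Hypothetical per-study Dice scores for GM on the 18 IBSR volumes.
rng = np.random.default_rng(42)
dc_proposed = np.clip(rng.normal(0.88, 0.035, size=18), 0, 1)
dc_baseline = np.clip(rng.normal(0.80, 0.045, size=18), 0, 1)

# Paired t-test: are the two algorithms' DCs significantly different?
t_stat, p_value = stats.ttest_rel(dc_proposed, dc_baseline)
print(f"mean DC proposed = {dc_proposed.mean():.4f}, baseline = {dc_baseline.mean():.4f}")
print(f"t = {t_stat:.3f}, p = {p_value:.2e}")   # p < 0.05 indicates a significant difference
```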

Table 3
Mean ± standard deviation of the time cost, number of iterations and time per iteration of the proposed algorithm.

Dim.   Dataset    Test images   Image size         Time (s)           Iterations      Time per iter. (s)
2D     BrainWeb   100           181 × 217          1.7780 ± 0.2949    10.63 ± 1.43    0.1670 ± 0.0141
2D     IBSR       100           128 × 256          2.2788 ± 1.1227    10.37 ± 5.14    0.2201 ± 0.0025
3D     BrainWeb   20            181 × 217 × 181    243.55 ± 39.42     6.40 ± 1.05     38.07 ± 0.43
3D     IBSR       18            256 × 128 × 256    286.70 ± 145.37    6.67 ± 3.38     43.00 ± 0.07

Fig. 9 shows a 3D slice view of the volumetric image "IBSR07", the ground truth, and the segmentation results produced by the different algorithms. In this study, the obvious low contrast between GM and CSF increases the difficulty of segmentation. Hence, most algorithms fail to correctly distinguish the two regions. In particular, the SCGM and the FRSCGMM algorithms resulted in over-smoothed segmentations. The proposed algorithm apparently outperforms the other approaches and generates the segmentation result most similar to the ground truth. Finally, all segmentation results were quantitatively assessed, and the DC values for differentiating each tissue type are depicted in Fig. 10. The overall performance of each algorithm in delineating each brain tissue type on the 18 IBSR studies was measured in terms of the mean, standard deviation and p-value of the DCs, which are displayed in Table 2. Considering 0.05 as the level of significance, this quantitative evaluation shows that the proposed algorithm performs steadily better than the other segmentation approaches.

5. Discussions

5.1. Computational complexity

Unfortunately, the computational complexity of the proposed algorithm is somewhat high. In [48], the authors pointed out that the computational complexity of the EM algorithm is of the order O(N K D^2) for each iteration, where N is the number of pixels in an image, D is the dimension of each pixel, and K is the number of clusters. Consequently, the computational complexity of the proposed algorithm over the whole procedure is O(T N K L D^2 M N_{\delta_i}^2), where

T is the number of iterations until the algorithm converges, L is the number of distributions, M is the number of polynomials and N_{\delta_i} is the number of pixels in the neighborhood \delta_i. To evaluate its computational time, the proposed algorithm was tested on 200 2D and 38 3D images selected from the BrainWeb dataset [44] and the Internet Brain Segmentation Repository (IBSR) [45]. The mean and standard deviation of the time cost, number of iterations and time per iteration in this experiment (Intel Core i7-4790 CPU, 3.60 GHz, 16 GB RAM, 64-bit Windows 8, and Matlab Version 7.8) are listed in Table 3. It is worth mentioning that the

proposed algorithm was performed in the MATLAB environment without any particular code optimization.

5.2. Parameter settings

The proposed algorithm mainly involves four manually set parameters, i.e., the window size for the weight factor, the number of multivariate distributions, and the two parameters α and β of the MRF priors. The window size is a trade-off between the anti-noise ability and the prevention of losing too much image detail, while the number of multivariate distributions determines the model's ability to fit different shapes of observed data. The two parameters α and β of the MRF priors act like inverse temperatures that control the influence of the spatial constraints for the between-cluster and within-cluster priors, respectively.

To investigate the impact of the parameter settings for the window size of the spatial factor and the number of multivariate distributions, the performance with different parameters was evaluated on 60 2D brain MR images with a 9% noise level. Fig. 11 shows the average quantitative evaluation with different parameter settings. From the left panel of Fig. 11, we can find that when the number of Gaussian distributions is 1, the density function of the proposed algorithm degrades to the conventional GMM, which means that the ability to fit different shapes of observed data is still limited. When the number of Gaussian distributions is larger than 1, the model can better fit the data, with higher average DC values. However, when the number of Gaussian distributions is larger than 3, the performance decreases slightly, because it is very hard to obtain more than three components in one cluster, and the number of iterations correspondingly increases because more parameters need to be updated. From the right panel of Fig. 11, we can find that when the window size is 1 × 1, the density function of the proposed algorithm degrades to the conventional AMM, which means that no spatial information is introduced. Therefore, the proposed algorithm is still sensitive to noise when the window size is 1 × 1. When the window size is larger than 1 × 1, a larger size leads to smoother segmentations and higher computational complexity.

Finally, we tested the performance of the proposed algorithm with different settings of the parameters α and β of the MRF priors. Fig. 12 shows the corresponding results of the average quantitative



Fig. 10. DC values of all segmentation results obtained by applying the proposed algorithm and eight other approaches to 18 IBSR studies.

evaluation. From Fig. 12, we can find that when both α and β are equal to 0, the spatially constrained priors play no role in the final segmentation results, which means that the anti-noise ability is limited. With increasing values of both α and β, we can find that the proposed algorithm produces better segmentation results with higher average DC values. When α and β are larger than 0.2, there are no obvious changes, which means that the proposed algorithm is robust to the setting of α and β.

Generally, we experimentally found that the proposed algorithm obtains the best results when α and β are both located around 0.25. Consequently, considering the execution efficiency and the segmentation accuracy, for all the experiments in this paper the number of multivariate Gaussian distributions is manually set to 3, the window size of the spatial factor is manually set to 3 × 3, and the parameters involved in the MRF priors are manually set to α = β = 0.25.



Algorithm 1. Robust generative asymmetric Gaussian mixture model.
Step 1. Initialize the parameter Θ using k-means. Initialize the parameter B by setting b_i = 1. Initialize the priors Π and Ψ, i.e., π_ik = 1/K and ψ_ikl = 1/L, which means that uninformative priors are used for both the between- and within-cluster priors in this paper.
Step 2. Calculate the weight factor λ_ij using Eq. (6).
Step 3. E-step:
  (1) Compute the between-cluster and within-cluster posterior probabilities z_ik and y_ikl using Eq. (5).
  (2) Compute and normalize s^(1), s^(2), q^(1) and q^(2) according to Eq. (8).
Step 4. M-step:
  (1) Update the priors Π and Ψ with Eq. (9) and Eq. (10), respectively.
  (2) Update the parameters Θ with Eqs. (11) and (12).
  (3) Update the bias field coefficients w with Eqs. (13) and (14).
Step 5. If the distance between the newly obtained cluster centers and the old ones is less than a user-specified small threshold ε (in this paper, we set ε = 0.0001), stop the iteration; otherwise, go to Step 3.
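The following Python sketch mirrors the control flow of Algorithm 1 for one-dimensional intensities. It is an outline under simplifying assumptions of ours, not the authors' implementation: the spatial coupling of Eqs. (8)-(10), the weight factors of Eq. (6), and the bias field solve of Eqs. (13)-(14) are omitted (the bias field b_i is frozen at 1), so the loop reduces to EM for a plain asymmetric Gaussian mixture.

```python
import numpy as np
from scipy.stats import norm

def rgagmm_like_em(x, K=3, L=2, n_iter=50, tol=1e-4, seed=0):
    """Simplified EM loop following the structure of Algorithm 1 (1-D intensities)."""
    rng = np.random.default_rng(seed)
    N = x.size
    # Step 1: initialisation (the paper uses k-means; quantiles are used here).
    centers = np.quantile(x, np.linspace(0.1, 0.9, K))
    mu = centers[:, None] + rng.normal(0, x.std() * 0.05, size=(K, L))
    sigma = np.full((K, L), x.std() / K)
    pi = np.full((N, K), 1.0 / K)            # between-cluster priors, uninformative
    psi = np.full((N, K, L), 1.0 / L)        # within-cluster priors, uninformative
    b = np.ones(N)                           # bias field frozen at 1 in this sketch
    for it in range(n_iter):
        old_mu = mu.copy()
        # Step 3 (E-step): posteriors z_ik and y_ikl of Eq. (5).
        phi = norm.pdf(x[:, None, None], loc=b[:, None, None] * mu, scale=sigma)
        joint = pi[:, :, None] * psi * phi                       # (N, K, L)
        z = joint.sum(axis=2)
        z /= z.sum(axis=1, keepdims=True) + 1e-12                # between-cluster posterior
        y = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)   # within-cluster posterior
        # Step 4 (M-step): priors and Gaussian parameters (spatial terms omitted).
        pi = z.copy()
        psi = y.copy()
        w = z[:, :, None] * y                                    # responsibility of (k, l)
        denom = w.sum(axis=0) + 1e-12
        mu = (w * x[:, None, None]).sum(axis=0) / denom
        sigma = np.sqrt((w * (x[:, None, None] - mu) ** 2).sum(axis=0) / denom) + 1e-6
        # Step 5: convergence test on the cluster centers.
        if np.max(np.abs(mu - old_mu)) < tol:
            break
    return np.argmax(z, axis=1), mu, sigma

# Toy usage with three synthetic "tissue" intensity groups.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(50, 8, 300), rng.normal(120, 10, 300), rng.normal(200, 12, 300)])
labels, mu, sigma = rgagmm_like_em(x)
print(mu.round(1))
```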

Fig. 11. Average performances of the proposed algorithm with different parameter settings. Left: Average DC values and average iterations with different number of multivariate Gaussian distributions. Right: Average DC values and average execution time with different size of neighborhood.

Fig. 12. Average performances of the proposed algorithm with different parameter settings for α and β of the MRF priors.

6. Conclusion

This paper proposes a robust generative asymmetric Gaussian mixture model for brain MR image segmentation. The model can simply and efficiently incorporate spatial constraints into an EM framework to simultaneously segment brain MR images and estimate the intensity inhomogeneity. The experimental results on various brain MR images demonstrate that the proposed algorithm is flexible enough to fit the data shapes, can simultaneously overcome the influence of noise and intensity inhomogeneity, and is hence capable of producing more accurate segmentation results than several state-of-the-art algorithms. Future work will be devoted to reducing the complexity of the proposed algorithm, for example by speeding it up with the method in [49]. On the other hand, a deep Gaussian mixture model [50,51] is a powerful generalization of Gaussian mixture models to multiple layers, constructed by stacking multiple GMM layers on top of each other. The proposed algorithm can be treated as a special case of deep GMMs in a certain sense. Further work on deep GMMs and on the relationship between deep GMMs and the proposed algorithm is beyond the scope of this paper and will be the subject of our future work.

Conflict of interest statement

We declare that we have no financial and personal relationships with other people or organizations that could inappropriately influence our work, and there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled "Robust Generative Asymmetric GMM for Brain MR Image Segmentation".

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China under Grants 61401209 & 61471297, in part by the Natural Science Foundation of Jiangsu Province, China, under Grant BK20140790, in part by the Natural Science Foundation of Shaanxi Province, China, under Grant 2015JM6287, in part by the Fundamental Research Funds for the Central Universities under Grant 30916011324, and in part by the China Postdoctoral Science Foundation under Grants 2014T70525 & 2013M531364.


Appendix 1. Derivation of the proposed algorithm

As in Eq. (7), the log-likelihood of the observed data is written as

L(\Pi, \Psi, \Theta, B \mid X) = \sum_{i=1}^{N} \Bigg\{ \log \left( \sum_{k=1}^{K} \pi_{ik}\, p(x_i \mid \Omega_k) \right)
  - \frac{1}{4}\left[ D(q_i^{(1)} \,\|\, p_i^{(1)}) + D(q_i^{(1)} \,\|\, p_{\delta_i}^{(1)}) + H(q_i^{(1)}) \right]
  - \frac{1}{4}\left[ D(q_i^{(2)} \,\|\, p_i^{(2)}) + D(q_i^{(2)} \,\|\, p_{\delta_i}^{(2)}) + H(q_i^{(2)}) \right]
  - \alpha \left[ D(s_i^{(1)} \,\|\, \pi_i) + D(s_i^{(1)} \,\|\, \pi_{\delta_i}) + H(s_i^{(1)}) \right]
  - \beta \left[ D(s_i^{(2)} \,\|\, \psi_i) + D(s_i^{(2)} \,\|\, \psi_{\delta_i}) + H(s_i^{(2)}) \right] \Bigg\},

where

p(x_i \mid \Omega_k) = \sum_{l=1}^{L} \psi_{ikl} \frac{1}{(2\pi)^{D/2} |\Sigma_{kl}|^{1/2}} \exp\left( -\frac{1}{2} (x_i - b_i \mu_{kl})^{T} \Sigma_{kl}^{-1} (x_i - b_i \mu_{kl}) \right).

It should be noted that the derivation of each updating function is similar to that in the algorithm proposed by Diplaros et al. [24]; we extend the work by Diplaros et al. [24] from the symmetric mixture model to the asymmetric mixture model.

E-step: By fixing \Pi, \Psi, \Theta and B, we can optimize over s_i^{(1)} for each pixel i. The terms involving s_i^{(1)} in the energy function L(\Pi, \Psi, \Theta, B \mid X) are

-\sum_{k=1}^{K} s_{ik}^{(1)} \log s_{ik}^{(1)} + \sum_{k=1}^{K} s_{ik}^{(1)} \log \pi_{ik} - \sum_{k=1}^{K} s_{ik}^{(1)} \log s_{ik}^{(1)} + \sum_{k=1}^{K} s_{ik}^{(1)} \log \pi_{\delta_i k} + \sum_{k=1}^{K} s_{ik}^{(1)} \log s_{ik}^{(1)}
  = -\sum_{k=1}^{K} s_{ik}^{(1)} \log s_{ik}^{(1)} + \sum_{k=1}^{K} s_{ik}^{(1)} \log \left( \pi_{ik}\, \pi_{\delta_i k} \right).

The above expression is a negative KL divergence, which becomes zero when

s_{ik}^{(1)} \propto \pi_{ik}\, \pi_{\delta_i k}.

By applying a similar derivation for s_i^{(2)}, q_i^{(1)} and q_i^{(2)}, we have

s_{ikl}^{(2)} \propto \psi_{ikl}\, \psi_{\delta_i kl}, \qquad q_{ik}^{(1)} \propto p_{ik}^{(1)}\, p_{\delta_i k}^{(1)}, \qquad q_{ikl}^{(2)} \propto p_{ikl}^{(2)}\, p_{\delta_i kl}^{(2)}.

Therefore, we can get the updating functions for s_i^{(1)}, s_i^{(2)}, q_i^{(1)} and q_i^{(2)} shown as Eq. (8) in our paper.

M-step: By fixing s^{(1)}, s^{(2)}, q^{(1)} and q^{(2)}, we can maximize L(\Pi, \Psi, \Theta, B \mid X) over \Pi, \Psi, \Theta and B. The terms involving \Pi, \Psi, \Theta and B are

\sum_{i=1}^{N} \Bigg\{ \log \left( \sum_{k=1}^{K} \pi_{ik}\, p(x_i \mid \Omega_k) \right)
  - \frac{1}{4}\Bigg[ D(q_i^{(1)} \,\|\, p_i^{(1)}) + \sum_{j \in \delta_i, j \neq i} D(q_j^{(1)} \,\|\, p_{\delta_j}^{(1)}) \Bigg]
  - \frac{1}{4}\Bigg[ D(q_i^{(2)} \,\|\, p_i^{(2)}) + \sum_{j \in \delta_i, j \neq i} D(q_j^{(2)} \,\|\, p_{\delta_j}^{(2)}) \Bigg]
  - \alpha \Bigg[ D(s_i^{(1)} \,\|\, \pi_i) + \sum_{j \in \delta_i, j \neq i} D(s_j^{(1)} \,\|\, \pi_{\delta_j}) \Bigg]
  - \beta \Bigg[ D(s_i^{(2)} \,\|\, \psi_i) + \sum_{j \in \delta_i, j \neq i} D(s_j^{(2)} \,\|\, \psi_{\delta_j}) \Bigg] \Bigg\}.

First of all, let us consider the derivation with respect to the between-cluster posterior z_{ik}. The terms involving only z_{ik} in the above expression are

-\frac{1}{4}\Bigg[ D(q_i^{(1)} \,\|\, z_i) + \sum_{j \in \delta_i, j \neq i} D(q_j^{(1)} \,\|\, z_{\delta_j}) \Bigg],

which, by ignoring the terms independent of z_{ik}, yields

-\frac{1}{4}\Bigg[ -\sum_{k=1}^{K} q_{ik}^{(1)} \log z_{ik} - \sum_{j \in \delta_i, j \neq i} \sum_{k=1}^{K} q_{jk}^{(1)} \log z_{\delta_j k} \Bigg],

where

z_{\delta_j k} = \sum_{m \in \delta_j, m \neq j} \lambda_{jm} z_{mk} = \lambda_{ji} z_{ik} + \sum_{m \in \delta_j, m \neq i,j} \lambda_{jm} z_{mk}.

We then bound the term \log z_{\delta_j k} using Jensen's inequality:

\log z_{\delta_j k} = \log \sum_{m \in \delta_j, m \neq j} \lambda_{jm} z_{mk} \geq \lambda_{ji} \log z_{ik} + \sum_{m \in \delta_j, m \neq i,j} \lambda_{jm} \log z_{mk}.

Since \lambda_{ji} = \lambda_{ij}, ignoring the terms independent of z_{ik}, the above expression becomes

\frac{1}{4}\sum_{k=1}^{K} q_{ik}^{(1)} \log z_{ik} + \frac{1}{4}\sum_{k=1}^{K} \sum_{j \in \delta_i, j \neq i} \lambda_{ji} q_{jk}^{(1)} \log z_{ik} \;\Rightarrow\; \frac{1}{4}\sum_{k=1}^{K} \left( q_{ik}^{(1)} + q_{\delta_i k}^{(1)} \right) \log z_{ik},

where the distribution q_{\delta_i k}^{(1)} is

q_{\delta_i k}^{(1)} = \sum_{j \in \delta_i, j \neq i} \lambda_{ij} q_{jk}^{(1)}.

By applying a similar derivation for y_{ikl}, \pi_{ik} and \psi_{ikl}, we have

\frac{1}{4}\sum_{l=1}^{L} \left( q_{ikl}^{(2)} + q_{\delta_i kl}^{(2)} \right) \log y_{ikl}, \quad \alpha \sum_{k=1}^{K} \left( s_{ik}^{(1)} + s_{\delta_i k}^{(1)} \right) \log \pi_{ik}, \quad \beta \sum_{l=1}^{L} \left( s_{ikl}^{(2)} + s_{\delta_i kl}^{(2)} \right) \log \psi_{ikl}.

Eventually, the energy function L(\Pi, \Psi, \Theta, B \mid X) involving the between-cluster posterior z_{ik}, the within-cluster posterior y_{ikl}, the between-cluster prior \pi_{ik} and the within-cluster prior \psi_{ikl} becomes

\log \left( \sum_{k=1}^{K} \pi_{ik}\, p(x_i \mid \Omega_k) \right)
  + \frac{1}{4}\sum_{k=1}^{K} \left( q_{ik}^{(1)} + q_{\delta_i k}^{(1)} \right) \log z_{ik}
  + \frac{1}{4}\sum_{l=1}^{L} \left( q_{ikl}^{(2)} + q_{\delta_i kl}^{(2)} \right) \log y_{ikl}
  + \alpha \sum_{k=1}^{K} \left( s_{ik}^{(1)} + s_{\delta_i k}^{(1)} \right) \log \pi_{ik}
  + \beta \sum_{l=1}^{L} \left( s_{ikl}^{(2)} + s_{\delta_i kl}^{(2)} \right) \log \psi_{ikl}.

This energy function can be divided into two parts. The first part consists of the terms involving the between-cluster posterior z_{ik} and the between-cluster prior \pi_{ik}:

\frac{1}{2} \log \left( \sum_{k=1}^{K} \pi_{ik}\, \tilde{p}(x_i \mid \Omega_k) \right)
  + \frac{1}{4}\sum_{k=1}^{K} \left( q_{ik}^{(1)} + q_{\delta_i k}^{(1)} \right) \log z_{ik}
  + \alpha \sum_{k=1}^{K} \left( s_{ik}^{(1)} + s_{\delta_i k}^{(1)} \right) \log \pi_{ik}.

Since \frac{1}{4}\sum_{k=1}^{K} ( q_{ik}^{(1)} + q_{\delta_i k}^{(1)} ) = 1/2, expanding the posterior z_{ik} = \pi_{ik}\, \tilde{p}(x_i \mid \Omega_k) / \sum_{m} \pi_{im}\, \tilde{p}(x_i \mid \Omega_m) makes the log-sum terms cancel, and the first part becomes

\frac{1}{4}\sum_{k=1}^{K} \left( q_{ik}^{(1)} + q_{\delta_i k}^{(1)} \right) \log \tilde{p}(x_i \mid \Omega_k)
  + \sum_{k=1}^{K} \left[ \frac{1}{4}\left( q_{ik}^{(1)} + q_{\delta_i k}^{(1)} \right) + \alpha \left( s_{ik}^{(1)} + s_{\delta_i k}^{(1)} \right) \right] \log \pi_{ik}.

As the prior distribution of the between-cluster \pi_{ik} satisfies the constraint \sum_{k=1}^{K} \pi_{ik} = 1, we can easily obtain the updating function for the between-cluster prior \pi_{ik} by using a Lagrange multiplier:

\pi_{ik} = \frac{1}{1 + 4\alpha} \left[ \frac{1}{4}\left( q_{ik}^{(1)} + q_{\delta_i k}^{(1)} \right) + \alpha \left( s_{ik}^{(1)} + s_{\delta_i k}^{(1)} \right) \right].

Therefore, we obtain the updating function for the between-cluster prior \pi_{ik} shown as Eq. (9) in our paper.

The second part consists of the terms involving the within-cluster posterior y_{ikl} and the within-cluster prior \psi_{ikl}:

\frac{1}{2} \log \left( \sum_{k=1}^{K} \pi_{ik} \sum_{l=1}^{L} \psi_{ikl}\, \tilde{p}(x_i \mid \Omega_{kl}) \right)
  + \frac{1}{4}\sum_{l=1}^{L} \left( q_{ikl}^{(2)} + q_{\delta_i kl}^{(2)} \right) \log y_{ikl}
  + \beta \sum_{l=1}^{L} \left( s_{ikl}^{(2)} + s_{\delta_i kl}^{(2)} \right) \log \psi_{ikl}.

Since \frac{1}{4}\sum_{l=1}^{L} ( q_{ikl}^{(2)} + q_{\delta_i kl}^{(2)} ) = 1/2, expanding the posterior y_{ikl} in the same manner gives

\frac{1}{4}\sum_{l=1}^{L} \left( q_{ikl}^{(2)} + q_{\delta_i kl}^{(2)} \right) \log \tilde{p}(x_i \mid \Omega_{kl})
  + \sum_{l=1}^{L} \left[ \frac{1}{4}\left( q_{ikl}^{(2)} + q_{\delta_i kl}^{(2)} \right) + \beta \left( s_{ikl}^{(2)} + s_{\delta_i kl}^{(2)} \right) \right] \log \psi_{ikl}.

Similarly, we can easily obtain the updating function for the within-cluster prior \psi_{ikl} by using a Lagrange multiplier to enforce the constraint \sum_{l=1}^{L} \psi_{ikl} = 1 for the within-cluster prior distribution:

\psi_{ikl} = \frac{1}{1 + 4\beta} \left[ \frac{1}{4}\left( q_{ikl}^{(2)} + q_{\delta_i kl}^{(2)} \right) + \beta \left( s_{ikl}^{(2)} + s_{\delta_i kl}^{(2)} \right) \right].

Therefore, we obtain the updating function for the within-cluster prior \psi_{ikl} shown as Eq. (10) in our paper. Consequently, the energy function L(\Pi, \Psi, \Theta, B \mid X) can be rewritten as

\frac{1}{4}\sum_{k=1}^{K} \left( q_{ik}^{(1)} + q_{\delta_i k}^{(1)} \right) \log \tilde{p}(x_i \mid \Omega_k)
  + \frac{1}{4}\sum_{l=1}^{L} \left( q_{ikl}^{(2)} + q_{\delta_i kl}^{(2)} \right) \log \tilde{p}(x_i \mid \Omega_{kl})
  + \sum_{k=1}^{K} \left[ \frac{1}{4}\left( q_{ik}^{(1)} + q_{\delta_i k}^{(1)} \right) + \alpha \left( s_{ik}^{(1)} + s_{\delta_i k}^{(1)} \right) \right] \log \pi_{ik}
  + \sum_{l=1}^{L} \left[ \frac{1}{4}\left( q_{ikl}^{(2)} + q_{\delta_i kl}^{(2)} \right) + \beta \left( s_{ikl}^{(2)} + s_{\delta_i kl}^{(2)} \right) \right] \log \psi_{ikl}.

Differentiating the above expression with respect to \Theta, we obtain the following updating functions for the means and covariances of the K × L Gaussian components:

\mu_{kl} = \frac{\sum_{i=1}^{N} \left( q_{ik}^{(1)} + q_{\delta_i k}^{(1)} \right)\left( q_{ikl}^{(2)} + q_{\delta_i kl}^{(2)} \right) x_i b_i}{\sum_{i=1}^{N} \left( q_{ik}^{(1)} + q_{\delta_i k}^{(1)} \right)\left( q_{ikl}^{(2)} + q_{\delta_i kl}^{(2)} \right) b_i^2},

\Sigma_{kl} = \frac{\sum_{i=1}^{N} \left( q_{ik}^{(1)} + q_{\delta_i k}^{(1)} \right)\left( q_{ikl}^{(2)} + q_{\delta_i kl}^{(2)} \right) (x_i - b_i \mu_{kl})(x_i - b_i \mu_{kl})^{T}}{\sum_{i=1}^{N} \left( q_{ik}^{(1)} + q_{\delta_i k}^{(1)} \right)\left( q_{ikl}^{(2)} + q_{\delta_i kl}^{(2)} \right)}.

Therefore, we obtain the updating functions for the parameters \Theta shown as Eqs. (11) and (12) in our paper.

To estimate and correct the bias field, the terms involving b_i in the energy function L(\Pi, \Psi, \Theta, B \mid X) are

\sum_{i=1}^{N} \log \left( \sum_{k=1}^{K} \pi_{ik} \sum_{l=1}^{L} \psi_{ikl} \frac{1}{(2\pi)^{D/2} |\Sigma_{kl}|^{1/2}} \exp\left( -\frac{1}{2} (x_i - b_i \mu_{kl})^{T} \Sigma_{kl}^{-1} (x_i - b_i \mu_{kl}) \right) \right).

By application of the complete-data condition in [20], maximizing the log-likelihood function in the above equation also leads to an increase in the value of the objective function

J = \sum_{i=1}^{N} \sum_{k=1}^{K} z_{ik} \left\{ \log \pi_{ik} + \sum_{l=1}^{L} y_{ikl} \left[ \log \psi_{ikl} - \frac{D}{2} \log(2\pi) - \frac{1}{2} \log |\Sigma_{kl}| - \frac{1}{2} (x_i - b_i \mu_{kl})^{T} \Sigma_{kl}^{-1} (x_i - b_i \mu_{kl}) \right] \right\}.




Then, we determine the combination coefficients w = {ω_1, ω_2, …, ω_M} by solving ∂J/∂w = 0, which yields updating functions for the bias field coefficients similar to those in [36,37]. Therefore, we obtain the updating functions for w shown as Eqs. (13) and (14) in our paper.

Appendix 2. Acronyms used in the paper

MR: magnetic resonance
GM: gray matter
WM: white matter
CSF: cerebrospinal fluid
GMM: Gaussian mixture model
EM: expectation maximization
MRF: Markov random field
HMRF: hidden Markov random field
AMM: asymmetric mixture model
KL: Kullback–Leibler
SCGM: spatially constrained generative model
FRSCGMM: fast and robust spatially constrained Gaussian mixture model
BAMM: Bayesian bounded asymmetric mixture model
BGGMM: bounded generalized Gaussian mixture model
AM-EM: automated model based on the expectation-maximization algorithm
CLIC: coherent local intensity clustering
AS-FLGMM: adaptive scale fuzzy local Gaussian mixture model
MICO: multiplicative intrinsic component optimization

References

[1] B. Gu, X. Sun, V.S. Sheng, Structural minimax probability machine, IEEE Trans. Neural Netw. Learn. Syst. (2016), doi:10.1109/TNNLS.2016.2544779.
[2] B. Gu, V.S. Sheng, A robust regularization path algorithm for v-support vector classification, IEEE Trans. Neural Netw. Learn. Syst. (2016), doi:10.1109/TNNLS.2016.2527796.
[3] B. Gu, V.S. Sheng, Z.J. Wang, D. Ho, S. Osman, S. Li, Incremental learning for ν-support vector regression, Neural Netw. 67 (2015) 140–150.
[4] B. Gu, V.S. Sheng, K.Y. Tay, W. Romano, S. Li, Incremental support vector learning for ordinal regression, IEEE Trans. Neural Netw. Learn. Syst. 26 (2015) 1403–1416.
[5] U. Vovk, F. Pernus, B. Likar, A review of methods for correction of intensity inhomogeneity in MRI, IEEE Trans. Med. Imag. 26 (2007) 405–421.
[6] M.N. Ahmed, S.M. Yamany, N. Mohamed, A.A. Farag, T. Moriarty, A modified fuzzy c-means algorithm for bias field estimation and segmentation of MRI data, IEEE Trans. Med. Imag. 21 (2002) 193–199.
[7] M.E. Algorri, F. Flores-Mangas, Classification of anatomical structures in MR brain images using fuzzy parameters, IEEE Trans. Biomed. Eng. 51 (2004) 1599–1608.
[8] Z. Wu, K.D. Paulsen, J.M. Sullivan, Adaptive model initialization and deformation for automatic segmentation of T1-weighted brain MRI data, IEEE Trans. Biomed. Eng. 52 (2005) 1128–1131.
[9] H.H. Chang, D.J. Valentino, G.R. Duckwiler, A.W. Toga, Segmentation of brain MR images using a charged fluid model, IEEE Trans. Biomed. Eng. 54 (2007) 1798–1813.
[10] S.K. Michopoulou, L. Costaridou, E. Panagiotopoulos, R. Speller, G. Panayiotakis, A. Todd-Pokropek, Atlas-based segmentation of degenerated lumbar intervertebral discs from MR images of the spine, IEEE Trans. Biomed. Eng. 56 (2009) 2225–2231.
[11] Z. Ji, J. Liu, G. Cao, Q. Sun, Q. Chen, Robust spatially constrained fuzzy c-means algorithm for brain MR image segmentation, Pattern Recognit. 47 (2014) 2454–2466.
[12] L. Gupta, T. Sortrakul, A Gaussian mixture based image segmentation algorithm, Pattern Recognit. 31 (1998) 315–325.
[13] W.M. Wells, W.E.L. Grimson, R. Kikinis, F.A. Jolesz, Adaptive segmentation of MRI data, IEEE Trans. Med. Imag. 15 (1996) 429–442.
[14] K. Van Leemput, F. Maes, D. Vandermeulen, P. Suetens, Automated model-based bias field correction of MR images of the brain, IEEE Trans. Med. Imag. 18 (1999) 885–896.


[15] H. Zhu, X. Pan, Robust fuzzy clustering using nonsymmetric Student's t finite mixture model for MR image segmentation, Neurocomputing 175 (2016) 500–514.
[16] A. Galimzianova, F. Pernuš, B. Likar, Ž. Špiclin, Stratified mixture modeling for segmentation of white-matter lesions in brain MR images, NeuroImage 124 (2016) 1031–1043.
[17] B. Ozenne, F. Subtil, L. Østergaard, D. Maucort-Boulch, Spatially regularized mixture model for lesion segmentation with application to stroke patients, Biostatistics 16 (2016) 580–595.
[18] B.H. Menze, K. Van Leemput, D. Lashkari, T. Riklin-Raviv, E. Geremia, E. Alberts, P. Gruber, S. Wegener, M.-A. Weber, G. Székely, N. Ayache, P. Golland, A generative probabilistic model and discriminative extensions for brain lesion segmentation, with application to tumor and stroke, IEEE Trans. Med. Imag. 35 (2016) 933–946.
[19] A. Banerjee, P. Maji, Rough sets and stomped normal distribution for simultaneous segmentation and bias field correction in brain MR images, IEEE Trans. Image Process. 24 (2015) 5764–5776.
[20] G. McLachlan, D. Peel, Finite Mixture Models, Wiley, New York, 2000.
[21] C.M. Bishop, Pattern Recognition and Machine Learning, Springer, New York, 2006.
[22] T.M. Nguyen, Q.M.J. Wu, A nonsymmetric mixture model for unsupervised image segmentation, IEEE Trans. Cybernet. 43 (2013) 751–765.
[23] G.J. McLachlan, T. Krishnan, The EM Algorithm and Extensions, Wiley Series in Probability and Statistics, Wiley, New York, 1997.
[24] S. Sanjay-Gopal, T.J. Hebert, Bayesian pixel classification using spatially variant finite mixtures and the generalized EM algorithm, IEEE Trans. Image Process. 7 (1998) 1014–1028.
[25] K. Blekas, A. Likas, N.P. Galatsanos, I.E. Lagaris, A spatially constrained mixture model for image segmentation, IEEE Trans. Neural Netw. 16 (2005) 494–498.
[26] H. Greenspan, A. Ruf, J. Goldberger, Constrained Gaussian mixture model framework for automatic segmentation of MR brain images, IEEE Trans. Med. Imag. 25 (2006) 1233–1245.
[27] A. Diplaros, N. Vlassis, T. Gevers, A spatially constrained generative model and an EM algorithm for image segmentation, IEEE Trans. Neural Netw. 18 (2007) 798–808.
[28] C. Nikou, N.P. Galatsanos, A.C. Likas, A class-adaptive spatially variant mixture model for image segmentation, IEEE Trans. Image Process. 16 (2007) 1121–1130.
[29] T.M. Nguyen, Q.M.J. Wu, Fast and robust spatially constrained Gaussian mixture model for image segmentation, IEEE Trans. Circuits Syst. Video Technol. 23 (2013) 621–635.
[30] H. Zhang, Q.M.J. Wu, T.M. Nguyen, Image segmentation by a robust modified Gaussian mixture model, in: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, pp. 1478–1482.
[31] R.P. Browne, P.D. McNicholas, M.D. Sparling, Model-based learning using a mixture of mixtures of Gaussian and uniform distributions, IEEE Trans. Pattern Anal. Mach. Intell. 34 (2012) 814–817.
[32] T.M. Nguyen, Q.M.J. Wu, D. Mukherjee, H. Zhang, A Bayesian bounded asymmetric mixture model with segmentation application, IEEE J. Biomed. Health Inform. 18 (2014) 109–119.
[33] T.M. Nguyen, Q.M.J. Wu, Bounded asymmetrical Student's-t mixture model, IEEE Trans. Cybernet. 44 (2014) 857–869.
[34] S. Kullback, R.A. Leibler, On information and sufficiency, Ann. Math. Statist. 22 (1951) 79–86.
[35] Z. Ji, Y. Huang, Q. Sun, G. Cao, A spatially constrained generative asymmetric Gaussian mixture model for image segmentation, J. Vis. Commun. Image Represent. 40 (2016) 611–626.
[36] C. Li, C. Gatenby, L. Wang, J.C. Gore, A robust parametric method for bias field estimation and segmentation of MR images, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, 2009, pp. 218–223.
[37] C. Li, J.C. Gore, C. Davatzikos, Multiplicative intrinsic component optimization (MICO) for MRI bias field estimation and tissue segmentation, Magn. Reson. Imag. 32 (2014) 913–923.
[38] G. Celeux, F. Forbes, N. Peyrard, EM procedures using mean field-like approximations for Markov model-based image segmentation, Pattern Recognit. 36 (2003) 131–144.
[39] S.T. Roweis, L.K. Saul, G.E. Hinton, Global coordination of local linear models, in: Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, 2002, pp. 889–896.
[40] J.J. Verbeek, N. Vlassis, B. Kröse, Self-organizing mixture models, Neurocomputing 63 (2005) 99–123.
[41] H. Zhang, Q.M.J. Wu, T.M. Nguyen, A robust fuzzy algorithm based on Student's t-distribution and mean template for image segmentation application, IEEE Sig. Process. Lett. 20 (2013) 117–120.
[42] M. Gong, Y. Liang, J. Shi, W. Ma, J. Ma, Fuzzy C-means clustering with local information and kernel metric for image segmentation, IEEE Trans. Image Process. 22 (2013) 573–584.
[43] Z. Ji, G. Cao, Q. Sun, A fuzzy clustering algorithm with robust spatially constraint for brain MR image segmentation, in: IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Beijing, 2014, pp. 202–209.
[44] R.K.S. Kwan, A.C. Evans, G.B. Pike, MRI simulation-based evaluation of image-processing and classification methods, IEEE Trans. Med. Imag. 18 (1999) 1085–1097.
[45] T. Rohlfing, Image similarity and tissue overlaps as surrogates for image registration accuracy: widely used but unreliable, IEEE Trans. Med. Imag. 31 (2012) 153–163.



[46] T.M. Nguyen, Q.M.J. Wu, H. Zhang, Bounded generalized Gaussian mixture model, Pattern Recognit. 47 (2014) 3132–3142.
[47] Z. Ji, Y. Xia, Q. Sun, Q. Chen, D. Feng, Adaptive scale fuzzy local Gaussian mixture model for brain MR image segmentation, Neurocomputing 134 (2014) 60–69.
[48] R.A. Redner, H.F. Walker, Mixture densities, maximum likelihood and the EM algorithm, SIAM Rev. 26 (1984) 195–239.
[49] S.K. Ng, G.J. McLachlan, Speeding up the EM algorithm for mixture model-based segmentation of magnetic resonance images, Pattern Recognit. 37 (2004) 1573–1589.
[50] A. van den Oord, B. Schrauwen, Factoring variations in natural images with deep Gaussian mixture models, Adv. Neural Inf. Process. Syst. (2014) 3518–3526.
[51] A. van den Oord, J. Dambre, Locally-connected transformations for deep GMMs, in: International Conference on Machine Learning (ICML): Deep Learning Workshop, 2015, pp. 1–8.