Automatic segmentation of white matter lesions on magnetic resonance images of the brain by using an outlier detection strategy

Automatic segmentation of white matter lesions on magnetic resonance images of the brain by using an outlier detection strategy

Magnetic Resonance Imaging 32 (2014) 1321–1329 Contents lists available at ScienceDirect Magnetic Resonance Imaging journal homepage: www.mrijournal...

946KB Sizes 0 Downloads 66 Views

Magnetic Resonance Imaging 32 (2014) 1321–1329

Contents lists available at ScienceDirect

Magnetic Resonance Imaging journal homepage: www.mrijournal.com

Automatic segmentation of white matter lesions on magnetic resonance images of the brain by using an outlier detection strategy Rui Wang a, Chao Li a, Jie Wang a, Xiaoer Wei b, Yuehua Li b, Chun Hui a, Yuemin Zhu c, Su Zhang a,⁎ a b c

School of Biomedical Engineering and Med-X Research Institute, Shanghai Jiao Tong University, Shanghai, China Institute of Diagnostic and Interventional Radiology, Sixth Affiliated People's Hospital, Shanghai Jiao Tong University, Shanghai, China CREATICS; CNRS UMR 5220; Inserm 1044; INSA Lyon; Villeurbanne, France

a r t i c l e

i n f o

Article history: Received 8 April 2014 Revised 23 June 2014 Accepted 8 August 2014 Keywords: White matter lesions Gaussian mixture model Extreme value theory Outlier detection

a b s t r a c t White matter lesions (WMLs) are commonly observed on the magnetic resonance (MR) images of normal elderly in association with vascular risk factors, such as hypertension or stroke. An accurate WML detection provides significant information for disease tracking, therapy evaluation, and normal aging research. In this article, we present an unsupervised WML segmentation method that uses Gaussian mixture model to describe the intensity distribution of the normal brain tissues and detects the WMLs as outliers to the normal brain tissue model based on extreme value theory. The detection of WMLs is performed by comparing the probability distribution function of a one-sided normal distribution and a Gumbel distribution, which is a specific extreme value distribution. The performance of the automatic segmentation is validated on synthetic and clinical MR images with regard to different imaging sequences and lesion loads. Results indicate that the segmentation method has a favorable accuracy competitive with other state-of-the-art WML segmentation methods. © 2014 Elsevier Inc. All rights reserved.

1. Introduction White matter lesions (WMLs) are presented as focal or diffused high signals on T2 weighted (T2-w) and fluid attenuated inversion recovery (FLAIR) images within the periventricular white matter (WM) and subcortical regions of the brain [1]. The pathogenesis of WMLs remains unknown but is suggested to be associated with age, ischemia, blood–brain barrier abnormalities, disturbances in cerebrospinal fluid (CSF) circulation, and cerebral venous circulation impairment [2]. The occurrence of WMLs is often accompanied with cognition impairment [3], dementia [4], gait disorders [5], and depression [6]. Analysis of WMLs on magnetic resonance (MR) images has gained increasing attention from researchers worldwide. The automatic segmentation and volumetric quantification of WMLs provide significant information on the number, size, and location of WMLs on MR images. These data are important for evaluating the WML burden, tracking the progression of the disease, and monitoring the efficacy of new drug therapy trials [7,8]. For example, stroke patients with a high WML load have increased risk of death or dependency, recurrent stroke, intra-cerebral hemorrhage during anticoagulation therapy, and post-stroke dementia [9,10]. ⁎ Corresponding author at: Room 123, Med-X Institute, 3 Teaching Building, Shanghai Jiao Tong University, No. 1954, Huashan Road, Shanghai, China, 200030. Tel.: +86 21 62933209; fax: +86 21 62932156. E-mail address: [email protected] (S. Zhang). http://dx.doi.org/10.1016/j.mri.2014.08.010 0730-725X/© 2014 Elsevier Inc. All rights reserved.

Thus, the accurate segmentation and quantitative analysis of focal lesions are required, and the methods may vary from manual to automatic segmentation. However, the manual delineation of WMLs, namely tracing the lesions in each slice by experienced radiologists, is time-consuming and subject to intra- and inter-rater variability [7]. By comparison, the automatic segmentation reduces the variability inherent in the manual segmentation and improves the efficiency and reproducibility of WML segmentation. Supervised and unsupervised learning methods are utilized in the automatic segmentation of WMLs. Supervised segmentation requires a training database of previously segmented images and uses different supervised learning methods (e.g., k-nearest neighbors (k-NN), artificial neural networks (ANNs), or support vector machine (SVM)) [11–14] to segment WMLs. The advantage of supervised methods is their efficiency in many segmentation tasks given that the classifiers are available. However, an accurate classifier relies on a good training database, which should cover all possible cases and may be difficult to obtain. Unsupervised methods [15–19] do not require a training database, and segmentation is performed by using clustering methods in combination with anatomical knowledge (e.g., probability of WMLs and anatomical localization). A common strategy to perform WML segmentation is the detection of WMLs as outliers to the normalappearing brain tissue model. The Gaussian mixture model (GMM) is often applied to demonstrate the distribution of image intensities in healthy subjects and various outlier detection methods are employed to

1322

R. Wang et al. / Magnetic Resonance Imaging 32 (2014) 1321–1329

segment WMLs. Leeput et al. [18] proposed a weighted expectationmaximization (EM) algorithm, in which the voxels not well explained by GMM are assigned with reduced weights and detected as possible lesions. Recently, a trimmed-likelihood estimator [20] has been used in some methods [19,21] to model the lesions as outliers. The candidate lesions have been detected using a Mahalanobis distance with some heuristic rules. Moreover, some authors modeled the lesions as a separated class instead of outliers. Freifield [22] proposed a constrained GMM to segment the brain into four classes (i.e., WM, gray matter (GM), CSF, and lesions), and the lesion boundaries are delineated using a curve evolution technique. In this paper, we present a fully automatic segmentation framework for detecting and quantifying WMLs on MR images from different sources. In the proposed method, the normal tissue classes in the brain are demonstrated by GMM, and WMLs are detected using an outlier detection technique, i.e., extreme value theory (EVT) [23–25]. The performance of the automatic segmentation method, hereon termed as GMM–EVT, is validated further on synthetic and clinical MR images with regard to different imaging sequences and lesion loads. 2. Materials and methods 2.1. Data In this research, synthetic and clinical MR images were used to evaluate the performance of GMM–EVT segmentation. 2.1.1. Synthetic images Synthetic images are created by computers and are advantageous for researchers to validate their segmentation algorithms. The parameters in synthetic images, such as MR protocols, slice thicknesses, and number of lesions, can be controlled. Moreover, the ground truth (GT) of lesions on synthetic images is available, which can simplify the process of validation. The synthetic data set was used in this research to evaluate the performance of GMM–EVT segmentation and consisted of the T1 weighted (T1-w), T2-w, and proton density weighted (PD-w) images of simulated multiple sclerosis (MS) brain from BrainWeb (http://brainweb.bic.mni.mcgill. ca/brainweb/) [26,27], with 1 mm 3 spatial resolution, different levels of noise (1%, 3%, 5%, 7%, and 9%), 20% inhomogeneity, and moderate lesion load. These simulated images were produced based on an MR imaging (MRI) simulator and an anatomical model of a brain with MS lesions. The GT of the MS phantoms was downloaded from BrainWeb for the validation of GMM–EVT segmentation. 2.1.2. Clinical images The MR images of 86 patients diagnosed with WM abnormalities were acquired using a Philips 3.0 T MR scanner. Patients were between the age of 58 and 86 years (mean = 70.1, SD = 7.7). All patients were scanned using a FLAIR sequence (slice thickness = 5 mm; repetition time (TR)/echo time (TE)/inversion time (TI) =11,000/120/2800 ms; flip angel = 90°; field of view (FOV) = 280 mm; matrix = 640 × 640; voxel size = 5 × 0.4375 × 0.4375 mm3), a T1-w fast-field echo sequence (slice thickness = 5 mm; TR/TE =250/2.3 ms; flip angel = 70°; FOV = 280 mm; matrix = 640 × 640; voxel size = 5 × 0.4375 × 0.4375mm3), and a T2-w turbo-spin echo sequence (slice thickness = 5 mm; TR/TE =3508.4/80 ms; flip angel = 90°; FOV = 280 mm; matrix =640 × 640; voxel size = 5 × 0.4375 × 0.4375 mm 3). The imaging protocols were approved by the institutional review board. Written informed consent was obtained. Manual segmentation was performed on the FLAIR images by a neurologist and a radiologist referring to the corresponding T1-w and T2-w images. The images of binary lesion maps were generated and considered as the GT, which would be used for the validation of

GMM–EVT segmentation. According to the manually segmented WML volumes, the patients were classified into three groups with different lesion loads [28]: mild (b 10 cc; 45 patients), moderate (10–30 cc; 25 patients), and severe (N30 cc; 16 patients). Thus, the different performances of automatic segmentation can be assessed with regard to different lesion loads. 2.2. Framework of GMM-EVT segmentation 2.2.1. Image preprocessing Image preprocessing steps, including intensity inhomogeneity (IIH) correction and brain extraction, were performed on the MR images before automatic segmentation. IIH correction aims to correct inaccurate patient positioning and to eliminate radio frequency inhomogeneity. The N3 inhomogeneity correction [29] module in MIPAV software (http://mipav.cit.nih.gov/) was used to implement IIH correction. Brain extraction was performed using a brain extraction tool (BET) [30] in MRIcro software because nonbrain tissues (e.g., skull and scalp) may cause misclassification of some brain structures with similar intensities as the lesions. For WML segmentation using multimodal MR images, the demons registration algorithm [31] was used prior to brain extraction to register the T1-w, T2-w, and FLAIR (or PD-w) images of the same patient into the same space. 2.2.2. Gaussian mixture model In medical image segmentation, single Gaussian model (SGM) [32] is commonly used to approximate the distribution of a tissue class in the brain. The probability density function (PDF) of SGM is defined by Nðxi ; μ; ΣÞ ¼

  1 T −1 x exp − ð −μ Þ Σ ð x −μ Þ i i d 2 ð2πÞ2 jΣj 1

1 2

ð1Þ

where X = {x1, x2, …, xn} denotes a set of n voxels. The intensity of voxel i is denoted by a column vector xi = {xi1, …, xid}, which consists of d intensity values corresponding to d different MR sequences. The mean vector and covariance matrix within the class are described using μ and Σ. The use of SGM to demonstrate the intensity of a tissue often oversimplifies the problem. Therefore, GMM [32] is proposed to solve the limitation. The GMM distribution is expressed as a linear superposition of Gaussians in the form of pðxÞ ¼

K X

π k N ðxjμ k ; Σk Þ

ð2Þ

k¼1

where K is the number of mixture components, and N(x|μk, Σk) denotes the PDF of the k th Gaussian component with unknown parameters, including mean vector μk, covariance matrix Σk, and mixture proportion πk. For convenience, we use θ = {μk, Σk, πk} to represent the unknown parameters. The brain image segmentation was obtained by estimating the unknown parameter set θ using the maximum likelihood estimators via EM algorithm: 

θ ¼ arg max logLðθÞ

ð3Þ

θ

The maximum likelihood function is given by LðθÞ ¼

N X i¼1

log

( K X

) πk Nðxi ; μ k ; Σk Þ

ð4Þ

k¼1

We denote a set of latent variables by Z = {z1, z2, …, zN}, in which each discrete label zi ∈ {WM, GM, CSF} is assigned to the corresponding voxel xi in X. The EM algorithm estimates the optimal

R. Wang et al. / Magnetic Resonance Imaging 32 (2014) 1321–1329

parameter set θ⁎ by iteratively estimating the latent variable zi using an expectation (E) and maximization (M) step. In the E step, the poster probability of the voxel xi drawn from each Gaussian is evaluated using the current unknown parameter set θ. In the M step, the unknown parameters, including mean vectors, covariance matrixes, and mixture proportions, are re-estimated. The required steps for EM are illustrated as follows: 1. Initialize the unknown parameter set θ (0) by using k-means clustering method. The voxels in the image are divided into K classes. Moreover, μk(0) and Σk(0) are obtained by computing the mean vector and covariance matrix of the k th class, respectively. Similarly, the mixture proportion πk(0) is computed by dividing the number of voxels in the k th class to that in the whole image. 2. E step. Evaluate the poster probability of each voxel xi by using π Nðxn jμ k ; Σk Þ pik ¼ K k   X π j N xn jμ j ; Σ j

ð5Þ

j¼1

3. M step. Update the unknown parameters in GMM by using N X

μ

ðt Þ

¼

pik xn

n¼1 N X

ð6Þ pik

n¼1

ðt Þ

Σk ¼

N    1 X ðt Þ ðt Þ T xn −μ k pik xn −μ k N X n¼1 pik

ð7Þ

n¼1 N X ðt Þ

πk ¼

n¼1

N

pik ð8Þ

where t denotes the iteration number. 4. Return to step 2 until the convergence criterion of log L(θ) is satisfied. 2.2.3. EVT The distribution of the image intensities in the WML subjects cannot be well modeled by GMM if adjustment is not made. Thus, the normal brain tissues (e.g., WM, GM, and CSF) and WMLs are demonstrated using two different statistical models (GMM and EVT, respectively). EVT [33,34] is used to describe the behavior of maxima

1323

or minima for a sequence of independent and identically distributed (i.i.d.) observations {x1, x2, …, xm}. EVT is considered an efficient method to detect outliers occurring at the tails of the data distribution. Fig. 1 shows that WMLs correspond to the right-hand tail of the FLAIR histogram. Thus, EVT can be used to model the probability distribution of WMLs and to detect voxels in WMLs as outliers in the normal tissue regions. For convenience, we discuss the property of the right-hand tail of the Gaussian distribution in accordance with the spatial layout of WMLs in the gray histogram of the MR images. According to Fisher–Tippet theorem [33], for an i.i.d. data set m of observations X = {x1, x2, …, xm} with the maximum xT = max{X}, the observation of xi over a given threshold xT can be well approximated by a generalized extreme value (GEV) distribution, given by (  )   xi −μ m −1=ξ P ðxi ; μ m ; σ m ; ξÞ ¼ exp − 1 þ ξ σm

ð9Þ

where ξ is a shape parameter; and μm and σm are norming parameters. The cases of ξ → 0, ξ N 0, and ξ b 0 provide the Gumbel (Type I), Frechet (Type II), and Weibull (Typer III) distribution, respectively. Considering that WMLs are located at the tail of a particular side on the histogram (Fig. 1), the GEV distribution used in the WML segmentation should be concerned with the tail of the onesided normal distribution D = |N(0, 1)|. Therefore, the Gumbel distribution is used to model the behavior of WMLs, and the other two GEV distributions (Frechet and Weibull) will not be considered further. The Gumbel distribution is defined as    1 P EVT ðxi jμ m ; σ m Þ ¼ exp − exp − ðxi −μ m Þ σm

ð10Þ

where xi ≥ xT and PEVT(xi|μm, σm) represent the cumulative distribution function. The PDF of the Gumbel distribution can be written in the following form: pEVT ðxi jμ m ; σm Þ     1 1 1 exp − ðxi −μ m Þ exp − ðxi −μ m Þ ¼ σm σm σm

ð11Þ

where xi ≥ xT. 2.2.4. Outlier detection for SGM A heuristic threshold, which indicates the beginning of the tail, should be defined to classify the abnormal data when EVT is used to detect outlier following normal distribution. Thus, an efficient

Fig. 1. (A) An axial slice of a FLAIR image. (B) Pseudo-color image of the same slice. (C) Color histogram of the pseudo-color image for intuitive understanding of the corresponding location for a specific tissue on the pseudo-color image and the color histogram.

1324

R. Wang et al. / Magnetic Resonance Imaging 32 (2014) 1321–1329

method was designed to fix the threshold xT on the basis that EVT deals with the extreme deviations from the median of probability distribution. The value of X = {x1, x2, …, xn} is a set of (possibly multivariate) i.i.d. data drawn from SGM. The norming parameters μm and σm are computed by finding the mean vector μSGM and covariance matrix ΣSGM of X. WML segmentation focuses on the right-hand side of the normal distribution D = |N(0, 1)|. Thus, we mainly study the probability density of data satisfying xj ≥ μSGM (j = 1, 2, …, l; l b n). To find the heuristic threshold xT that differentiates the normal observations and the outliers, we initially computed the Mahalanobis distance between xj and μSGM by ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi   r T   x j −μ SGM Σ−1 M xj ¼ SGM x j −μ SGM

ð12Þ

The probability density of xj with regard to the SGM (equivalent to the one-sided normal distribution D = |N(0, 1)|) can be estimated by   pSGM x j ¼

  1  2 M x exp − j d 2 ð2πÞ2 jΣSGM j 2

1 2

ð13Þ

The probability density of xj under the Gumbel distribution is provided by   pEVT x j ¼

1 jΣSGM j

1 2

   2  2  exp −M x j − exp −M x j

ð14Þ

Finally, all observations xj satisfying pEVT(xj) ≥ pSGM(xj) are detected and constitute the data set of outliers, namely, X EVT ¼ n o xE j ; j ¼ 1; 2; …; k , where k represents the total number of outliers. The heuristic threshold is given by xT = min{XEVT}. All data that exceed the heuristic threshold xT are considered the possible outliers of interest. 2.2.5. WML segmentation Single modal and multimodal MR images are used to perform WML segmentation. Segmentation using single modal MR images (e.g., FLAIR) does not require registration; thus, the process is fast and efficient. However, segmentation using multimodal MR images (e.g., T1-w, T2-w, and FLAIR) combines information from different MRI sequences, thereby reducing the uncertainty of the lesion borders. WML segmentation using different data sources is separately discussed as follows: I. Segmentation using FLAIR images FLAIR images are highly sensitive in detecting WMLs. The WMLs appear hyperintense against the surrounding normal tissues, such as WM, GM, and CSF. The gray–white matter contrast is reduced on FLAIR, thereby producing a homogeneous low background signal and causing WMLs to stand out clearly. On the basis of the imaging findings described in the previous section, we use a two-component (K = 2) GMM to demonstrate the intensity distribution of the CSF region and the low background signal region (WM + GM). EVT is then used to detect the bright WMLs on the FLAIR images. Considering that WML segmentation using FLAIR images involves the information of only one sequence, the column vector xi is therefore 1D, i.e., d = 1. II. Segmentation using multimodal MR images WML segmentation using multimodal MR images often classifies each voxel of the brain into four classes: WM, GM, CSF, and WMLs. Thus, a three-component (K = 3) GMM is used to demonstrate the intensity distribution of WM, GM, and CSF. WMLs are detected by EVT. Considering that WML segmentation using multimodal MR images involves the information from three different MR sequences, the column vector xi is therefore 3D (d = 3).

In general, the WML segmentation framework of our GMM–EVT algorithm consists of four main steps: Step 1 Initialize the parameters K and d according to the data sources for performing WML segmentation. Step 2 Use EM algorithm to classify each voxel of the brain into K different classes. The corresponding mean vectors μk, covariance matrix Σk (k = 1, …, K), and log-likelihood function log L0(θ) are computed. Step 3 Compute the Mahalanobis distances of the voxel i from the centers of K different Gaussian components using Eq. (12) and they were given by M1 (xi),…, MK (xi). Locate the component closest to voxel i, and use k⁎ to denote the corresponding class label. We can substitute μ k and Σk into Eqs. (13) and (14) to compute the probability densities of pSGM and pEVT, respectively. If pEVT ≥ pSGM, the voxel i is considered a candidate WML. The voxel i will be stored in the data set XEVT and then removed from the original data set X. We denote the updated data set X as XNormal and use XNormal to compute the new log-likelihood function log L1(θ). Step 4 Use the log-likelihood function obtained in the current iteration and last iteration to compute the absolute difference ε = | log L1 (θ) − log L0(θ)|. The iteration process of detecting candidate WML using XNormal is repeated until a suitable criterion is satisfied, for instance, the absolute difference ε is lower than some threshold (ε b 0.01). Thus, the voxels in the data set XEVT comprise the final WMLs, and those in the data set XNormal constitute the normal brain tissues. 2.2.6. False-positive minimization The WMLs detected by GMM–EVT segmentation is mixed with some false-positive (FP) signals, including bony artifacts and flow artifacts. The flow artifacts are generated by pulsatile CSF flow, which are located around the interface of CSF and cortical GM. Most of the FP signals caused by these artifacts are as low as one or two voxels [35]. Thus, the morphological operations, such as dilation and erosion, are performed on the binary lesion map to eliminate the FP signals. 2.3. Evaluation To evaluate the accuracy and consistency of our algorithm, we compared the results of the automatic segmentation with GT by using three different overlap metrics: similarity index (SI) [36], overlap fraction (OF), and extra fraction (EF) [12,37]. These evaluation metrics are defined by SI ¼

2  ðGT∩ASÞ GT þ AS

ð15Þ

OF ¼

GT∩AS GT

ð16Þ

EF ¼

!GT∩AS GT

ð17Þ

where GT and AS represent the number of voxels classified as WMLs by manual and automatic segmentation, respectively. GT ∩ AS indicates the number of voxels correctly classified as WMLs, i.e., the true positive (TP). !GT ∩ AS denotes the number of voxels falsely classified as WMLs, i.e., the FP. Thus, the OF and EF measure the percentages of WML areas that have been correctly and falsely classified by automatic segmentation relative to the GT obtained by manual segmentation, respectively. SI is the most extensively used overlap metric for measuring the agreement between automatic and manual segmentation. SI and OF mainly affect the TP in the result, whereas EF focuses on the percentage of FP in the

R. Wang et al. / Magnetic Resonance Imaging 32 (2014) 1321–1329

WMLs. All values of these three overlap metrics vary between 0 and 1. An SI value of 0.7 or higher indicates good segmentation [7]. Under this condition (SI ≥ 0.7), OF should be close to 1, and EF should be close to 0. Linear regression analysis and Bland–Altman analysis are also performed to measure the volumetric agreement between automatic and manual segmentation. 3. Results 3.1. Synthetic images GMM–EVT segmentation was performed on different MS phantoms from BrainWeb with variable levels of noise and inhomogeneity. Fig. 2 shows the segmentation result on the 103rd slice (3% noise, 20% inhomogeneity) by using the corresponding T1-w, T2-W, and PD-w images of the slice. The MS lesions were identified, as well as the three different brain tissues (WM, GM, and CSF). The SI values for MS lesion segmentation on the BrainWeb data were calculated based on the results of GMM–EVT segmentation and GT, as shown in Table 1. The results of GMM–EVT segmentation on the BrainWeb data were compared with the other MS lesion segmentation methods (Table 1), such as lesion topology-preserving anatomical segmentation (Lesion-TOADs) [15], whole brain trimmed-likelihood estimation (wTLE) [19], localized trimmed-likelihood estimation (lTLE) [21], constrained GMM and curve evolution (CGMM-CE) [22], and K. Van Leeput's algorithm (KVL) [18]. KVL, wTLE, and lTLE are outlier-based methods that demonstrate the lesions as outliers of a distribution. However, Lesion-TOADs and CGMM-CE model the lesions as a separate class and use different rulebased methods to identify the lesions. The image data used by wTLE, lTLE, and Lesion-TOADs are completely similar to the data used by our GMM–EVT segmentation. The SI values of the CGME-CE and KVL are computed on the slices from 60 to 120, and the image data are free of inhomogeneity. The results in Table 1 indicate good performance (SI N 0.7) of GMM–EVT segmentation on MS lesion detection by using the image data from BrainWeb. The accuracy (SI) of the segmentation

1325

Table 1 Comparison of the SI values for MS lesion detection between the GMM–EVT segmentation and other segmentation methods on the BrainWeb data with different levels of noise. Algorithm

GMM-EVT Lesion-TOADS [15] wTLE [19] lTLE [21] CGMM-CE [22] KVL [18]

Noise 1%

3%

5%

7%

9%

0.834 0.823 0.63 0.72 NAN NAN

0.821 0.812 0.79 0.77 0.79 0.80

0.802 0.792 0.65 0.73 0.79 0.63

0.775 0.729 0.23 0.49 0.78 0.61

0.729 NAN 0.04 0.20 0.76 0.47

SI values for the last five methods are reported by the Lesion-TOADS [15], wTLE [19], lTLE [21], CGMM-CE [22], and KVL [18], respectively. NAN indicates “not a number”, namely, the specific segmentation was not performed on the dataset with corresponding noise level.

decreases with the increasing level of noise. In general, the SI values corresponding to GMM–EVT segmentation are slightly higher than those of the other segmentation methods. 3.2. Clinical images 3.2.1. Segmentation results The synthetic data sets are less convincing as the real data sets are. Thus, GMM–EVT segmentation was also validated on the clinical images, including single modal (FLAIR) and multimodal (T1-w, T2-w, and FLAIR) MR images. Fig. 3 shows the results of WML segmentation using the FLAIR images and multimodal MR images. The manual segmentation of WMLs considered the GT and provided the comparison with GMM–EVT segmentation. Automatic segmentation using FLAIR images focuses only on detecting the lesions. By contrast, automatic segmentation using multimodal MR images detects WMLs along with other brain tissues (e.g., WM, GM, and CSF). On the basis of Fig. 3, the WMLs were overestimated when the multimodal MR images were used for the segmentation. Intriguingly, the WML

Fig. 2. An example of MS lesion segmentation (BrainWeb data, 3% noise, 20% inhomogeneity, 103rd slice). (A) T1-w image. (B) T2-w image. (C) PD-w image. (D) GT of MS lesions. (E) WM obtained by GMM–EVT segmentation. (F) Segmented GM. (G) Segmented CSF. (H) MS lesion detected by GMM–EVT segmentation.

1326

R. Wang et al. / Magnetic Resonance Imaging 32 (2014) 1321–1329

Fig. 3. Comparison of the GMM–EVT segmentation results on the FLAIR image and multimodal MR images for the same slice. (A) T1-w image. (B) T2-w image. (C) FLAIR image. (D) WMLs (yellow) detected by GMM–EVT segmentation on the multimodal MR images. (E) WMLs (green) detected by GMM–EVT segmentation on the FLAIR image. (F) Manual result (red) of WMLs. (G) Overlap map generated by combining the results of manual and GMM–EVT segmentation on the multimodal MR images. The TP (blue), true negative (TN, red), and FP (yellow) signals are illustrated on the multimodal MR images. (H) Overlap map generated by combining the results of manual and GMM–EVT segmentation on the FLAIR image. The TP (blue), TN (red), and FP (green) signals are illustrated on the FLAIR image. (I) The result of GMM–EVT segmentation on the multimodal MR images, including the WM (dodger blue), GM (yellow), CSF (light coral), and WMLs (indigo).

been demonstrated on clinical images from 86 patients with variable lesion loads. The comparison of our GMM–EVT segmentation algorithm with the other reported methods is shown in Table 3. Admiraal-Behloul [17] proposed an fuzzy c-means (FCM) segmentation method to detect WM hyperintensities (WMHs) using three different MR images: PD, T2-w, and FLAIR. Khayati [35] developed an automatic pipeline utilizing adaptive mixture method (AMM) and Markov random field (MRF) model to segment MS lesions on FLAIR images. GMM–TLE segmentation is implemented ourselves by improving existing work [19–21] for the WMH detection on FLAIR images. Table 3 shows that the performance of the GMM–EVT segmentation on real data is comparable to those of the other proposed methods.

volume was underestimated when the automatic segmentation was performed on the FLAIR images. 3.2.2. Overlap metrics comparison The overlap between the automatic and manual segmentation for the WMLs was measured by different overlap metrics. The mean values of SI, OF, and EF were calculated for the three subgroups classified based on the lesion load, as well as for the entire data set (Table 2). The results demonstrate higher accuracy (SI) and lower FP rate for the automatic segmentation performed on the FLAIR images than those of the segmentation using multimodal MR images. All SI values with regard to different lesion loads exceed the minimal accuracy requirement (SI N 0.7). Hence, excellent performance has

Table 2 Overlap statistics between GMM–EVT segmentation and manual segmentation on FLAIR and multimodal MR images. Lesion load

SI

OF

FLAIR Mild Moderate Severe All patients

0.78 0.80 0.85 0.80

± ± ± ±

0.09 0.07 0.05 0.08

Data are means ± standard deviations.

EF

Multimodal

FLAIR

0.76 0.78 0.86 0.78

0.76 0.77 0.82 0.77

± ± ± ±

0.08 0.07 0.04 0.08

± ± ± ±

0.09 0.08 0.06 0.09

Multimodal

FLAIR

0.78 0.80 0.86 0.80

0.18 0.14 0.11 0.15

± ± ± ±

0.10 0.09 0.05 0.09

± ± ± ±

Multimodal 0.14 0.07 0.06 0.12

0.29 0.25 0.15 0.25

± ± ± ±

0.15 0.10 0.06 0.13

R. Wang et al. / Magnetic Resonance Imaging 32 (2014) 1321–1329 Table 3 Comparison of the SI values for WML detection between GMM–EVT segmentation and other segmentation methods on clinical MR images. Methods

Mild

Moderate

Severe

All patients

GMM-EVT (FLAIR) GMM-EVT (Multimodal) GMM-TLE [19,21] Khayati et al. [35] Admiraal-Behloul et al. [17]

0.78 0.76 0.75 0.7253 0.70

0.80 0.78 0.83 0.7520 0.75

0.85 0.86 0.85 0.8096 0.82

0.80 0.78 0.79 0.7504 0.75

The first three methods, namely the GMM-EVT (FLAIR), GMM-EVT (Multimodal), and GMM-TLE were implemented ourselves. SI values for the last two methods are cited from Khayati et al. [35] and AdmiraalBehloul et al. [17], respectively.

3.2.3. Volumetric analysis The volumetric agreement between the automatic and manual segmentation was assessed by linear regression and Bland–Altman analysis [38,39], as shown in Fig. 4. For the segmentation using FLAIR images, the obtained correlation coefficient (R 2 = 0.9886, p b 0.01) indicates a significant correlation between the automatic and manual segmentation (Fig. 4A). Bland–Altman analysis (Fig. 4B) shows a systematic bias of 1.52 cc and an SD of 2.13 cc within the range of 1.86–57.14 cc. For the segmentation using multimodal MR images, the obtained correlation coefficient (R 2 = 0.9844, p b 0.05) demonstrates a strong correlation between the automatic and manual segmentation (Fig. 4C). Bland–Altman analysis (Fig. 4D) shows a systematic bias of − 0.47 cc and an SD of 2.04 cc within the range of 1.86–66.58 cc. Furthermore, the results of volumetric comparison between the automatic and manual segmentation are presented in Table 4. The automatic segmentation using FLAIR images obtained a lower volume of WML than that of the manual

1327

segmentation. By contrast, the automatic segmentation using multimodal MR images measured a higher WML volume than that of the manual segmentation. 4. Discussion In this paper, a fully automatic segmentation method has been developed and evaluated for detecting WMLs on MR images. The normal tissues in the brain are modeled by GMM; WMLs are considered outliers to the normal-appearing brain tissue model and are identified using EVT. The synthetic and clinical images were used to evaluate the performance of GMM–EVT segmentation. The SI values were calculated based on the results of the automatic and manual segmentation, demonstrating the good performance (SI N 0.7) of GMM–EVT segmentation when detecting WMLs on different types of MR images. The use of synthetic images benefits the evaluation of WML segmentation in two aspects. First, the availability of GT simplifies the process of validation. Second, GT provides a common framework for comparing GMM–EVT segmentation with other segmentation methods. The simulated experiments performed on the different MS phantoms (T1-w, T2-w, and PD-w) from BrainWeb demonstrate that the designed GMM–EVT segmentation for WML detection can be well employed to identify MS lesion. Moreover, GMM-EVT segmentation tends to be more robust to different levels of noises than the other segmentation methods. GMM-EVT segmentation was also validated on clinical MR images considering the effects of the imaging sequences and lesion loads. Regarding the imaging modalities, the segmentation performed on single modal MR images (FLAIR) obtained higher accuracy and a lower FP rate than the segmentation using multimodal MR

Fig. 4. Results of volumetric analysis. (A) Linear regression analysis of WML volumes obtained by manual and GMM–EVT segmentation on the FLAIR images. The blue line corresponds to the regression line, whereas the red line corresponds to the equality line. (B) Bland–Altman plot for the comparison between manual and GMM–EVT segmentation of WMLs on the FLAIR images. (C) Linear regression analysis of WML volumes obtained by manual and GMM–EVT segmentation on the multimodal MR images. The regression line (blue) and the equality line (red) are illustrated in the plot. (D) Bland–Altman plot for the comparison between manual and GMM–EVT segmentation of WMLs on the multimodal MR images.

1328

R. Wang et al. / Magnetic Resonance Imaging 32 (2014) 1321–1329

Table 4 Average volume (cc) of the WMLs detected by GMM–EVT segmentation with regard to different imaging sequences and lesion loads. Lesion load

FLAIR

Mild Moderate Severe All patients

5.61 17.91 42.95 16.13

± ± ± ±

2.35 4.71 9.63 14.84

Multimodal

Ground truth

6.34 20.76 47.16 18.12

6.05 19.96 46.69 17.66

± ± ± ±

2.50 4.84 11.07 16.30

± ± ± ±

2.41 5.25 10.43 16.17

Volumes are means ± standard deviations.

images (T1-w, T2-w, and FLAIR) did. This can be well explained by the theory of outlier detection using EVT. Clifton [23] mentioned in his study that the classical EVT [24] employed in our study correctly estimated the Gumbel parameters (μm, Σm) for the univariate case (e.g., single FLAIR sequence), whereas the location parameter (μm) in Gumbel was underestimated for the multivariate case (e.g., multimodal MR images). That is, the location parameter (μm) estimated for the segmentation using multispectral sequences (T1-w, T2-w, and FLAIR) was smaller than that for the segmentation using single FLAIR sequence. More signals including the FP signals would be considered the WMLs with decreasing of μm, thereby inducing more FP signals for the segmentation using multimodal MR images. By contrast, the precise determination of the location parameter (μm) for segmentation using FLAIR images accurately distinguished the WMLs from the artifacts and thus resulted in a better WML segmentation. An issue worth considering is the “transient black holes” [7], which are lesions presenting hyperintense on T2-w and FLAIR images but hypointense on T1-w images. The “transient black holes” are often associated with axonal loss and make the lesions seems larger on T2-w/FLAIR than those on T1-w images. This type of lesions can be few in number but hard to detect on MR images. We did not find obvious “transient black holes”. However, if the “transient black holes” do exist, these lesions can be a possible factor which influences the performance of GMM-EVT segmentation using single mode FLAIR images or multimodal MR images to a limited extent. Although the use of FLAIR images can achieve better segmentation of WMLs, the segmentation on multimodal MR images can detect different normal brain tissues (WM, GM, and CSF) along with the lesions, which may help radiologists to quantify the GM volume in the study of dementia. The effect of the lesion load to the performance of GMM–EVT segmentation is also considered. Intriguingly, patients with severe lesion load obtained better segmentation results than those with mild lesion load. The major reasons are the tendency of mild lesions to be susceptible to the effects of vague borders and their low contrast with the normal tissues. Therefore, the performance of GMM–EVT segmentation is acceptable on clinical MR images with different lesion loads (the average SI values N 0.7). Besides, the GMM-EVT segmentation has other advantages such as short time-consuming and high efficiency. The computation time for a single slice was about 2.2 s. For a complete segmentation task on a certain patient, the computation time can change with the number of the MR slices containing WMLs, but not with the lesion load. The reason is that the number of the MR slices containing WMLs is not dependent on the lesion loads. In comparison with the other segmentation methods for WML detection, GMM–EVT has the following advantages. First, GMM–EVT segmentation is fully automated and requires no anatomical atlas or training database. Moreover, the atlas-based segmentation methods [15] need to construct a pre-segmented atlas and perform non-rigid registration to match the anatomical atlas to the MR images to be segmented. Similarly, the trained data set utilized in supervised segmentation [12–14,40] also needs to be segmented previously and suffers from the drawback of inter- or intra-variability. In addition,

GMM–EVT segmentation is flexible with no restrictions on the imaging parameters, such as imaging sequences, as well as brightness and contrast of the images. Nevertheless, rule-based segmentation methods [41], particularly threshold-based methods, are sensitive to the contrasts of the images, thereby limiting the segmentation to specific lesions. As mentioned in previous sections, the GMM–EVT segmentation framework uses a GMM to acquire a model of normal brain tissues and to detect WMLs by EVT. In fact, EVT is well behaved when modeling the behavior of the maximum or minima in a data set. Thus, EVT is a well-accepted outlier detection method, which concerns the abnormal samples occurring in the tail of a distribution (e.g., GMM distribution). The EVT-based outlier detection method is more principled than the other proximity-based threshold protocols. The proximity-based threshold approaches [18,19] detect an observation as an outlier if it exceeds a threshold Euclidean or Mahalanobis distance from the center of the normal class. EVT uses a reasonable heuristic approach to determine its threshold by comparing the PDF of the two statistical models; thus, the process of outlier detection becomes more flexible and efficient. The primary sources of errors in GMM–EVT segmentation are directly linked to the FP signals caused by the bony artifacts and flow artifacts. We applied a morphological operation to eliminate the effect of FP in the lesions. Searching for an optimum FP minimization method is difficult because FP signals can occur anywhere in the brain. Ong [42] proposed an FP minimization method in which close operation is performed on T1-w images to remove the non-overlapping FP voxels in the WM region. However, this method requires non-rigid registration on multimodal MR images and thus results in arduous segmentation. The intensity inhomogeneity correction was also performed using N3 inhomogeneity correction [29] module in MIPAV software to improve the segmentation performance. The N3 inhomogeneity correction is a robust, accurate, and fully automatic method which makes no assumptions about the kind of the anatomy present in a scan. Thus, the process of intensity inhomogeneity correction would enhance the image quality without bringing grayscale distortion. As a whole, the FP rates associated with GMM–EVT segmentation are maintained at a relatively low level. In conclusion, GMM–EVT segmentation is an accurate and efficient method to detect WML on elderly patients. GMM–EVT is a promising computer-aided diagnosis tool for large cohort study of WMLs in aging and dementia. Future work will be performed to further promote the accuracy and reduce the FP rates of GMM–EVT segmentation. Acknowledgments This research is supported by National Basic Research Program of China (973 Program, no. 2010CB732506), National Natural Science Foundation of China (no. 81301213), National Natural Science Foundation of China (no. 81000609), National Natural Science Foundation of China (no. 60972110), and Major Program of Social Science Foundation of China (no. 11&ZD174). We thank the radiologists of Shanghai Sixth People's Hospital for making their clinical image and ground truth data. References [1] De Groot JC, Oudkerk M, Gijn JV, Hofman A, Jolles J, Breteler M. Cerebral white matter lesions and cognitive function: the Rotterdam Scan Study. Ann Neurol 2000;47:145–51. [2] Pantoni L, Garcia JH. Pathogenesis of leukoaraiosis: a review. Stroke 1997;28: 652–9. [3] De Groot JC, De Leeuw FE, Oudkerk M, Van Gijn J, Hofman A, Jolles J, et al. Periventricular cerebral white matter lesions predict rate of cognitive decline. Ann Neurol 2002;52:335–41. [4] Prins ND, van Dijk EJ, den Heijer T, Vermeer SE, Koudstaal PJ, Oudkerk M, et al. Cerebral white matter lesions and the risk of dementia. Arch Neurol 2004;61: 1531–4.

R. Wang et al. / Magnetic Resonance Imaging 32 (2014) 1321–1329 [5] Whitman G, Tang T, Lin A, Baloh R. A prospective study of cerebral white matter abnormalities in older people with gait dysfunction. Neurology 2001;57:990–4. [6] De Groot JC, De Leeuw F-E, Oudkerk M, Hofman A, Jolles J, Breteler MM. Cerebral white matter lesions and depressive symptoms in elderly adults. Arch Gen Psychiatry 2000;57:1071–6. [7] García-Lorenzo D, Francis S, Narayanan S, Arnold DL, Collins DL. Review of automatic segmentation methods of multiple sclerosis white matter lesions on conventional magnetic resonance imaging. Med Image Anal 2013;17:1–18. [8] Lladó X, Oliver A, Cabezas M, Freixenet J, Vilanova JC, Quiles A, et al. Segmentation of multiple sclerosis lesions in brain MRI: a review of automated approaches. Inform Sci 2012;186:164–85. [9] Antithrombotic Trialists' Collaboration. Collaborative meta-analysis of randomised trials of antiplatelet therapy for prevention of death, myocardial infarction, and stroke in high risk patients. BMJ 2002;324:71–86. [10] Debette S, Markus HS. The clinical importance of white matter hyperintensities on brain magnetic resonance imaging: systematic review and meta-analysis. BMJ 2010;341:c3666. [11] Wu Y, Warfield SK, Tan IL, Wells III WM, Meier DS, van Schijndel RA, et al. Automated segmentation of multiple sclerosis lesion subtypes with multichannel MRI. Neuroimage 2006;32:1205–15. [12] Anbeek P, Vincken KL, van Osch MJ, Bisschops RH, van der Grond J. Probabilistic segmentation of white matter lesions in MR imaging. Neuroimage 2004;21:1037–44. [13] Song T, Jamshidi MM, Lee RR, Huang M. A modified probabilistic neural network for partial volume segmentation in brain MR image. IEEE Trans Neural Netw 2007;18:1424–32. [14] Lao Z, Shen D, Liu D, Jawad AF, Melhem ER, Launer LJ, et al. Computer-assisted segmentation of white matter lesions in 3D MR images using support vector machine. Acad Radiol 2008;15:300–13. [15] Shiee N, Bazin P-L, Ozturk A, Reich DS, Calabresi PA, Pham DL. A topologypreserving approach to the segmentation of brain images with multiple sclerosis lesions. Neuroimage 2010;49:1524–35. [16] Schmidt P, Gaser C, Arsic M, Buck D, Förschler A, Berthele A, et al. An automated tool for detection of FLAIR-hyperintense white-matter lesions in multiple sclerosis. Neuroimage 2012;59:3774–83. [17] Admiraal-Behloul F, Van Den Heuvel D, Olofsen H, Van Osch M, Van Der Grond J, Van Buchem M, et al. Fully automatic segmentation of white matter hyperintensities in MR images of the elderly. Neuroimage 2005;28:607–17. [18] Van Leemput K, Maes F, Vandermeulen D, Colchester A, Suetens P. Automated segmentation of multiple sclerosis lesions by model outlier detection. IEEE Trans Med Imaging 2001;20:677–88. [19] García-Lorenzo D, Prima S, Arnold DL, Collins DL, Barillot C. Trimmed-likelihood estimation for focal lesions and tissue segmentation in multisequence MRI for multiple sclerosis. IEEE Trans Med Imaging 2011;30:1455–67. [20] Neykov N, Filzmoser P, Dimova R, Neytchev P. Robust fitting of mixtures using the trimmed likelihood estimator. Comput Stat Data Anal 2007;52:299–308. [21] Galimzianova A, Špiclin Ž, Likar B, Pernuš F. Automated segmentation of MS lesions in brain MR images using localized trimmed-likelihood estimation. Medical Imaging: Image Processing, 8669. Proceeding of SPIE; 2013. p. 86693E1–7.

1329

[22] Freifeld O, Greenspan H, Goldberger J. Multiple sclerosis lesion detection using constrained GMM and curve evolution. Int J Biomed Imaging 2009;2009:715124. [23] Clifton DA, Hugueny S, Tarassenko L. Novelty detection with multivariate extreme value statistics. J Sig Proc Syst 2011;65:371–89. [24] Roberts SJ. Extreme value statistics for novelty detection in biomedical data processing. IEE Proc Sci Meas Technol 2000;147:363–7. [25] MacDonald A, Scarrott CJ, Lee D, Darlow B, Reale M, Russell G. A flexible extreme value mixture model. Comput Stat Data Anal 2011;55:2137–57. [26] Cocosco CA, Kollokian V, Kwan RK-S, Pike GB, Evans AC. Brainweb: online interface to a 3D MRI simulated brain database. Neuroimage 1997;5:S425. [27] Collins DL, Zijdenbos AP, Kollokian V, Sled JG, Kabani NJ, Holmes CJ, et al. Design and construction of a realistic digital brain phantom. IEEE Trans Med Imaging 1998;17:463–8. [28] Gibson E, Gao F, Black SE, Lobaugh NJ. Automatic segmentation of white matter hyperintensities in the elderly using FLAIR images at 3 T. J Magn Reson Imaging 2010;31:1311–22. [29] Sled JG, Zijdenbos AP, Evans AC. A nonparametric method for automatic correction of intensity nonuniformity in MRI data. IEEE Trans Med Imaging 1998; 17:87–97. [30] Smith SM. Fast robust automated brain extraction. Hum Brain Mapp 2002;17: 143–55. [31] Thirion J-P. Image matching as a diffusion process: an analogy with Maxwell's demons. Med Image Anal 1998;2:243–60. [32] Bishop CM. Pattern recognition and machine learning. New York: Springer; 2006. [33] Kotz S, Nadarajah S. Extreme value distributions. London: Imperial College Press; 2000. [34] De Haan L, Ferreira A. Extreme value theory: an introduction. New York: Springer; 2006. [35] Khayati R, Vafadust M, Towhidkhah F, Nabavi M. Fully automatic segmentation of multiple sclerosis lesions in brain MR FLAIR images using adaptive mixtures method and Markov random field model. Comput Biol Med 2008;38:379–90. [36] Dice LR. Measures of the amount of ecologic association between species. Ecology 1945;26:297–302. [37] Stokking R, Vincken KL, Viergever MA. Automatic morphology-based brain segmentation (MBRASE) from MRI-T1 data. Neuroimage 2000;12:726–38. [38] Altman DG, Bland JM. Measurement in medicine: the analysis of method comparison studies. Statistician 1983;32:307–17. [39] Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res 1999;8:135–60. [40] Anbeek P, Vincken KL, van Osch MJ, Bisschops RH, van der Grond J. Automatic segmentation of different-sized white matter lesions by voxel probability estimation. Med Image Anal 2004;8:205–15. [41] Datta S, Sajja BR, He R, Gupta RK, Wolinsky JS, Narayana PA. Segmentation of gadolinium-enhanced lesions on MRI in multiple sclerosis. J Magn Reson Imaging 2007;25:932–7. [42] Ong KH, Ramachandram D, Mandava R, Shuaib IL. Automatic white matter lesion segmentation using an adaptive outlier detection method. Magn Reson Imaging 2012;30:807–23.