Automatic Segmentation and Quantitative Analysis of White Matter Hyperintensities on FLAIR Images Using Trimmed-Likelihood Estimator


Original Investigations

Rui Wang, PhD, Chao Li, PhD, Jie Wang, BS, Xiaoer Wei, MD, Yuehua Li, MD, Chun Hui, PhD, Yuemin Zhu, PhD, Su Zhang, PhD

Rationale and Objectives: Quantitative analysis of white matter hyperintensities (WMHs) on fluid-attenuated inversion recovery (FLAIR) images provides information for disease tracking, therapeutic efficacy assessment, and cognitive science research. This study developed an automatic segmentation method to detect and quantify WMHs on FLAIR images and assessed the accuracy and reproducibility of the proposed method.

Materials and Methods: The FLAIR images of 82 patients (58–84 years) with different WMH burdens were acquired with the same 3T scanner (Intera-achieva SMI-2.1; Philip Medical System, Sixth Affiliated People’s Hospital, Shanghai, China). The FLAIR images were preprocessed through brain extraction and intensity inhomogeneity correction. An anatomy atlas built from a set of 20 patients with different WMH burdens (mild, 11 patients; moderate, 6 patients; and severe, 3 patients) was used to estimate a control parameter in the framework of segmentation. The general flow for WMH segmentation included classification of foreground and background regions, detection of abnormally high signals, and WMH refinement. The performance of automatic segmentation was evaluated by a volumetric comparison with manual segmentation on patients with different WMH burdens.

Results: Similarity index values for the accuracy of automatic segmentation compared to manual segmentation were consistently high for patients with different WMH burdens (mild, 0.78 ± 0.08; moderate, 0.83 ± 0.06; severe, 0.84 ± 0.08; and total, 0.80 ± 0.08). Linear regression demonstrated a strong correlation between the WMH volumes measured by the two methods in all patients (r = 0.98, P = .006). Small average differences were detected between the WMH volumes obtained through manual and automatic segmentations (mild, 4.76%; moderate, 6.84%; and severe, 7.59%). The results of Bland–Altman analysis show a systematic bias of 0.68 cm³ and a standard deviation of 2.10 cm³ over the range of 2.58–53.9 cm³.

Conclusions: The developed method is accurate and efficient in detecting and quantifying differently sized WMHs on FLAIR images. Automatic segmentation is a promising computer-aided diagnostic tool to study WMHs in aging and dementia in basic research and even in clinical trials.

Key Words: White matter hyperintensities; Gaussian mixture model; trimmed likelihood estimator; automatic segmentation.

©AUR, 2014

Acad Radiol 2014; -:1–12. From the School of Biomedical Engineering and Med-X Research Institute, Shanghai Jiao Tong University, Room 123, 3 Teaching Building, No. 1954, Huashan Rd, Shanghai 200030, China (R.W., C.L., J.W., C.H., S.Z.); Institute of Diagnostic and Interventional Radiology, Sixth Affiliated People’s Hospital, Shanghai Jiao Tong University, Shanghai, China (X.W., Y.L.); and CREATICS, CNRS UMR 5220, Inserm 1044, INSA Lyon, Villeurbanne, France (Y.Z.). Received February 28, 2014; accepted July 7, 2014. Funding Sources: This research is supported by National Basic Research Program of China (973 Program, No. 2010CB732506), National Natural Science Foundation of China (No. 81301213, 81000609, and 60972110), and Major Program of Social Science Foundation of China (No. 11&ZD174). Address correspondence to: S.Z. e-mail: [email protected] ©AUR, 2014 http://dx.doi.org/10.1016/j.acra.2014.07.001

White matter hyperintensities (WMHs) are focal or diffuse lesions of high signals commonly found in the cerebral white matter (WM) on T2-weighted (T2-w) and fluid-attenuated inversion recovery (FLAIR) magnetic resonance (MR) images (1). The pathologic mechanism of WMHs remains unclear, but the lesions are suggested to be associated with age, demyelination, gliosis, and stroke (2). The typical clinical manifestations of WMHs include cognitive dysfunctions, movement disorder, and depressive symptoms (3). Both T2-w and FLAIR sequences are used to detect WMHs. The FLAIR sequence performs better than T2-w imaging for WMHs near cerebrospinal fluid (CSF) spaces because it suppresses high CSF signals by adopting a long inversion time (4). Moreover, the contrast between WM and gray matter (GM) is reduced on FLAIR images for the elderly population, thereby producing a

WANG ET AL

homogeneous low background signal and making WMHs prominent (4). Accurate detection of WMHs contributes to measuring the number and volume of lesions for disease tracking, therapeutic efficacy assessment, and cognitive science research. WMHs correlate with an increased risk of stroke, dementia, and death (2,5,6). Whether different WMH volumes are associated with cognitive dysfunctions and movement disorder has been widely and repeatedly discussed (1,3,7). Qualitative and quantitative analyses of MR images have been used to assess the lesion load of these signal abnormalities. Qualitative analysis is performed by an experienced radiologist using different visual rating scales, but the results are often affected by subjective factors and ceiling effects (8,9). Quantitative analysis methods, including manual and automatic segmentation, provide information on WMH volume (10–14). Although manual segmentation (15) is the gold standard for validating other segmentation methods, this technique is labor intensive and time consuming. Automatic segmentation is based on machine learning and pattern recognition techniques; it combines various feature selection and classification methods to detect WMHs accurately and effectively. Fuzzy connectedness (16–18) and threshold-based techniques are commonly used in different automatic segmentation methods. Automatic segmentation is completely reproducible, whereas manual segmentation often suffers from intra- and inter-expert variability (15). Accurate detection of WMHs is difficult because of the variability in lesion locations, sizes, and shapes. Instead of directly modeling the lesions, a common approach in lesion segmentation is to treat the lesions as outliers to normal-appearing brain tissues. In practice, Gaussian mixture models (GMM) are commonly used to model the distribution of intensities in healthy brain MR images. WMHs can be identified as GMM outliers (19). Maximum likelihood estimation (MLE) with the expectation–maximization (EM) algorithm is frequently used to estimate unknown parameters of GMM; nevertheless, MLE is not robust to outliers in data that otherwise follow a GMM (20,21). Therefore, the trimmed-likelihood estimator (TLE) (22) has been proposed to detect outliers in the GMM. Some studies (23–25) have applied the TLE to detect noise and intensity inhomogeneity with known proportions on simulated MR images from BrainWeb (26). However, adjusting the TLE to model unknown proportions of WMHs on real FLAIR images remains a meaningful but difficult task. This study aimed to develop an automatic WMH segmentation framework and to quantitatively analyze WMH volume on FLAIR images based on a TLE–EM segmentation framework. An anatomic atlas was used to estimate a control parameter used in automatic segmentation, and a morphological method was applied to reduce the number of false-positive (FP) voxels in the final results. The WMH volumes obtained through manual and automatic segmentation were compared on patients with different WMH burdens to validate the performance of the proposed automatic method.


Academic Radiology, Vol -, No -, - 2014

MATERIALS AND METHODS

Subjects and Image Acquisition

The study involved data from MR images of 82 patients diagnosed with WMHs between March 2011 and April 2013. Patients were aged between 58 and 84 years (69.6 ± 7.6). This study was approved by the Institutional Review Board. Written informed consent was obtained from all patients. MR images of the brain were acquired on a 3.0 T MRI scanner (Intera-achieva SMI-2.1; Philip Medical System). The same MR protocol of the brain consisting of T1-weighted (T1-w), T2-w, and FLAIR scans was used on all patients. The FLAIR images were acquired with the following parameters: repetition time/echo time/inversion time (TR/TE/TI) = 11,000/120/2800 milliseconds; flip angle = 90°; field of view = 280 mm × 280 mm; slice thickness = 5 mm; rows × columns = 640 × 640; and voxel size = 5 × 0.4375 × 0.4375 mm³. FLAIR images with WMHs were mainly captured from deep, periventricular, subcortical, and cortical WM samples because the WMHs mainly occurred at these places. To enhance the efficiency of automatic segmentation, the slices without WMH signal were deleted. The remaining continuous slices containing WMH signals were reserved for automatic segmentation.

Data Preparation

A neurologist and a radiologist manually segmented the WMH signals on the FLAIR images of the 82 patients. They referred to the corresponding T1-w and T2-w images. The two sets of results obtained by manual segmentation were compared, and the better one was chosen for evaluating our segmentation performance. The binary results of the manual segmentation were considered the ground truth in assessing the accuracy of automatic segmentation. WMH burden should be taken into account because the segmentation performance may differ with lesion load. Thus, we classified the patients into three categories based on the lesion volume (LV) of WMHs as obtained by manual segmentation (27): mild (LV < 10 cm³; 44 patients), moderate (10 cm³ < LV < 30 cm³; 26 patients), and severe (LV > 30 cm³; 12 patients). The precategorized data were used to compare the performance of automatic segmentation across patients with different WMH burdens. A subset of the manual segmentation results was used to construct an anatomic atlas and to estimate the specific parameters used in automatic segmentation.

Image Preprocessing

The FLAIR images were preprocessed through intensity inhomogeneity correction and brain extraction. These two processes can be achieved using available free software packages. Intensity inhomogeneity correction was performed by the N3 inhomogeneity correction (28) module (iterations = 100,


AUTOMATIC SEGMENTATION OF WMHS ON FLAIR IMAGES

Figure 1. (a) FLAIR image of WMHs in the periventricular white matter. (b) Gray-level histogram of the FLAIR image. The horizontal axis represents the range of the gray-scale distribution, and the vertical axis is the logarithmic number of voxels with a specific gray level. The left tail in the gray histogram corresponds to the peripheral area in FLAIR images with a gray level of 0. The right tail in the gray histogram indicates areas (WMH, skull, and scalp) with abnormally high signals on the FLAIR image. (c) and (d) are the pseudocolor image and the corresponding color histogram. The color information simplifies locating a specific tissue on the FLAIR images in the statistical histogram. FLAIR, fluid-attenuated inversion recovery; WMH, white matter hyperintensity.

end tolerance = 0.0001) in MIPAV software (http://mipav.cit.nih.gov/) to reduce the influence of inhomogeneous static or applied magnetic fields within the scanner. FSL’s Brain Extraction Tool (29) was used to extract the brain from the FLAIR images; then, an image binarization operation was performed to convert the brain images into a binary brain template. The binary brain template was later combined with the WMH candidate regions detected by automatic segmentation to remove undesired non–brain tissue, including skull and scalp regions.

Segmentation Method

The FLAIR images could be divided into two regions (foreground and background) based on the brightness and contrast of different brain tissues. The foreground region is the bright signal area that consists of WM, GM, WMH, and non–brain tissues (skull and scalp), whereas the background region is the dark signal area that includes low CSF signal and black background. The signal intensities of WMHs and non–brain tissues were significantly greater than those of the surrounding areas (WM and GM) in the foreground region. Thus, WMHs and non–brain tissues have abnormally high signals. The gray-level and color histograms of the FLAIR images are illustrated in Figure 1. In the gray histogram, the left tail corresponds to the peripheral area on the FLAIR images, with a gray level of 0, whereas the right tail indicates areas (eg, the WMH, skull, and scalp) with abnormally high signals. The pseudocolor image and the color histogram were designed to clarify the positional correspondence of specific tissue on the two color graphics. Automatic segmentation of WMHs was performed based on the following steps: 1) classification of foreground and background regions based on a two-class global EM algorithm; 2) detection of abnormally high signals in the foreground region by a partial EM algorithm combined with TLE; and 3) WMH refinement. The flow diagram of the automatic WMH segmentation protocol is shown in Figure 2.

Classification of Foreground and Background Regions

The EM algorithm divided the FLAIR images into the foreground and background regions. The gray-scale distribution of the two regions was modeled using a two-component GMM (20). In the FLAIR images, the corresponding probability density function (PDF) for each voxel $y_i$ was expressed as a linear superposition of Gaussians in the form:

$$\phi(y_i; x_i) = \sum_{j=1}^{k} \pi_j \, \varphi(y_i; x_i, \mu_j, \sigma_j) \tag{1}$$


Figure 2. Flow diagram of automatic WMH segmentation. EM, expectation–maximization; FLAIR, fluid-attenuated inversion recovery; TLE, trimmed-likelihood estimator; WMH, white matter hyperintensity.

where $\varphi(y_i; x_i, \mu_j, \sigma_j)$ denotes the PDF of a Gaussian distribution with unknown parameters, including the mean $\mu_j$, standard deviation $\sigma_j$, and proportion of the jth class $\pi_j$. Each Gaussian in the GMM provides a probabilistic model for a specific tissue class in the FLAIR images. $x_i$ is a discrete label that represents the classification of voxel $y_i$ with respect to the two tissue classes (k = 2), namely, foreground and background. In the EM algorithm, the unknown parameters must be properly initialized; the Otsu method (30) was used for this purpose. Specifically, we first used the Otsu method to classify the FLAIR images into two parts, namely, the foreground and background regions. $\mu_j^{(0)}$ and $\sigma_j^{(0)}$ were then initialized using the mean values and standard deviations of these preclassified regions. Furthermore, we computed the proportions of the foreground and background regions with respect to the images as a whole. These proportions were used to initialize the parameters $\pi_j^{(0)}$. With these starting values, the EM algorithm estimates MLE parameters by iteratively performing expectation (E) and maximization (M) steps. The former step creates an expectation function of the log likelihood using the current parameter estimates. The latter step re-estimates the unknown parameters by maximizing the expectation function (31). The final GMM was estimated by MLE with the EM algorithm. The probabilities of voxel $y_i$ being assigned to the foreground and background were calculated based on the Bayes posterior probability in the EM algorithm. The brain voxels were finally classified into the foreground and background regions. The corresponding PDF $\phi(y_i)$ for each voxel $y_i$ was stored until used to detect abnormally high signals on the FLAIR images. The results obtained after the classification of the foreground (white) and background (black) regions are shown in Figure 3b.

Detection of Abnormally High Signals

The abnormally high signals in a FLAIR image correspond to outliers in data that otherwise follow the statistical distribution of the GMM. The set of unknown parameters was denoted as $\theta = \{\mu_j, \sigma_j, \pi_j\}$, and the GMM for data mixed with outliers was estimated by TLE as follows:

$$\hat{\theta}_{\mathrm{TLE}} := \arg\max_{\theta \in \Theta} \sum_{i=1}^{n_T} f\left(y_{v(i)}; \theta\right) \tag{2}$$

where $f(y_i; \theta) = \log \phi(y_i; \theta)$ is the logarithmic value of the PDF for voxel $y_i$ in a FLAIR image and $f(y_{v(1)}; \theta) \ge f(y_{v(2)}; \theta) \ge \cdots \ge f(y_{v(n_T)}; \theta)$. The corresponding permutation of the indices is represented as $v = [v(1), \ldots, v(n_T)]$, which sorts all voxels of the FLAIR images according to the values of $f(y_{v(i)}; \theta)$. The number of voxels for normal tissues was calculated using the trimming parameter $n_T = n \times (1 - h)$, where $n$ denotes the total number of voxels in the FLAIR images and $h$ represents the proportion of abnormally high signals in the FLAIR images. The detection of abnormally high signals on FLAIR images was divided into two stages: parameter $h$ estimation and TLE-EM segmentation.

Stage 1: Estimation of Parameter h. The proportion $h$ was estimated using an anatomic atlas, which was constructed based on a set of 20 patients under different WMH burdens


Figure 3. (a) Brain extraction with the standard software BET. The signals of skull and scalp were eliminated, producing a clean brain template to be used in the following WMH refinement step. (b) Results obtained after classification of foreground (white) and background (black) regions. (c) Abnormally high signals consisting of WMH, skull, and scalp detected based on EM-TLE segmentation. WMHs are displayed in red color. The green signal indicates skull and scalp in the brain to be eliminated in the following WMH refinement step. (d) Final segmentation result obtained after the WMH refinement step. EM, expectation–maximization; TLE, trimmed-likelihood estimator; WMH, white matter hyperintensity.

(mild, 11 patients; moderate, 6 patients; and severe, 3 patients). The process of building the anatomic atlas conformed to the principle of random sampling. WMHs and non–brain tissues corresponded to abnormally high signals on the FLAIR images (Fig 3c). The average proportions of WMHs and non–brain tissues were calculated based on the results of manual segmentation. The average proportion of WMHs was estimated to be 0.018 ± 0.007 by dividing the number of voxels in WMHs by those in the whole images. The average proportion of non–brain tissues was calculated to be 0.017 ± 0.006 by subtracting the proportion of the binary brain template image from that of the foreground region detected by EM segmentation. Finally, $h$ (≈0.035) was obtained by adding the proportions of WMHs and non–brain tissue.

Stage 2: TLE-EM Segmentation. The presence of abnormally high signals on FLAIR images was detected through TLE-EM segmentation. This segmentation method detected abnormally high signals as outliers using TLE in data following a GMM distribution. In practice, the TLE could be computed using a simplified version of the FAST-TLE algorithm (22), an iteration of data purification and model refinement. In data purification, $f(y_i; \theta)$ was first calculated for all the voxels on the FLAIR images based on the GMM previously estimated. Then, the $f(y_i; \theta)$ values of all the voxels were ranked in descending order to obtain an ordered queue of voxels. The last $n \times h$ voxels at the end of the ordered queue corresponded to abnormally high signals on the FLAIR images. The remaining $n \times (1 - h)$ voxels corresponded to normal tissue. In the model refinement step, the remaining $n \times (1 - h)$ voxels were used to estimate a new GMM with the EM algorithm. The latest GMM was then used in the subsequent data purification. Data purification and model refinement were repeated until convergence was reached.
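The data purification and model refinement loop described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it works on a flattened 1-D array of intensities rather than on image volumes, it uses a hand-written Otsu threshold, and it stops when the set of retained voxels no longer changes (the paper speaks of convergence of the log likelihood). All function names (`otsu_threshold`, `em_gmm`, `tle_em`) are hypothetical.

```python
import numpy as np

def otsu_threshold(y, bins=256):
    """Otsu threshold of a 1-D intensity array (maximizes between-class variance)."""
    hist, edges = np.histogram(y, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(hist)                 # voxels at or below each bin
    w1 = w0[-1] - w0                     # voxels above each bin
    s0 = np.cumsum(hist * centers)
    mu0 = s0 / np.maximum(w0, 1)
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1)
    return centers[np.argmax(w0 * w1 * (mu0 - mu1) ** 2)]

def log_components(y, mu, sigma, pi):
    """log(pi_j) + log N(y_i; mu_j, sigma_j) for every voxel i and class j."""
    z = (y[:, None] - mu) / sigma
    return np.log(pi) - np.log(sigma) - 0.5 * np.log(2 * np.pi) - 0.5 * z ** 2

def log_mixture(y, mu, sigma, pi):
    """f(y_i; theta): log of the mixture PDF of each voxel."""
    return np.logaddexp.reduce(log_components(y, mu, sigma, pi), axis=1)

def em_gmm(y, mu, sigma, pi, n_iter=200, tol=1e-8):
    """Plain EM for a 1-D Gaussian mixture, started from the given parameters."""
    prev = -np.inf
    for _ in range(n_iter):
        logp = log_components(y, mu, sigma, pi)
        log_mix = np.logaddexp.reduce(logp, axis=1)
        resp = np.exp(logp - log_mix[:, None])          # E step: posteriors
        nk = resp.sum(axis=0)                           # M step: update theta
        mu = (resp * y[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (y[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
        pi = nk / y.size
        ll = log_mix.sum()
        if abs(ll - prev) <= tol * abs(ll):
            break
        prev = ll
    return mu, sigma, pi

def tle_em(y, h, max_outer=50):
    """Return (outlier_mask, theta); the mask flags the n*h least likely voxels."""
    y = np.asarray(y, dtype=float)
    t = otsu_threshold(y)
    lo, hi = y[y <= t], y[y > t]
    theta = (np.array([lo.mean(), hi.mean()]),
             np.array([lo.std(), hi.std()]) + 1e-6,
             np.array([lo.size, hi.size]) / y.size)
    theta = em_gmm(y, *theta)                           # global two-class EM
    n_t = int(round(y.size * (1.0 - h)))                # n_T = n * (1 - h)
    prev_keep = None
    for _ in range(max_outer):
        order = np.argsort(-log_mixture(y, *theta))     # descending f(y_i; theta)
        keep = np.zeros(y.size, dtype=bool)
        keep[order[:n_t]] = True                        # data purification
        if prev_keep is not None and np.array_equal(keep, prev_keep):
            break
        theta = em_gmm(y[keep], *theta)                 # model refinement
        prev_keep = keep
    return ~keep, theta
```

Calling `tle_em(intensities, h=0.035)` on the flattened intensities of a preprocessed image would return a boolean mask of candidate abnormal voxels; mapping the mask back to image coordinates and the subsequent refinement are left out of this sketch.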
Finally, the $n \times h$ voxels acquired from the data purification of the last iteration corresponded to the final abnormally high signals. To clarify the algorithm, we present it in a systematic and formal series of steps. These steps are divided into two parts as follows:

Part I: Two-class global EM algorithm.
Step 1: The starting values ($\mu_j^{(0)}$, $\sigma_j^{(0)}$, and $\pi_j^{(0)}$) are initialized using the Otsu method.
Step 2: E step. Using the current parameter $\theta$ as obtained in the last M step, an expectation function of the log likelihood is generated. The posterior probability of voxel $y_i$ being assigned to the foreground and background regions is evaluated.
Step 3: M step. In the current iteration, the parameter $\theta$ is estimated using the posterior probabilities of all voxels as obtained in the E step. Moreover, the expectation function is maximized.
Step 4: The convergence of the global EM algorithm is validated. If the convergence criterion is not satisfied, step 2 is repeated. Otherwise, the values of the PDF for all voxels in the FLAIR images are stored.

Part II: Partial EM algorithm combined with TLE.
Step 5: The logarithmic values of the probability density ($f(y_i; \theta) = \log \phi(y_i; \theta)$) are calculated for all the voxels in the FLAIR images. These values are then sorted in descending order.
Step 6: The parameter $h$ is estimated.
Step 7: The first $n \times (1 - h)$ voxels are used to build a new partial GMM through the EM algorithm.
Step 8: The convergence of the log likelihood of the partial EM algorithm is verified. If the convergence criterion is not satisfied, step 7 is repeated. Otherwise, the segmentation process is terminated, and the last $n \times h$ voxels are the candidate WMHs.

WMHs Refinement

WMH refinement was performed to eliminate undesired non-WMH tissues, reduce the incidence of FP signals, and obtain clean WMH signals. The detected abnormally high signals corresponded to WMHs, non–brain tissues, and other FP signals (eg, isolated voxels, bony artifacts, and flow artifacts). In fact, possible lesions with sizes not more than two


voxels were not generally considered as WMHs by radiologists in manual segmentation. Thus, an isolated voxel was considered possible noise if its intensity was greater than the average intensity of its 3 × 3 neighboring voxels plus a margin of 25 (32). Non–brain tissues composed of the skull and scalp were removed based on the binary brain template through a morphological operation. The remaining FP signals, which corresponded to isolated voxels, bony artifacts, flow artifacts, and some non–brain tissues not eliminated completely because of incomplete skull stripping, were removed by performing morphological dilation and erosion with a 3 × 3 structuring element. The results of the final WMHs obtained after the refinement step are shown in Figure 3d.

Validation of Segmentation and Quantitative Measures

To evaluate the accuracy and consistency of automatic segmentation, the automatic segmentation results of WMH were compared to those of manual segmentation (ground truth). The results of both manual and automatic segmentation were converted into binary images. Specifically, the WMHs detected by either manual or automatic segmentation are presented in the image regions with a gray level of 255. The gray levels of the remaining normal tissues were set to 0. The binary segmentations were evaluated by three similarity measures: similarity index (SI) (33,34), false-positive rate (FPR) (35–37), and false-negative rate (FNR) (38). SI denotes the degree of coincidence between WMH regions detected by automatic segmentation and the ground truth region obtained by manual segmentation. FPR measures the percentage of WMH regions falsely classified by automatic segmentation relative to the ground truth obtained by manual segmentation. FNR measures the percentage of WMH voxels missed by the automatic segmentation. The similarity metrics are defined as follows:

$$\mathrm{SI} = \frac{2 \times (M \cap A)}{M + A} \tag{3}$$

$$\mathrm{FPR} = \frac{\overline{M} \cap A}{M} \tag{4}$$

$$\mathrm{FNR} = \frac{M \cap \overline{A}}{M} \tag{5}$$

In these definitions, $M$ and $A$ denote the volumes of WMH regions detected by the manual and automatic segmentations, respectively, and an overline denotes the complement of a region. $M \cap A$, which is used to compute SI, represents the number of true positive (TP) voxels. $\overline{M} \cap A$, which is used to compute FPR, denotes the number of FP voxels. $M \cap \overline{A}$, which is used to compute FNR, denotes the number of false-negative (FN) voxels. The SI is the similarity metric most widely used to validate automatic segmentation. An SI value of ≥0.7 indicates that an automatic segmentation method has a good performance (15,35,36).
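The three measures can be computed directly from a pair of binary masks. The following is a minimal sketch, assuming NumPy arrays in which any nonzero gray level (for example 255) marks WMH; the function name `similarity_measures` is hypothetical and not taken from the paper.

```python
import numpy as np

def similarity_measures(manual, auto):
    """Return (SI, FPR, FNR) for two binary masks of equal shape."""
    m = np.asarray(manual) > 0            # ground truth (manual) WMH voxels
    a = np.asarray(auto) > 0              # automatically detected WMH voxels
    tp = np.count_nonzero(m & a)          # M intersect A
    fp = np.count_nonzero(~m & a)         # complement of M, intersect A
    fn = np.count_nonzero(m & ~a)         # M intersect complement of A
    m_vol = np.count_nonzero(m)           # assumed nonzero (manual mask not empty)
    a_vol = np.count_nonzero(a)
    si = 2.0 * tp / (m_vol + a_vol)
    fpr = fp / m_vol                      # normalized by the manual volume
    fnr = fn / m_vol
    return si, fpr, fnr
```

Note that FPR here is normalized by the manual WMH volume, as in the definitions above, rather than by the number of true-negative voxels, so it is not bounded by 1 when the automatic mask is much larger than the manual one.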

The SI should be close to 1, whereas FPR and FNR should be close to 0, indicating a small number of FP and FN in the final results. In addition, linear regression and Bland–Altman (39,40) analysis were used to measure volumetric agreement between the manual and automatic segmentations.

RESULTS

Volumetric Comparison Between Manual and Automatic Segmentations

Automatic segmentation was performed on 82 patients using h ≈ 0.035. Three slice-by-slice comparisons of WMHs detected by manual and automatic segmentations on patients with different WMH burdens are shown in Figure 4. The WMHs detected by the two methods were highly similar based on visual inspection. The performance of automatic segmentation was evaluated using three similarity measures: SI, FPR, and FNR. As shown in Table 1, the mean SI values calculated from the manual and automatic segmentations were >0.7 for every WMH burden category. Both SI and FNR increased but FPR decreased as the WMH volume increased. The result indicates that automatic segmentation has better performance on FLAIR images with large WMH volumes.

Regression and Bland–Altman Analysis

The correlation analysis results of the WMH volumes derived from manual and automatic segmentations are shown in Table 2. Linear regression demonstrated a strong correlation between the WMH volumes measured by the two methods in all patients (mild, R = 0.91, P = .026; moderate, R = 0.92, P = .001; and severe, R = 0.93, P = .007). R was ≥0.9 for all patients with different WMH burdens. The volume difference ($V_{dif}$) was defined as $V_{dif} = |V_{Auto} - V_{manual}| / V_{manual}$, and the corresponding values of $V_{dif}$ for patients with mild, moderate, and severe WMH burdens were 4.76%, 6.84%, and 7.59%, respectively. This result demonstrated that the errors between the two methods were small. Figure 5a shows a strong correlation between the WMH volumes measured by manual and automatic segmentations (R = 0.98; P = .006). The regression line (blue) was very close to the equality line (red), indicating a good volumetric agreement between the two methods. The Bland–Altman plot in Figure 5b shows the differences between the two measurements, whose mean is known as the bias, against the average of the two measurements. The plot shows a systematic bias of 0.68 cm³ and a standard deviation (SD) of 2.10 cm³ over the range of 2.58–53.9 cm³. Bland–Altman analysis was also performed on patients with mild, moderate, and severe WMHs (bias: 0.31, 1.28, and 3 cm³; SD: 0.89, 1.98, and 3.11 cm³, respectively). Results indicated that compared to manual segmentation, automatic segmentation obtained larger WMH volumes for patients with mild WMH burden but obtained smaller WMH volumes for patients with moderate and severe WMH burdens.
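The volumetric agreement statistics above can be reproduced from two arrays of per-patient volumes. The sketch below is illustrative only and rests on assumptions not stated in the paper: differences are taken as automatic minus manual, the SD uses the sample (n − 1) denominator, and the conventional 1.96 × SD limits of agreement are reported. The function names are hypothetical.

```python
import numpy as np

def volume_difference(v_auto, v_manual):
    """V_dif = |V_auto - V_manual| / V_manual, element-wise."""
    v_auto = np.asarray(v_auto, dtype=float)
    v_manual = np.asarray(v_manual, dtype=float)
    return np.abs(v_auto - v_manual) / v_manual

def bland_altman(v_auto, v_manual):
    """Return (bias, sd, (lower, upper)) for automatic minus manual volumes."""
    v_auto = np.asarray(v_auto, dtype=float)
    v_manual = np.asarray(v_manual, dtype=float)
    diff = v_auto - v_manual              # assumed sign convention
    bias = diff.mean()                    # mean difference
    sd = diff.std(ddof=1)                 # sample SD of the differences
    return bias, sd, (bias - 1.96 * sd, bias + 1.96 * sd)

def pearson_r(v_auto, v_manual):
    """Pearson correlation coefficient between the two volume series."""
    return np.corrcoef(v_auto, v_manual)[0, 1]
```

With this sign convention, a positive bias means the automatic method reports larger volumes than the manual one, and a negative bias means it reports smaller volumes.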


Figure 4. Automatic versus manual segmentation in patients with mild (a), moderate (b), and severe (c) WMH burdens. First column: original FLAIR images. Second column: results (red signals) of automatic segmentation. Third column: results (blue signals) of manual segmentation. Fourth column: overlap map generated by combining manual and automatic segmentations. Green: true positive, WMH signals simultaneously detected by manual and automatic segmentations; Blue: false negative, WMH signals detected by manual segmentation but undetected by automatic segmentation; Red: false positive, WMH signals detected by automatic segmentation but undetected by manual segmentation. FLAIR, fluid-attenuated inversion recovery; WMH, white matter hyperintensity.

TABLE 1. Similarity Measurement Comparison Between Results of Automatic Segmentation and Manual Segmentation

WMH Burden           SI             FPR            FNR
Mild (N = 44)        0.78 ± 0.08    0.26 ± 0.06    0.19 ± 0.06
Moderate (N = 26)    0.83 ± 0.06    0.11 ± 0.06    0.21 ± 0.06
Severe (N = 12)      0.84 ± 0.08    0.08 ± 0.06    0.22 ± 0.07
Total (N = 82)       0.80 ± 0.08    0.19 ± 0.07    0.20 ± 0.06

FNR, false-negative rate; FPR, false-positive rate; SI, similarity index; WMH, white matter hyperintensity.

Effect of Different Values of Parameter h on Automatic Segmentation of WMHs

Previous comparisons between manual and automatic segmentations were performed using h ≈ 0.035. The effects of h on the performance of automatic segmentation relative to manual segmentation

TABLE 2. Quantitative Analysis of Automated Segmentation with Respect to Manual Segmentation

WMH Burden           Automatic (cm³)    Manual (cm³)      R
Mild (N = 44)        6.84 ± 2.10        6.5 ± 1.96        0.91
Moderate (N = 26)    18.83 ± 4.87       20.1 ± 4.93       0.92
Severe (N = 12)      37.73 ± 8.51       40.7 ± 7.61       0.93
Total (N = 82)       15.16 ± 11.69      15.84 ± 12.73     0.98

Automatic and manual refer to quantitative white matter hyperintensity (WMH) volume detected by automatic and manual segmentation methods, respectively.

are shown in Figure 6. In all patients with different WMH burdens, SI initially increased and then decreased with increasing h. The greatest SI values for patients with mild, moderate, and severe WMH burdens were 0.81, 0.84, and 0.91, respectively. The number of FP reflected by FPR


Figure 5. Volumetric comparisons of WMHs detected by manual and automatic segmentations. (a) Linear regression analysis of WMH volumes obtained by manual and automatic segmentations. The blue line corresponds to the regression line, whereas the red line corresponds to the equality line. (b) Bland–Altman analysis of WMH volumes obtained by manual and automatic methods. SD, standard deviation; WMH, white matter hyperintensity.

increased with increasing h, and the FP for patients with mild WMH burden increased faster than that for patients with moderate and severe WMH burdens. The number of FN measured by FNR decreased with increasing h, and the FN for patients with severe and moderate WMH burdens decreased faster than that for patients with mild WMH burden. A common interval of h exists, which guarantees the acceptable accuracy (SI > 0.7), FPR, and FNR for patients with different WMH burdens. The ideal value scopes of h for patients with mild, moderate, and severe WMH burdens were 0.015–0.055, 0.025– 0.085, and 0.035–0.095, respectively, in consideration of the accuracy requirement (SI > 0.7) of WMH segmentation. A relatively low value of h is useful for reducing the FP signal although it is partly responsible for the increase of FNR. Likewise, a relatively high value of h reduces the FPR in the results while it brings more FP signal at the same time. The value of h = 0.035 estimated by an anatomic atlas–based method was more suitable for patients with mild and moderate WMH burdens than for patients with severe WMH burden. The number of FP for all patients was acceptable if h # 0.065. Thus, an intersection scope of h for patients with different WMH burdens could be obtained, namely, h˛[0.035–0.055]. Adjusting h in [0.035–0.055] may be suitable for segmentation tasks on patients with different WMH burdens. In general, automatic segmentation produced better effects on patients with moderate and severe WMH burdens than on patients with mild WMH burden. DISCUSSION This study aimed to develop and assess an automatic segmentation method that would provide accurate detection and quantitative volumetric measurements of WMHs on FLAIR images. The performance of the proposed automatic segmentation method was validated against manual segmentation on patients with different WMH burdens. The proposed method showed favorable accuracy and an acceptable number of FP. 
TLE-EM segmentation identified differently sized WMHs on FLAIR images. In general, WMHs appear hyperintense compared to the surrounding WM on T2-w, proton

density-weighted (PDw), and FLAIR images. Combining information from the different MRI sequences (T1-w, T2-w, PDw, and FLAIR) may reduce the uncertainty and increase the accuracy of segmentation. However, the images acquired from the different sequences may sometimes provide differing information about WMH, resulting in decreased reliability of segmentation results. In addition, nonrigid registration is required to register images from different sequences into the same space. This process is time consuming and labor intensive. The nonrigid registration of differently sized images usually involves image interpolation, which tends to blur and darken small hyperintensities of some voxels (41). Furthermore, the motion artifacts commonly observed on MR images from elderly patients may affect nonrigid registration and produce unusable segmentation results. The FLAIR images used in our proposed method provided enough distinction between WMHs and the surrounding normal tissues. The results of this study showed our method to be highly efficient. The TLE-EM framework for WMH segmentation was performed using a simplified FAST-TLE algorithm to compute the TLE, which was originally designed to eliminate the negative effect of undesired outliers on the MLE process of unknown parameters in the GMM. Estimating the proportion of outliers before segmentation improves the efficiency of the FAST-TLE algorithm; therefore, we simplified the algorithm by estimating the average proportion of WMHs and non–brain tissues on FLAIR images using an atlas-based method. The average proportion of abnormally high signals (h = 0.035) was obtained by adding the average proportions of the two parts previously estimated. When h varied within a specific interval, the SI and number of FP for automatic segmentation were not significantly influenced.
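The idea of fitting a GMM while trimming a fixed proportion h of the least likely samples can be illustrated with a small sketch. This is not the authors' FAST-TLE implementation: it is a simplified concentration-style loop on one-dimensional synthetic intensities, and the function names, the two-component mixture, the quantile initialization, and the iteration counts are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, weights, means, variances):
    """Density of a 1-D Gaussian mixture at x."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def em_step(xs, weights, means, variances):
    """One EM update of the mixture parameters using only the samples in xs."""
    k = len(weights)
    nk, sx, sxx = [1e-12] * k, [0.0] * k, [0.0] * k
    for x in xs:
        dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(dens) or 1e-300  # guard against numerical underflow
        for j in range(k):
            r = dens[j] / total  # responsibility of component j for x
            nk[j] += r
            sx[j] += r * x
            sxx[j] += r * x * x
    new_means = [sx[j] / nk[j] for j in range(k)]
    new_vars = [max(sxx[j] / nk[j] - new_means[j] ** 2, 1e-6) for j in range(k)]
    total_n = sum(nk)
    return [n / total_n for n in nk], new_means, new_vars


def trimmed_gmm(x, h, k=2, n_outer=30, n_em=10):
    """Fit a k-component GMM on the (1 - h) most likely samples; flag the rest."""
    n = len(x)
    n_keep = n - int(round(h * n))
    srt = sorted(x)
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]  # quantile start
    mu = sum(x) / n
    var0 = max(sum((v - mu) ** 2 for v in x) / n, 1e-6)
    variances, weights = [var0] * k, [1.0 / k] * k
    keep = list(range(n))
    for _ in range(n_outer):
        xs = [x[i] for i in keep]
        for _ in range(n_em):
            weights, means, variances = em_step(xs, weights, means, variances)
        lik = [mixture_pdf(v, weights, means, variances) for v in x]
        order = sorted(range(n), key=lambda i: lik[i], reverse=True)
        new_keep = sorted(order[:n_keep])  # retain the most likely samples
        if new_keep == keep:
            break  # the retained subset no longer changes
        keep = new_keep
    kept = set(keep)
    outliers = [i for i in range(n) if i not in kept]
    return outliers, weights, means, variances


# Synthetic intensities (NOT FLAIR data): two "normal tissue" modes plus
# 3.5% abnormally bright samples standing in for hyperintense voxels.
rng = random.Random(0)
x = ([rng.gauss(100.0, 10.0) for _ in range(600)]
     + [rng.gauss(150.0, 10.0) for _ in range(365)]
     + [rng.gauss(250.0, 5.0) for _ in range(35)])
outliers, weights, means, variances = trimmed_gmm(x, h=0.035)
bright = [i for i in outliers if x[i] > max(means)]  # high-intensity tail only
```

Trimming by likelihood flags samples in both tails of the histogram. In the setting described here, the low tail corresponds to the zero-valued peripheral area, which the WMH refinement step removes afterward.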
During segmentation, WMHs, non–brain tissues, and peripheral area with a gray level of 0 (located at the background region outside the brain) were detected in turn as h increased. This phenomenon can be explained by outlier detection theory in combination with the gray-scale distribution of different regions on the FLAIR images. The left tail in the gray histogram corresponded to the peripheral area on the FLAIR images with a gray level of 0; the right tail in the gray histogram indicated areas (WMH,

Academic Radiology, Vol -, No -, - 2014

Figure 6. Similarity measurements change as a function of h with respect to different WMH burden: (a) mild (b) moderate (c) severe. WMH, white matter hyperintensity; FNR, false-negative rate; FPR, false-positive rate.

skull, and scalp) on the FLAIR images with abnormally high signals (Fig 1). WMHs were first detected as outliers to the GMM when h was small because the gray-level statistics of WMHs differed most from those of normal tissue areas (WM and GM). Then, as h gradually increased, the non–brain tissue and the peripheral area on the FLAIR images with a gray level of 0 were also detected because their gray levels did not conform to the GMM either. The non–brain tissue and peripheral area detected along with WMHs were eliminated through WMH refinement so that the final results would not be contaminated

AUTOMATIC SEGMENTATION OF WMHS ON FLAIR IMAGES

by the unnecessary FP signals produced by the non–brain tissues and peripheral area. Thus, varying h will not significantly affect the results of automatic segmentation. Three similarity metrics were used to validate the performance of automatic segmentation. The comparison results (SI, FPR, and FNR) indicated an acceptable accuracy and reasonable FP and FN rates. In fact, proper similarity metrics, which satisfy tissue conservation properties, should be chosen for evaluation of the segmentation performance. However, an important issue exists about how to measure the FP rate in the segmentation results. Udupa et al. (38) proposed computing the FPR as FPR = FP/(Ud − M). The FP represents the volume of the area falsely classified as WMHs. Ud is a subset of the segmented image chosen as a reference superset that contains the WMH region. Defining the FPR in this way ensures that the FP is normalized and that the FPR does not exceed 1. However, as described by Udupa et al. (38), the FPR defined in this way depends on the size of Ud chosen. In medical imaging, Ud usually corresponds to the foreground region of the image, which is much larger than the lesion region and leads to underestimation of the FPR. For example, an automatic segmentation in which the LV (A = 10 cm³, and thus FP = 5 cm³) is two times greater than the ground truth (M = 5 cm³) can still result in a small FPR of 0.50% when the foreground size is 1000 cm³. Udupa et al. (38) argued that a standard evaluation framework should be proposed for reasonable segmentation performance evaluation. Thus, we used another way of defining FPR, as illustrated in Equation (4). The manually segmented lesions were considered the reference ground truth, namely the denominator in Equations (4) and (5) for computing the FPR and FNR. The way FP is quantified in this article is widely used by other authors (35–37) and helps to evaluate the number of FP by computing the proportion of the FP relative to the ground truth (M).
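The effect of the two normalizations can be reproduced with the numbers from the example above. The sketch below represents segmentations as sets of voxel indices, with one voxel standing in for 1 cm³. It assumes that SI is the Dice coefficient (33) and that Equations (4) and (5), which are not reproduced in this section, take the forms FPR = FP/M and FNR = FN/M, as the text implies. The function names are illustrative.

```python
def overlap_metrics(auto, manual):
    """Return (SI, FPR, FNR) with the manual segmentation as the reference.

    Assumed definitions: SI = 2|A n M| / (|A| + |M|), FPR = FP / |M|,
    FNR = FN / |M|.
    """
    tp = len(auto & manual)
    fp = len(auto - manual)
    fn = len(manual - auto)
    si = 2.0 * tp / (len(auto) + len(manual))
    return si, fp / len(manual), fn / len(manual)


def fpr_foreground(auto, manual, foreground):
    """FP normalized by the non-lesion part of the reference superset Ud."""
    fp = len(auto - manual)
    return fp / (len(foreground) - len(manual))


# Toy example mirroring the text: Ud = 1000, M = 5, A = 10 (so FP = 5).
foreground = set(range(1000))
manual = set(range(5))
auto = set(range(10))

si, fpr, fnr = overlap_metrics(auto, manual)
fpr_ud = fpr_foreground(auto, manual, foreground)
print(f"SI={si:.3f} FPR={fpr:.2%} FNR={fnr:.2%} FPR(Ud)={fpr_ud:.2%}")
```

With these inputs the foreground-normalized rate is 5/995, about 0.50%, whereas the rate normalized by the ground truth is 100%, which makes the doubling of the lesion volume visible.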
WMH burdens must be considered when validating a segmentation method. The number and volume of lesions can vary greatly across patients, and the segmentation performance can differ depending on WMH burden. Therefore, we evaluated our method separately on FLAIR images with different WMH burdens. Overall, the manual and automatic segmentation results showed good agreement, given that the SI values exceeded 0.7. In addition, small average differences were detected between the WMH volumes obtained through manual and automatic segmentations (mild, 4.76%; moderate, 6.84%; and severe, 7.59%). The results indicated that, compared to manual segmentation, automatic segmentation obtained larger WMH volumes for patients with mild WMH burden but smaller WMH volumes for patients with moderate and severe WMH burdens. In general, automatic segmentation performed better on patients with moderate and severe WMH burdens than on patients with mild WMH burden. The major reason for this result is that the segmentation on patients with mild WMH burden is more susceptible to FP voxels than that on other patients. To ensure optimum segmentation of differently sized WMHs, reference


WANG ET AL

intervals of h with regard to different WMH burdens were provided (mild, 0.015–0.055; moderate, 0.025–0.085; and severe, 0.035–0.095). The primary sources of FP signals produced by automatic segmentation included CSF inflow artifacts and bony artifacts not completely eliminated by the binary brain template in the WMH refinement step. A better brain extraction method would help eliminate FP voxels in non–brain tissues consisting of the skull and scalp, but it would be complicated and hard to implement. Therefore, basic morphological operations (dilation and erosion) were performed to eliminate the FP voxels of CSF inflow artifacts and bony artifacts, including some isolated voxels. Clusters of fewer than three connected voxels were not marked as WMH candidates in manual segmentation because they were too small and were not usually noticeable to the human eye. In WMH refinement, better and cleaner-looking segmentation results for patients with moderate and severe WMH burdens were obtained when dilation and erosion were performed twice to eliminate the FP voxels. For patients with mild WMH burden, dilation and erosion need not be repeated because the morphological operation may cause loss of border information. Specifically, the number of times that a morphological operation is applied depends on the visual appearance of the abnormally high signals.

In our work, an anatomic atlas from a set of 20 patients with different WMH burdens was built to estimate h in the framework of segmentation. The anatomic atlas in our method should be differentiated from the training database used in supervised learning methods, such as support vector machine, k-nearest neighbors, and artificial neural network. The training database used in these supervised learning methods should cover all possible cases acquired with different imaging parameters, protocols, and coils to expand their application range. The data for testing a supervised learning method should be different from the training database.
However, the anatomic atlas used in our method aims to represent a specific piece of anatomic information, namely, the average proportion (h) of abnormally high signals on FLAIR images. Moreover, the test data for our unsupervised segmentation method are not subject to the same strict requirement as those for supervised segmentation methods. In fact, the data of all the 82 patients were used to validate the performance of our segmentation. Of note, the anatomic atlas should contain enough cases so that the average proportion (h) of abnormally high signals on FLAIR images can be well estimated. In our experiment, we considered that about a quarter of the patients, covering different lesion loads, should be enough to estimate the average proportion (h) of abnormally high signals. The process of building the anatomic atlas conformed to the principle of random sampling. An anatomic atlas built from a small number of patients (eg, two patients) could lead to deviation from the correct estimation of h. The anatomic information estimated from the anatomic atlas is independent of the acquisition protocol, thereby ensuring that the h estimated by this anatomic atlas can be used to detect WMHs on FLAIR images

TABLE 3. Comparison of Similarity Index for the WMH Segmentation Between Different Methods on Different Real Data Sets

                         WMH Burden
Method                   Mild    Moderate    Severe    Total
TLE-EM                   0.78    0.83        0.84      0.80
Anbeek et al. (36)       0.50    0.75        0.85      0.80
Behloul et al. (41)      0.70    0.75        0.82      0.75
Khayati et al. (32)      0.73    0.75        0.81      0.75

EM, expectation–maximization; TLE, trimmed-likelihood estimator; WMH, white matter hyperintensity.

acquired under different imaging conditions. Therefore, the repeatability of our method was not affected by the manner in which the anatomic atlas was built.

The segmentation accuracy of our method was comparable with that of other methods (Table 3) that were designed for a similar segmentation task. The methods presented by Anbeek et al. (36), Khayati et al. (32), and Behloul et al. (41) were evaluated on images with different WM lesion loads. The average SI values obtained by our method varied from 0.78 to 0.84 as the WMH burden increased. The SI values of the methods proposed by Anbeek et al. (SI range, 0.50–0.85), Khayati et al. (SI range, 0.73–0.81), and Behloul et al. (SI range, 0.70–0.82) were lower than those of our method for mild and moderate WMH burdens and comparable for severe WMH burden. Van Leemput et al. (19) also used an outlier-based method, as we did in our study, to detect multiple sclerosis (MS) lesions using T1-w, T2-w, and PDw images. The average SI value reported for their method was 0.51. An important limitation when comparing different segmentation methods is the lack of shared real medical image data sets and ground truth. An alternative is to use synthetic image data sets, such as MS phantoms from BrainWeb. However, direct comparison of methods that use a synthetic image data set is often limited by different levels of intensity inhomogeneity and noise in synthetic images, as well as by the validation metrics used in various methods. Thus, the accuracy comparison between our method and the other methods mentioned previously is only for reference.

An issue that should be taken into consideration is the interplay between intensity standardization and inhomogeneity correction in MR images (42).
Image intensity standardization (43–46) is designed to correct the acquisition-to-acquisition signal intensity variations inherent in MR images, given that intensities do not have a fixed tissue-specific numeric meaning even within the same MRI protocol, for the same body region, and for images obtained on the same scanner. Madabhushi and Udupa (42) concluded that inhomogeneity correction followed by intensity standardization achieved the best performance when performing MR image processing. Furthermore, Zhuge and Udupa (44) demonstrated that inhomogeneity correction followed by intensity standardization yielded much better performance for supervised methods (eg, k-nearest neighbor [KNN]) than for unsupervised methods (eg, GMM-EM). The main reason is that the MR image


intensities should have a tissue-specific numeric meaning when training a supervised classifier on a set of MR images. Madabhushi and Udupa (43) pointed out that image intensity standardization needs to cut off the tails of the histograms of the scenes to arrive at a standardization mapping because the tails often cause problems. The high-intensity tail represents artifacts and outlier intensities that are not caused by lesions. Their method (43) shows how to identify these tails so that they do not influence the choice of landmarks for the standardizing transformations. However, the use of intensity standardization can increase the running time of the segmentation. In fact, our WMH segmentation was carried out by estimating the abnormally high signals on FLAIR images as outliers to the normal-appearing brain tissues, which were modeled using a GMM. The TLE-EM segmentation of WMH using an outlier detection strategy is unlikely to be influenced by different MRI protocols or scanners, which can compensate for the accuracy loss from not performing intensity standardization in our unsupervised segmentation task. The N3 inhomogeneity correction (28) in our method may also introduce other artifacts, such as enhanced noise and altered standardness of image intensities. Thus, a morphological operation is used as postprocessing to reduce the number of classification errors caused by the artifacts mentioned above. In general, the framework of our TLE-EM segmentation is reasonable considering both the accuracy requirement (SI > 0.7) and the segmentation efficiency.

In fact, the TLE-EM segmentation offers advantages in WMH detection in comparison with other methods. First, the TLE-EM segmentation is an unsupervised method, which requires no large training database to perform the segmentation. The anatomic atlas in our method was a representation of some anatomic knowledge (eg, the proportion of abnormally high signal) that is independent of the acquisition protocol.
Thus, our TLE-EM segmentation applies to MR images from different sources for WMH detection and quantification. In comparison, the application of supervised methods is limited to MR images whose features are similar to those of the images used for training the classifiers. Second, the TLE-EM segmentation using outlier detection theory avoids the need to model the intensity of heterogeneous lesions. It is hard to build a universal model for WMHs because of variability in lesion location, size, and shape. A specific model may be effective for detecting particular lesion subtypes; however, model-based methods will then face problems similar to those of supervised segmentation. On balance, our TLE-EM segmentation is a good choice in view of its accuracy and range of application.

A limitation of this study is that the selection of h in our method requires manual intervention. To address this problem, a reference range of the parameter (h ∈ [0.035, 0.055]) was developed in this study. According to their understanding of specific medical images, radiologists can adjust the parameter when detecting and quantifying differently sized WMHs. The procedure in which h is manually selected should be replaced to fully automate the process. In the future, we are considering


automatic selection of h according to the shape of a grayscale histogram. Moreover, tissue classes of WM, GM, and CSF were not distinguished in this study because the FLAIR sequence provided only limited information. Different MRI sequences should be combined to extensively study the interplay of WMHs and surrounding normal tissues, including WM, GM, and CSF.

CONCLUSIONS

In this study, we developed an automatic segmentation method to detect and quantify differently sized WMHs on FLAIR images. The performance of automatic segmentation on clinical image data from 82 patients with different WMH burdens was satisfactory. The segmentation of WMHs is a model-based framework that considers WMHs as outliers to normal-appearing brain tissues. The application of this method is independent of imaging parameters and devices, which expands its range of application. For practical applications, this method may be useful as a computer-aided diagnostic tool to study WMHs in aging and dementia. We have released the algorithm as user-friendly software and are considering making it accessible to clinical researchers in the future.

ACKNOWLEDGMENTS

We thank the radiologists of Shanghai Sixth People’s Hospital for making their clinical images and ground truth data available.

REFERENCES

1. de Groot JC, Oudkerk M, Gijn Jv, et al. Cerebral white matter lesions and cognitive function: the Rotterdam Scan Study. Ann Neurol 2000; 47:145–151.
2. Debette S, Markus H. The clinical importance of white matter hyperintensities on brain magnetic resonance imaging: systematic review and meta-analysis. BMJ 2010; 341:c3666.
3. Silbert L, Nelson C, Howieson D, et al. Impact of white matter hyperintensity volume progression on rate of cognitive and motor decline. Neurology 2008; 71:108–113.
4. Barkhof F, Scheltens P. Imaging of white matter lesions. Cerebrovasc Dis 2002; 13:21–30.
5. Gerdes VE, Kwa VI, ten Cate H, et al. Cerebral white matter lesions predict both ischemic strokes and myocardial infarctions in patients with established atherosclerotic disease. Atherosclerosis 2006; 186:166–172.
6. Naka H, Nomura E, Takahashi T, et al. Combinations of the presence or absence of cerebral microbleeds and advanced white matter hyperintensity as predictors of subsequent stroke types. Am J Neuroradiol 2006; 27:830–835.
7. Soumaré A, Elbaz A, Zhu Y, et al. White matter lesions volume and motor performances in the elderly. Ann Neurol 2009; 65:706–715.
8. Scheltens P, Erkinjuntti T, Leys D, et al. White matter changes on CT and MRI: an overview of visual rating scales. Eur Neurol 1998; 39:80–89.
9. Fazekas F, Barkhof F, Wahlund L, et al. CT and MRI rating of white matter lesions. Cerebrovasc Dis 2002; 13:31–36.
10. Yamamoto D, Arimura H, Kakeda S, et al. Computer-aided detection of multiple sclerosis lesions in brain magnetic resonance images: false positive reduction scheme consisted of rule-based, level set method, and support vector machine. Comput Med Imaging Graph 2010; 34:404–413.
11. Hulsey KM, Gupta M, King KS, et al. Automated quantification of white matter disease extent at 3T: Comparison with volumetric readings. J Magn Reson Imaging 2012; 36:305–311.



12. Archip N, Jolesz FA, Warfield SK. A validation framework for brain tumor segmentation. Acad Radiol 2007; 14:1242–1251.
13. Clas P, Groeschel S, Wilke M. A semi-automatic algorithm for determining the demyelination load in metachromatic leukodystrophy. Acad Radiol 2012; 19:26–34.
14. Lao Z, Shen D, Liu D, et al. Computer-assisted segmentation of white matter lesions in 3D MR images using support vector machine. Acad Radiol 2008; 15:300–313.
15. García-Lorenzo D, Francis S, Narayanan S, et al. Review of automatic segmentation methods of multiple sclerosis white matter lesions on conventional magnetic resonance imaging. Med Image Anal 2013; 17:1–18.
16. Liu JG, Udupa JK, Odhner D, et al. A system for brain tumor volume estimation via MR imaging and fuzzy connectedness. Comput Med Imaging Graph 2005; 29:21–34.
17. Udupa JK, Wei L, Samarasekera S, et al. Multiple sclerosis lesion quantification using fuzzy-connectedness principles. IEEE T Med Imaging 1997; 16:598–609.
18. Udupa JK, Saha PK, Lotufo RA. Relative fuzzy connectedness and object definition: theory, algorithms, and applications in image segmentation. IEEE T Pattern Anal 2002; 24:1485–1500.
19. Van Leemput K, Maes F, Vandermeulen D, et al. Automated segmentation of multiple sclerosis lesions by model outlier detection. IEEE T Med Imaging 2001; 20:677–688.
20. Zhang Y, Brady M, Smith S. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE T Med Imaging 2001; 20:45–57.
21. Wells WM III, Grimson WEL, Kikinis R, et al. Adaptive segmentation of MRI data. IEEE T Med Imaging 1996; 15:429–442.
22. Neykov N, Filzmoser P, Dimova R, et al. Robust fitting of mixtures using the trimmed likelihood estimator. Comput Stat Data Anal 2007; 52:299–308.
23. Bricq S, Collet C, Armspach J-P. Lesions detection on 3D brain MRI using trimmed likelihood estimator and probabilistic atlas. In: Proceedings of the 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro. ISBI 2008: Paris, France, 2008;93–96.
24. Galimzianova A, Špiclin Ž, Likar B, et al. Automated segmentation of MS lesions in brain MR images using localized trimmed-likelihood estimation. In: Proc. SPIE 8669, Medical Imaging: Image Processing. Lake Buena Vista (Orlando Area), Florida, USA: 2013;86693E–86693E–86697.
25. García-Lorenzo D, Prima S, Arnold DL, et al. Trimmed-likelihood estimation for focal lesions and tissue segmentation in multisequence MRI for multiple sclerosis. IEEE T Med Imaging 2011; 30:1455–1467.
26. Varela F, Lachaux J-P, Rodriguez E, et al. The BrainWeb: phase synchronization and large-scale integration. Nat Rev Neurosci 2001; 2:229–239.
27. Gibson E, Gao F, Black SE, et al. Automatic segmentation of white matter hyperintensities in the elderly using Flair images at 3T. J Magn Reson Imaging 2010; 31:1311–1322.



28. Sled JG, Zijdenbos AP, Evans AC. A nonparametric method for automatic correction of intensity nonuniformity in MRI data. IEEE T Med Imaging 1998; 17:87–97.
29. Smith SM. Fast robust automated brain extraction. Hum Brain Mapp 2002; 17:143–155.
30. Otsu N. A threshold selection method from gray-level histograms. Automatica 1975; 11:23–27.
31. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J Roy Stat Soc 1977; 39:1–38.
32. Khayati R, Vafadust M, Towhidkhah F, et al. Fully automatic segmentation of multiple sclerosis lesions in brain MR FLAIR images using adaptive mixtures method and Markov random field model. Comput Biol Med 2008; 38:379–390.
33. Dice LR. Measures of the amount of ecologic association between species. Ecology 1945; 26:297–302.
34. Zijdenbos AP, Dawant BM, Margolin RA, et al. Morphometric analysis of white matter lesions in MR images: method and validation. IEEE T Med Imaging 1994; 13:716–724.
35. Anbeek P, Vincken KL, van Osch MJ, et al. Automatic segmentation of different-sized white matter lesions by voxel probability estimation. Med Image Anal 2004; 8:205–215.
36. Anbeek P, Vincken KL, van Osch MJ, et al. Probabilistic segmentation of white matter lesions in MR imaging. Neuroimage 2004; 21:1037–1044.
37. de Boer R, Vrooman HA, van der Lijn F, et al. White matter lesion extension to automatic brain tissue segmentation on MRI. Neuroimage 2009; 45:1151–1161.
38. Udupa JK, LeBlanc VR, Ying ZG, et al. A framework for evaluating image segmentation algorithms. Comput Med Imaging Graph 2006; 30:75–87.
39. Altman DG, Bland JM. Measurement in medicine: the analysis of method comparison studies. Stat 1983; 32:307–317.
40. Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res 1999; 8:135–160.
41. Admiraal-Behloul F, Van Den Heuvel D, Olofsen H, et al. Fully automatic segmentation of white matter hyperintensities in MR images of the elderly. Neuroimage 2005; 28:607–617.
42. Madabhushi A, Udupa JK. Interplay between intensity standardization and inhomogeneity correction in MR image processing. IEEE T Med Imaging 2005; 24:561–576.
43. Madabhushi A, Udupa JK. New methods of MR image intensity standardization via generalized scale. Med Phys 2006; 33:3426–3434.
44. Zhuge Y, Udupa JK. Intensity standardization simplifies brain MR image segmentation. Comput Vis Image Und 2009; 113:1095–1103.
45. Nyúl LG, Udupa JK. On standardizing the MR image intensity scale. Magn Reson Med 1999; 42:1072–1081.
46. Nyúl LG, Udupa JK, Zhang X. New variants of a method of MRI scale standardization. IEEE T Med Imaging 2000; 19:143–150.