NeuroImage 60 (2012) 2379–2388
Multi-stage segmentation of white matter hyperintensity, cortical and lacunar infarcts

Yanbo Wang a, Joseree Ann Catindig b, Saima Hilal a, Hock Wei Soon c, Eric Ting d, Tien Yin Wong e,f, Narayanaswamy Venketasubramanian b, Christopher Chen a, Anqi Qiu c,g,h,⁎

a Department of Pharmacology, National University of Singapore, Singapore
b Division of Neurology, University Medicine Cluster, National University Health System, Singapore
c Department of Bioengineering, National University of Singapore, Singapore
d Department of Diagnostic Imaging, National University Health System, Singapore
e Singapore Eye Research Institute, National University of Singapore, Singapore
f Department of Ophthalmology, National University of Singapore, Singapore
g Singapore Institute for Clinical Sciences, The Agency for Science, Technology and Research, Singapore
h Clinical Imaging Research Center, National University of Singapore, Singapore
Article info

Article history: Accepted 15 February 2012; available online 22 February 2012

Keywords: White matter hyperintensity; Cortical infarct; Lacunar infarct; Magnetic resonance imaging
Abstract

Cerebral abnormalities such as white matter hyperintensity (WMH), cortical infarct (CI), and lacunar infarct (LI) are of clinical importance and frequently present in patients with stroke and dementia. To date, few algorithms are available to automatically delineate these cerebral abnormalities, partly because of their complex appearance in MR images. In this paper, we describe an automated multi-stage segmentation approach for labeling the WMH, CI, and LI using multi-modal MR images. We first automatically segment brain tissues (white matter, gray matter, and CSF) based on the T1-weighted image and then identify hyperintense voxels based on the fluid attenuated inversion recovery (FLAIR) image. We finally label the WMH, CI, and LI based on the T1-weighted, T2-weighted, and FLAIR images. The segmentation accuracy is evaluated using a community-based sample of 272 older adults. Our results show that the automated segmentation of the WMH, CI, and LI is comparable with manual labeling in terms of spatial location, volume, and the number of lacunes. Additionally, the WMH volume is highly correlated with the visual grading score based on the Age-Related White Matter Changes (ARWMC) protocol. The evaluations against the manual labeling and ARWMC visual grading suggest that our algorithm provides reasonable segmentation accuracy for the WMH, CI, and LI. © 2012 Elsevier Inc. All rights reserved.
⁎ Corresponding author at: Department of Bioengineering, National University of Singapore, 9 Engineering Drive 1, Block EA 03-12, Singapore 117574, Singapore. Tel.: +65 6516 7002; fax: +65 6872 3069. E-mail address: [email protected] (A. Qiu).
doi:10.1016/j.neuroimage.2012.02.034

Introduction

Magnetic resonance imaging (MRI) has been widely used to detect a variety of cerebral abnormalities, such as white matter hyperintensity (WMH), cortical infarct (CI), and lacunar infarct (LI), that are of clinical importance. The WMH is thought to reflect small vessel cerebrovascular disease (Pantoni, 2002; Young et al., 2008) and may contribute to age-associated cognitive decline (Carmichael et al., 2010; He et al., 2010; Jokinen et al., 2009; Marquine et al., 2010; Ota et al., 2009; Vannorsdall et al., 2009) and increase the risk of dementia (Debette and Markus, 2010). CIs, as the name suggests, are infarcts in the cortical regions caused by cerebral artery occlusion, most commonly by emboli, whereas LIs are subcortical infarcts due to the blockage of small penetrating arteries in the deep brain region, with sizes up to approximately 15 mm (Ropper, 2005; Xavier et al., 2003). Previous studies suggest that both types of cerebral infarcts are associated with cognitive decline and increase the likelihood of dementia and stroke (Bennett et al., 2005; Carey et al., 2008; Jokinen et al., 2011; Jokinen et al., 2009; Schneider et al., 2004; Tripathi et al., 2011). Hence, it is of clinical importance to identify these cerebral abnormalities for potential early prevention, diagnosis, and treatment in cerebrovascular and neurodegenerative diseases.

Due to the complex mechanisms underlying cerebrovascular diseases, the appearance of the WMH, CI, and LI is heterogeneous in terms of location, size, and image intensity on MRI. To date, visual inspection remains a common approach for quantifying the severity of these cerebral abnormalities. However, it is laborious and time consuming and therefore impractical in large-scale imaging studies. In addition, visual inspection is subject to rater bias and hence highly dependent on inter- and intra-rater reliability (Kapeller et al., 2003; Prins et al., 2004; Vannorsdall et al., 2009), which in turn decreases the sensitivity of subsequent statistical analyses (Garrett et al., 2004; Van Straaten et al., 2006).
In recent years, major progress has been made on the development of semi- or fully automated segmentation of the WMH. Wen and Sachdev (2004) and Ramirez et al. (2011) introduced semi-automated approaches that choose empirical thresholds based on descriptive statistics of the image intensity and then manually correct false WMH areas. Jack et al. (2001) and Gibson et al. (2010) developed fully automated approaches that first employ empirical thresholds before applying linear fitting or fuzzy clustering to segment the WMH. However, both approaches are based only on fluid attenuated inversion recovery (FLAIR) images, which are less sensitive in the posterior fossa. In addition, they may overestimate the WMH because FLAIR typically shows high intensity in cortical areas, the septum pellucidum, and flow artifacts in the fourth ventricle, where a large percentage of the false positive WMH is detected. To partially address these issues, Gibson et al. (2010) further applied a white matter mask to remove this false positive WMH. More advanced methods have been developed based on Markov random field models (Schwarz et al., 2009), k-nearest neighbor classification (Anbeek et al., 2004; Wen et al., 2009), and neural classification (Dyrby et al., 2008), all of which require training images with WMH labels. The segmentation accuracy of these methods relies on representative training data, which may be difficult to select due to the heterogeneous nature of the WMH. In contrast to these methods, which need training samples, Admiraal-Behloul et al. (2005) proposed a fuzzy inference system to classify the WMH based on both anatomical locations and intensity values from the T2-weighted MRI and FLAIR. This approach is robust to a wide range of image intensities and contrasts.
However, as it uses prior masks of the intra-cranial volume, white matter, gray matter, and cerebrospinal fluid (CSF) in the Montreal Neurological Institute (MNI) brain template, its segmentation accuracy is highly dependent on the alignment of individual subjects to the MNI template. Similar approaches using the above-mentioned machine learning techniques have also been proposed for the automated segmentation of multiple sclerosis lesions (Shiee et al., 2010; Warfield et al., 2000; Wu et al., 2006; Zijdenbos et al., 2002).

The CI and LI have thus far been manually identified by neuroradiologists in most existing studies (Bennett et al., 2005; Carey et al., 2008; Jokinen et al., 2011, 2009; Schneider et al., 2004; Tripathi et al., 2011). A few fully automated LI segmentation approaches have been proposed using the T1- and T2-weighted images (Uchiyama et al., 2007; Yokoyama et al., 2007). Uchiyama et al. (2007) first applied the top hat transform and then binarized the T2-weighted MR image to label LI voxels. Next, support vector machine classification was used to eliminate false positive LIs. Yokoyama et al. (2007) searched for LI candidates using a binarization approach at multiple threshold levels and then removed false positive LIs based on intensity thresholds and the shape of the LI. Since the LI is relatively small and its appearance in T1- and T2-weighted MRI is similar to that of CSF and the WMH, methods that rely only on T1- and T2-weighted MR images are limited in their ability to identify LIs. Sasaki et al. (2008) demonstrated that combining the FLAIR with the T1- and T2-weighted MR images increases the segmentation accuracy of the LI.

In this paper, we employ multi-modal MR images and present a multi-stage segmentation approach to automatically delineate the WMH, CI, and LI.
Since the T1-, T2-weighted, and FLAIR images are commonly used in hospitals for determining cerebral abnormalities and have been recommended by previous studies (Debette and Markus, 2010; Jokinen et al., 2011, 2009; Kapeller et al., 2003; Sasaki et al., 2008; Van Straaten et al., 2006), our approach takes these three MRI modalities in order to increase the sensitivity and specificity of the abnormal white matter classification. Moreover, we developed our segmentation algorithm based on a series of simple image analysis operations, including Gaussian mixture models, region growing, and morphological operations, without the need of a training set. Hence, our method overcomes the difficulties in the selection of a representative training set, customarily faced by other existing
approaches (Anbeek et al., 2004; Dyrby et al., 2008; Schwarz et al., 2009; Wen et al., 2009). Furthermore, our framework automatically labels the WMH, CI, and LI at the same time, whereas existing methods often segment the WMH (Admiraal-Behloul et al., 2005; Anbeek et al., 2004; Dyrby et al., 2008; Gibson et al., 2010; Jack et al., 2001; Schwarz et al., 2009; Wen and Sachdev, 2004; Wen et al., 2009) or the LI alone (Uchiyama et al., 2007; Yokoyama et al., 2007). Finally, we evaluate the segmentation accuracy of the WMH, LI, and CI through comparison with manual labels and visual grading using a dataset of 272 older adults.

Methods

We now present a multi-stage segmentation technique (Fig. 1) based on T1-, T2-weighted, and fluid attenuated inversion recovery (FLAIR) magnetic resonance (MR) images. After correcting for intensity inhomogeneity, removing the skull, and aligning the within-subject T2-weighted and FLAIR images to the corresponding T1-weighted image, the brain tissues (white matter, gray matter, CSF) and hyperintense regions are identified using the T1-weighted MRI and FLAIR, respectively. We then further classify the hyperintense regions into the WMH and CI and identify the LI using the T1-, T2-weighted, and FLAIR images. We detail each stage below.

1. Preprocessing: Within individual subjects, the T1-, T2-weighted and FLAIR MR images are first corrected for intensity inhomogeneity using the non-parametric non-uniform intensity normalization (Sled et al., 1998). Both the FLAIR and T2-weighted MR images are subsequently aligned to the corresponding within-subject T1-weighted image based on an affine transformation obtained from FMRIB's Linear Image Registration Tool with cross correlation as the cost function (FLIRT, Jenkinson et al., 2002). Since the contrast between brain and non-brain tissues is better in FLAIR, the skull is removed in the FLAIR image using the Brain Extraction Tool (BET, Smith, 2002) before the resulting brain mask is applied to the T1- and T2-weighted images.

2. (Stage I) Brain Tissue Segmentation: Whole brain segmentation is performed with the FreeSurfer image analysis suite (Fischl et al., 2002), which is freely available online (http://surfer.nmr.mgh.harvard.edu). The brain tissues, including the white matter, gray matter, CSF, subcortical and ventricular structures (e.g., the basal ganglia and the thalamus) as well as hypointense regions, are automatically delineated from the intensity-inhomogeneity corrected T1-weighted MR image using a Markov random field model. Image masks are created for the whole brain and for the combination of the white matter, hypointense regions, and subcortical and ventricular structures; these masks are used in the later stages described below.

3. (Stage II) Segmentation of Hyperintense Regions using the FLAIR and T1-weighted images: The moderate hyperintense signal shown in the FLAIR may be due to cortical infarction, increased tissue water content, or degradation of the macromolecular structure of the myelin (Admiraal-Behloul et al., 2005). In this stage, we identify voxels with hyperintensity in the FLAIR across the whole brain. To do so, we first model the FLAIR intensity distribution of the whole brain using a Gaussian mixture model (GMM) with three tissue classes: the CSF, white matter, and hyperintense tissue (Joshi et al., 1999). The Expectation–Maximization (EM) algorithm (Dempster et al., 1977; Joshi et al., 1999) is employed to estimate the intensity mean and standard deviation for each tissue class. These parameters are initialized based on the FLAIR intensity statistics of the CSF, white matter, and hypointense tissue given by the above-mentioned T1-weighted image segmentation.
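As a rough illustration of the EM step just described, the following minimal Python sketch fits a one-dimensional, three-class Gaussian mixture to intensity values. It is not the authors' implementation: the function name `fit_gmm_1d`, the synthetic intensities, and the initial means and standard deviations are assumptions made for illustration only. In the paper, the initial values come from the FLAIR statistics of the T1-derived tissue classes.

```python
import math
import random


def fit_gmm_1d(values, init_means, init_sds, n_iter=50):
    """Fit a 1-D Gaussian mixture with EM; one component per tissue class."""
    k = len(init_means)
    means, sds = list(init_means), list(init_sds)
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each class for every intensity.
        resp = []
        for x in values:
            dens = [
                w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
                for w, m, s in zip(weights, means, sds)
            ]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and SD of each class.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty class unchanged
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            sds[j] = max(math.sqrt(var), 1e-6)
    return weights, means, sds


# Synthetic intensities (hypothetical): CSF-like, white-matter-like, hyperintense.
random.seed(0)
intensities = (
    [random.gauss(20, 5) for _ in range(300)]
    + [random.gauss(100, 10) for _ in range(1000)]
    + [random.gauss(160, 12) for _ in range(100)]
)
# Initial guesses stand in for the T1-derived class statistics.
weights, means, sds = fit_gmm_1d(intensities, [30, 90, 150], [10, 10, 10])
```

In this toy setting the three classes are well separated, so the estimates settle close to the generating parameters within a few iterations; real FLAIR histograms overlap far more, which is one reason initialization from the T1-weighted segmentation matters.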
We maximize the posterior distribution of the tissue labels given the FLAIR intensity and obtain a threshold, λ1, that discriminates the white matter from the hyperintense tissue. Since the intensity level of the gray matter in FLAIR lies between those of the white matter and the WMH, we constrain the initial hyperintense region to the white matter, basal ganglia, and thalamus regions. Nevertheless, this initial hyperintense region often contains substantial false positives at the boundary between the white and gray matter due to partial volume effects. These artifacts are often small. We thus eliminate them by first applying a three-dimensional erosion operation using a disk with a radius of 1 mm to remove small hyperintense clusters and then a region growing operation to recover the parts of the large hyperintense clusters that were removed by the erosion operation, as well as hyperintense voxels located in the gray matter region. Since the intensity levels in the cortical and deep gray matter regions vary, we refine our segmentation by applying a second GMM only in the white matter and subcortical regions identified by the above-mentioned T1-weighted image segmentation. Unlike the first GMM, this GMM segments the white matter and subcortical regions into two tissue classes: normal intensity and hyperintense tissues. The parameters estimated from the first GMM are taken as the initialization of the second GMM. Since the second GMM only takes into account the white matter and subcortical regions, the proportion of hyperintense voxels is larger than in the first GMM, which is applied to the whole brain. Thus, the estimation of the hyperintense regions is improved in this second GMM. Finally, we further remove false positive hyperintense voxels that are flow artifacts around the fourth ventricle by applying a predefined mask given in the Montreal Neurological Institute (MNI) brain template space. To further reduce the false positive rate, hyperintense clusters smaller than 10 voxels are removed as well.

Fig. 1. Schematic of a multi-stage segmentation approach. Abbreviations: FLAIR, fluid attenuated inversion recovery image; WMH, white matter hyperintensity; CI, cortical infarct; LI, lacunar infarct.

4. (Stage III) Separation of the WMH and CI: Even though the WMH and CI both appear hyperintense in the FLAIR image, their anatomical locations differ. The WMH often appears in white matter and subcortical structures, while the CI starts at the cortex, spreads to the white matter, and eventually merges into large confluent lesions. Based on their anatomical locations, we separate the hyperintense regions into the WMH and CI. To do so, a brain outer rim with a thickness of 5 mm is constructed by first eroding the whole brain mask by 5 mm and then subtracting this eroded region from the whole brain mask. This outer rim contains both the white matter and gray matter within 5 mm of the skull. We consider the intersection between this brain outer rim and the previously defined hyperintense region as seed regions of the cortical infarcts and employ a region growing operation to recover cortical infarction regions before they reach the periventricular border. The periventricular border is defined as a 5 mm dilation of the lateral ventricles. The rest of the hyperintense regions are classified as WMH.

5. (Stage IV) Segmentation of LI: LIs are small noncortical infarcts (2 mm–15 mm in diameter) caused by occlusion in penetrating branches of major cerebral arteries. They are often found in the white matter and subcortical structures surrounded by hyperintensity rims, and appear as hyperintense dots on the T2-weighted image and hypointense dots on the T1-weighted and FLAIR images (Sasaki et al., 2008). In our study, we segment LIs based on the T1-, T2-weighted, and FLAIR images as well as the masks of the WMH and lateral ventricles. We first identify LIs near the WMH region and then in the subcortical structures. For LIs near the WMH region, we extend the WMH region using a dilation operation to determine the regions where LIs may occur. In this dilated region, LI voxels are identified based on the following criteria:
   1. intensity in the FLAIR image is less than the average intensity value of the white matter (given in Stage II);
   2. intensity in the T2-weighted MRI is higher than the average intensity within the WMH region;
   3. intensity in the T1-weighted MRI is lower than the average intensity within the WMH region.
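The three intensity criteria above can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code: the function name, the flat per-voxel lists, and the toy intensity values are hypothetical, and the dilation that produces the search region is assumed to have been carried out beforehand.

```python
from statistics import mean


def li_candidates_near_wmh(flair, t2, t1, wmh_mask, search_mask, wm_flair_mean):
    """Flag voxels in the dilated WMH region that meet all three LI criteria.

    All inputs except wm_flair_mean are flat lists with one entry per voxel.
    wm_flair_mean is the white matter FLAIR mean (from Stage II in the paper).
    """
    # Average T2 and T1 intensities inside the WMH region.
    wmh_t2_mean = mean(v for v, m in zip(t2, wmh_mask) if m)
    wmh_t1_mean = mean(v for v, m in zip(t1, wmh_mask) if m)
    return [
        bool(s)
        and f < wm_flair_mean   # criterion 1: dark on FLAIR
        and b > wmh_t2_mean     # criterion 2: bright on T2
        and a < wmh_t1_mean     # criterion 3: dark on T1
        for f, b, a, s in zip(flair, t2, t1, search_mask)
    ]


# Toy example with six voxels (all values hypothetical).
flair = [100, 160, 165, 60, 95, 50]
t2 = [80, 120, 130, 200, 85, 90]
t1 = [110, 80, 90, 40, 105, 50]
wmh_mask = [0, 1, 1, 0, 0, 0]
search_mask = [0, 1, 1, 1, 1, 0]  # WMH region after dilation
candidates = li_candidates_near_wmh(flair, t2, t1, wmh_mask, search_mask, 100)
```

Only the fourth voxel satisfies all three criteria inside the search region; the last voxel is dark on FLAIR and T1 but lies outside the dilated WMH region and is not bright enough on T2.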
For LIs in the subcortical structures, the intensity criteria in the T1- and T2-weighted MR images remain the same. However, because the intensity levels of individual subcortical structures differ, we empirically set the FLAIR intensity threshold for LIs in a given subcortical structure to one standard deviation below the mean intensity of that structure. Furthermore, since the LI is small, the accuracy of its detection is sensitive to potential misalignment across the T1-, T2-weighted, and FLAIR images. To overcome this issue, a voxel is labeled as LI if most of its neighbors' FLAIR and T2 intensities satisfy the above conditions, given that the voxel itself satisfies the above criterion for the T1-weighted image. We note that perivascular (Virchow–Robin) spaces are filled with CSF and resemble LIs in the T1-, T2-weighted, and FLAIR images. However, the perivascular spaces are typically 1 mm or less in diameter (Awad et al., 1986) and are not included in our segmentation.

Results

To evaluate the segmentation accuracy, we selected 272 subjects (131 males and 141 females; age: 70.7 ± 6.3 years; age range: 60 to 86 years) from the ongoing epidemiological aging cohort recruited by the Memory Aging & Cognition Center at the National University of Singapore. Every subject underwent MRI scans that were performed on a 3T Siemens Magnetom Trio Tim scanner using a 32-channel head coil at the Clinical Imaging Research Center of the National University of Singapore.
The imaging protocols are: (i) high-resolution T1-weighted Magnetization Prepared Rapid Gradient Recalled Echo (MPRAGE; 192 slices, 1 mm thickness, in-plane resolution 1 mm, no inter-slice gap, sagittal acquisition, field of view = 256 × 256 mm, matrix = 256 × 256, repetition time = 2300 ms, echo time = 1.9 ms, inversion time = 900 ms, flip angle = 9°); (ii) T2-weighted imaging protocol (axial spin echo sequence, 48 slices with 3 mm slice thickness, no inter-slice gap, matrix = 232 × 256, field of view = 232 × 256 mm, repetition time = 2600 ms, echo time = 99 ms, flip angle = 120°); (iii) fluid attenuated inversion recovery (FLAIR) imaging protocol (repetition time = 9000 ms, echo time = 82 ms, inversion time = 2500 ms, matrix = 232 × 256, field of view = 232 × 256 mm, slice thickness = 3 mm, no inter-slice gap, 48 slices). The acquisition times of the T1-, T2-weighted, and FLAIR MR images are 5 min 20 s, 1 min 23 s, and 3 min 36 s, respectively.

The WMH and CI of 19 subjects were manually delineated by two trained clinicians (JAC and HS) under the guidance of a neuroradiologist (ET), whereas the manual segmentation of the LI was performed by a trained clinician (HS) on the entire dataset. All manual labeling was performed using ITK-SNAP (Yushkevich et al., 2006) and carried out on the FLAIR image with the help of visual inspection of the corresponding T1- and T2-weighted images.

We first validate the automated segmentation accuracy of the WMH and CI against the manual labeling in 19 subjects using the Dice similarity index, volume correlation, and Bland–Altman plots. The accuracy of the LI is evaluated against the manual labeling over all 272 subjects. In addition, we present the correlation of the WMH volume assessed using our automated segmentation with a visual grading score, the Age-Related White Matter Changes (ARWMC) score (Wahlund et al., 2001), based on the entire dataset of 272 subjects.
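The three agreement measures named above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' evaluation code: the function names and the toy inputs are hypothetical, labels are represented as sets of voxel coordinates, the Dice index is written as twice the overlap divided by the sum of the two label sizes, and the Bland–Altman limits use the sample standard deviation of the paired differences (mean ± 1.96 SD).

```python
import math
from statistics import mean, stdev


def dice_similarity(label_a, label_b):
    """Dice index of two labels given as sets of voxel coordinates."""
    if not label_a and not label_b:
        return 1.0  # two empty labels agree trivially
    return 2.0 * len(label_a & label_b) / (len(label_a) + len(label_b))


def pearson_r(x, y):
    """Pearson correlation between two paired lists of volumes."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def bland_altman_limits(x, y):
    """Return (bias, lower limit, upper limit) of the paired differences."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Toy labels and volumes (hypothetical, not taken from the study).
manual = {(0, 0, 0), (0, 0, 1), (0, 1, 1)}
auto = {(0, 0, 1), (0, 1, 1), (1, 1, 1)}
dsi = dice_similarity(manual, auto)

manual_vol = [10.0, 12.0, 15.0]
auto_vol = [9.0, 13.0, 14.0]
r = pearson_r(manual_vol, auto_vol)
bias, lower, upper = bland_altman_limits(manual_vol, auto_vol)
```

The Dice index rewards voxel-wise overlap, whereas the correlation and Bland–Altman limits compare volumes only, so two segmentations can agree in volume while overlapping poorly; reporting all three guards against that.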
Table 1
Segmentation accuracy of the white matter hyperintensity (WMH). The second and third columns respectively list the Dice similarity index (DSI) of the test–retest reliability between two manual labels and between the manual and automated segmentations. The fourth and fifth columns respectively list the WMH volumes from the manual and automated segmentations. The last column indicates the category of the WMH load for individual subjects: mild WMH load with the volume less than 10 ml, moderate WMH load with the volume between 10 ml and 30 ml, and severe WMH load with the volume larger than 30 ml.

Subject   DSI test–retest   DSI auto–manual   Manual volume (ml)   Auto volume (ml)   WMH load
1         0.74              0.70              2.51                 2.04               Mild
2         0.66              0.66              3.28                 3.26               Mild
3         0.73              0.69              3.87                 4.72               Mild
4         0.77              0.69              5.10                 3.12               Mild
5         0.69              0.70              6.63                 5.72               Mild
6         0.65              0.69              7.84                 7.39               Mild
7         0.79              0.74              8.12                 8.03               Mild
8         0.73              0.76              10.26                8.55               Moderate
9         0.84              0.79              10.57                10.90              Moderate
10        0.75              0.79              13.40                11.29              Moderate
11        0.84              0.85              17.21                18.32              Moderate
12        0.84              0.83              19.16                16.41              Moderate
13        0.78              0.82              24.19                24.33              Moderate
14        0.83              0.86              30.38                32.61              Severe
15        0.80              0.80              33.57                32.43              Severe
16        0.78              0.81              34.59                35.66              Severe
17        0.85              0.83              46.61                44.07              Severe
18        0.84              0.86              47.36                40.94              Severe
19        0.85              0.80              63.60                57.58              Severe

Comparison between manual and automated segmentation

In order to validate the automated segmentation procedure, we compare the automated results with the manual labels on the same datasets. The test–retest reliability of the manual segmentation as well as the accuracy of the automated segmentation against the manual segmentation are assessed using the Dice similarity index (DSI) (Dice, 1945; Stokking et al., 2000; Zijdenbos et al., 1994). DSI is defined as

\[
\mathrm{DSI}(L_1, L_2) = \frac{V(L_1 \cap L_2)}{\left(V(L_1) + V(L_2)\right)/2}, \tag{1}
\]
where L1 and L2 denote two label images and V(L) is a function that takes a label image and returns its volume. Fig. 2 illustrates the automated and manual WMH segmentation results of six subjects. According to the WMH volume from the manual delineation, the WMH load can be categorized into three classes: mild WMH load with the volume less than 10 ml, moderate WMH load with the volume between 10 ml and 30 ml, and severe WMH load with the volume larger than 30 ml. Based on the WMH load, the 19 subjects are classified as follows: 7 cases with mild WMH load, 6 cases with moderate WMH load, and 6 cases with severe WMH load. Table 1 lists, for each of the 19 subjects, the test–retest reliability of the manually labeled WMH, the accuracy of the automated WMH segmentation against the manual labeling (auto–manual), and the WMH volumes from the manual and automated segmentations. On average, the test–retest reliability and the auto–manual DSI increase as the WMH load increases (Table 2: Part I). For example, the mean DSI between the automated and manual segmentations is 0.70 for the mild WMH load and reaches 0.83 for the severe WMH load. The test–retest DSI of the manual labels shows a similar trend. The DSI between the automated and manual segmentations is statistically equivalent to the DSI of the test–retest reliability between two raters at all three levels of the WMH load (mild: p = 0.192, t6 = −1.471; moderate: p = 0.518, t5 = 0.696; severe: p = 0.816, t5 = 0.245). In addition, the WMH volumes from the manual and automated segmentations are statistically comparable at all three levels of the WMH load (mild: p = 0.231, t6 = −1.332; moderate: p = 0.248, t5 = −1.307; severe: p = 0.203, t5 = −1.464), which can also be
supported by the high correlations of the volumes between the manual and automated segmentations (mild: r = 0.925, p = 0.003; moderate: r = 0.965, p = 0.002; severe: r = 0.982, p = 0.001). The Bland–Altman plots (Fig. 3) provide further visual evidence, suggesting consistency between the manual and automated segmentations at all three levels of the WMH load. Compared with existing WMH segmentation methods (Admiraal-Behloul et al., 2005; Anbeek et al., 2004; Dyrby et al., 2008), our approach delivers competitive DSIs at all three levels of the WMH load (Table 2, Part II). Beyond WMH segmentation, researchers (Shiee et al., 2010; Wu et al., 2006; Zijdenbos et al., 2002) have also automatically segmented other white matter lesions with intensity patterns similar to the WMH, such as multiple sclerosis lesions, and achieved DSI values between 0.6 and 0.7, which are slightly lower than ours.

Fig. 4 illustrates three examples of the CI segmentation. Among the 19 subjects selected for manual labeling, only 6 subjects had CIs. Hence, manual segmentation of the CIs was performed only in these 6 subjects. The DSI between the automated and manual segmentations is 79% on average (Table 3) and statistically equivalent to the test–retest reliability of two manual labels (p = 0.225, t5 = −1.383). The CI volume obtained using the automated segmentation is also statistically equivalent to that obtained using the manual labeling (p = 0.182, t5 = −1.549). The CI volumes from the automated and manual segmentations are significantly correlated with each other (r = 0.997, p < 0.001). The discrepancies between the automated and manual segmentations mainly occur in regions where the WMH and CI are connected, since there is no clear boundary between them. The Bland–Altman plot in Fig. 5 further illustrates the agreement of the volumes between the automated and manual segmentations.

Fig. 6 illustrates three LI examples. Among all 272 subjects, the manual segmentation identifies 36 subjects with LIs. The automated segmentation identifies 30 of these 36 subjects with LIs, yielding a sensitivity of 83.3% (30/36). Of the 236 subjects without LIs, our approach correctly identifies 228, achieving a high specificity of 96.6% (228/236). Among all 272 subjects, 36 subjects have 62 lacunar infarcts identified through manual labeling. The automated segmentation correctly identifies 50 of these 62 LIs and achieves a sensitivity of 80.6% (50/62). Nevertheless, the automated segmentation labels an additional 18 LIs that are not identified through the manual segmentation and thus gives a false positive rate of 0.06 per subject (18/272). Fig. 7 shows a histogram of the differences between manual labeling and automated detection of LI, which indicates good agreement.

Fig. 2. Examples of the white matter hyperintensity (WMH). Each row shows one example of the WMH segmentation from one subject. The first two rows show two examples with mild WMH load, while the third and fourth rows illustrate two examples with moderate WMH load and the last two rows show two examples with severe WMH load. The columns from left to right respectively show FLAIR axial slices without and with the automated segmentation and manual segmentation labels.

Table 2
Part I lists the mean and standard deviation (SD) values of Dice similarity index (DSI) for the test–retest reliability of two manual labels and the automated segmentation accuracy against the manual labeling (auto–manual) in different groups of the WMH load. Part II lists the mean and SD values of DSIs for the WMH reported by existing literature.

WMH load                         Mild          Moderate      Severe        All levels
Part I
Test–retest DSI                  0.72 (0.05)   0.78 (0.06)   0.82 (0.03)   0.77 (0.06)
Auto–manual DSI                  0.70 (0.02)   0.80 (0.04)   0.83 (0.03)   0.77 (0.06)
Part II
Admiraal-Behloul et al. (2005)   0.70 (0.09)   0.75 (0.07)   0.82 (0.05)   0.75 (0.09)
Dyrby et al. (2008)              0.45 (0.15)   0.62 (0.11)   0.65 (0.15)   0.56
Anbeek et al. (2004)             0.5           0.75          0.85          0.8

Correlation of the WMH volume with the ARWMC score
The WMH segmentation was further evaluated by examining the correlation of the WMH volume derived from the automated segmentation with a visual grading scale obtained using the Age-Related White Matter Changes (ARWMC) protocol (Wahlund et al., 2001) in all 272 subjects. The ARWMC has been widely used to assess the WMH load within the frontal, parieto-occipital, temporal, infratentorial, and basal ganglia areas, with a maximum score of 30. In our study, the ARWMC scale was graded by two trained clinicians (JAC and HS) with an intra-class correlation of 0.957 (p < 0.001) based on the first 139 subjects. Due to the high inter-rater reliability, the ARWMC for the subsequent scans was assessed by only one rater (HS). Fig. 8 shows the scatter plot between the ARWMC and WMH volume, where a strong linear association can be observed (Pearson correlation = 0.728, p < 0.001).

Discussion

We present a multi-stage automated segmentation framework for delineating the white matter hyperintensity and cortical and lacunar infarcts from the T1-, T2-weighted, and FLAIR images. This segmentation algorithm contains several key components, including hyperintensity region initialization and refinement based on two-stage Gaussian mixture models, as well as the classification of brain abnormalities. To our knowledge, this is the first paper on the automated segmentation of the cortical infarct.

In general, our automated segmentation accuracy is comparable with the test–retest reliability of the manual segmentation. Nevertheless, our automated segmentation is relatively conservative compared with the manual segmentation in terms of the WMH or CI volumes or the number of LIs. There are two main reasons for the underestimation of the WMH. Firstly, the automated segmentation is performed in the isotropic T1-weighted MRI space, to which the FLAIR and T2-weighted images are aligned. In contrast, the manual segmentation is performed directly on the original 3 mm thick FLAIR MRI to reduce the manual workload.
In order to compare the segmentation results with the manual ground truth, the result of the automated segmentation is downsampled to the lower resolution, so that errors may occur in the interpolation of the binary mask. Secondly, in areas with large confluent WMHs, the actual intensity may not be as homogeneous as it appears to the human eye, because of the partial volume effect at the boundaries between hyperintense and normal-intensity tissue. As for the CI, the discrepancy between the automated and manual segmentations is mainly caused by the unclear boundary between the WMH and CI. Due to the small size of LIs, the interpolation in the coregistration of the T1-, T2-weighted, and FLAIR images within each subject may blur the intensity of LIs and thus cause mislabeling of LIs in our automated segmentation. Advanced and fast imaging techniques will be needed to acquire high resolution T2-weighted and FLAIR images for further improving the LI segmentation accuracy.

Fig. 3. Bland–Altman plots of automated/manual segmentation volume difference against automated/manual segmentation mean volume. Panels (A–C) respectively show the Bland–Altman plots for mild, moderate, and severe white matter hyperintensity (WMH). Limits of the agreement (dashed lines) are shown as mean ± 1.96 standard deviation (SD).

Fig. 4. Examples of the cortical infarct (CI) segmentation. Each row shows one example of the CI segmentation from one subject. The columns from left to right respectively show FLAIR axial slices without and with the labels from the automated and manual segmentations.
Consistent with existing WMH segmentation methods (Admiraal-Behloul et al., 2005; Anbeek et al., 2004; Dyrby et al., 2008), our study shows that the segmentation accuracy improves as the WMH load increases. Our WMH segmentation accuracy is comparable, firstly, with the manual test–retest reliability and, secondly, with that of existing state-of-the-art WMH segmentation approaches (Admiraal-Behloul et al., 2005; Anbeek et al., 2004; Dyrby et al., 2008) and multiple sclerosis lesion segmentation approaches (Shiee et al., 2010; Wu et al., 2006; Zijdenbos et al., 2002). For the LI, previous methods reported segmentation sensitivities of 96.8% (Uchiyama et al., 2007) and 90.1% (Yokoyama et al., 2007), and false positives per subject of 0.76 (Uchiyama et al., 2007) and 1.70 (Yokoyama et al., 2007). Our method is conservative, with a sensitivity of 80.6%, lower than that of the two existing methods. Nevertheless, our false positive rate is only 0.06 per subject, which is substantially lower. Thus far, to the best of our knowledge, this is the first report on the automated segmentation of the CI. We obtain a CI segmentation accuracy of 79% (mean DSI of 0.79; Table 3), higher than that for WMH. In addition to the comparison between the automated and manual labels, we also demonstrate a strong correlation between the WMH volume and the ARWMC visual grading, which has been widely used in clinical studies. This indicates the potential clinical use of the quantitative WMH measure.
Table 3. Segmentation accuracy of cortical infarcts (CIs). The second and third columns list the Dice similarity index (DSI) between two manual labelings (test–retest reliability) and between the manual and automated segmentations (auto–manual), respectively. The fourth and fifth columns list the CI volumes from the manual and automated segmentations, respectively.
Subject      DSI (test–retest)   DSI (auto–manual)   Volume, manual (ml)   Volume, auto (ml)
1            0.89                0.74                2.45                  3.10
3            0.86                0.72                1.01                  0.83
5            0.80                0.80                1.03                  0.92
6            0.76                0.81                12.21                 10.52
7            0.85                0.80                4.07                  3.48
10           0.85                0.86                11.47                 10.12
Mean (SD)    0.84 (0.05)         0.79 (0.05)         5.37 (5.14)           4.83 (4.39)
Fig. 5. Bland–Altman plot of automated/manual segmentation volume difference against automated/manual segmentation mean volume for the cortical infarct (CI). Limits of agreement (dashed lines) are shown as mean ± 1.96 standard deviation (SD).
Fig. 6. Examples of the lacunar infarct (LI) segmentation. The first three rows respectively show three examples of the LI segmentation. The columns from left to right respectively show FLAIR axial slices without and with the labels from the automated and manual segmentations. The automated segmentations of the LIs in the middle columns of the first three rows are enlarged and respectively shown in the three panels of the last row.
Fig. 7. Histogram of the difference in the number of lacunar infarcts (LIs) between the automated and manual segmentations.
Fig. 8. Scatter plot of white matter hyperintensity (WMH) volumes and the Age-Related White Matter Changes (ARWMC) scores.
The study has important strengths and limitations. Our automated segmentation framework integrates Gaussian mixture models, region growing, and morphological operations. Hence, it neither requires nor relies on any selection of training data, as discussed in Anbeek et al. (2004), Dyrby et al. (2008), Schwarz et al. (2009), Scully et al. (2010), and Wen et al. (2009). These simple operations make the computation efficient. However, the accuracy of the later stages in our segmentation framework relies on that of the preceding stages. For instance, an LI occurring right next to the lateral ventricles is wrongly classified when the lateral ventricles are wrongly labeled in the first stage, the T1-weighted image segmentation. Thus, segmentation errors can accumulate through the multi-stage process. Additionally, our segmentation approach can only automatically identify cortical infarcts at the subacute and early chronic stages, which are visible on the FLAIR or T2-weighted MRI. Further investigation is needed for the automated delineation of acute cortical infarcts by incorporating diffusion-weighted images (Xavier et al., 2003).
In summary, we present an automated segmentation approach for the WMH, CI, and LI using the T1-weighted, T2-weighted, and FLAIR images.
From its evaluation in a large-scale dataset, we demonstrate that this segmentation approach is robust and has potential use in clinical studies.
Acknowledgments
The work was supported by grants A*STAR SICS-09/1/1/001, a center grant from the National Medical Research Council (NMRC/CG/NUHS/2010), the Young Investigator Award at the National University of Singapore (NUSYIA FY10 P07), and the National University of Singapore MOE AcRF grants.
References
Admiraal-Behloul, F., Van Den Heuvel, D.M., Olofsen, H., Van Osch, M.J., Van Der Grond, J., Van Buchem, M.A., Reiber, J.H., 2005. Fully automatic segmentation of white matter hyperintensities in MR images of the elderly. Neuroimage 28, 607–617.
Anbeek, P., Vincken, K.L., Van Osch, M.J.P., Bisschops, R.H.C., Van Der Grond, J., 2004. Probabilistic segmentation of white matter lesions in MR imaging. Neuroimage 21, 1037–1044.
Awad, I.A., Johnson, P.C., Spetzler, R.F., Hodak, J.A., 1986. Incidental subcortical lesions identified on magnetic resonance imaging in the elderly. II. Postmortem pathological correlations. Stroke 17, 1090–1097.
Bennett, D.A., Schneider, J.A., Bienias, J.L., Evans, D.A., Wilson, R.S., 2005. Mild cognitive impairment is related to Alzheimer disease pathology and cerebral infarctions. Neurology 64, 834–841.
Carey, C.L., Kramer, J.H., Josephson, S.A., Mungas, D., Reed, B.R., Schuff, N., Weiner, M.W., Chui, H.C., 2008. Subcortical lacunes are associated with executive dysfunction in cognitively normal elderly. Stroke 39, 397–402.
Carmichael, O., Schwarz, C., Drucker, D., Fletcher, E., Harvey, D., Beckett, L., Jack Jr., C.R., Weiner, M., DeCarli, C., 2010. Longitudinal changes in white matter disease and cognition in the first year of the Alzheimer disease neuroimaging initiative. Arch. Neurol. 67, 1370–1378.
Debette, S., Markus, H.S., 2010. The clinical importance of white matter hyperintensities on brain magnetic resonance imaging: systematic review and meta-analysis. BMJ (Clinical Research Ed.) 341, c3666.
Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc. B 39, 1–38.
Dice, L.R., 1945. Measures of the amount of ecologic association between species. Ecology 26, 297–302.
Dyrby, T.B., Rostrup, E., Baaré, W.F.C., van Straaten, E.C.W., Barkhof, F., Vrenken, H., Ropele, S., Schmidt, R., Erkinjuntti, T., Wahlund, L.O., Pantoni, L., Inzitari, D., Paulson, O.B., Hansen, L.K., Waldemar, G., 2008. Segmentation of age-related white matter changes in a clinical multi-center study. Neuroimage 41, 335–345.
Fischl, B., Salat, D.H., Busa, E., Albert, M., Dieterich, M., Haselgrove, C., Van Der Kouwe, A., Killiany, R., Kennedy, D., Klaveness, S., Montillo, A., Makris, N., Rosen, B., Dale, A.M., 2002. Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neuron 33, 341–355.
Garrett, K.D., Cohen, R.A., Paul, R.H., Moser, D.J., Malloy, P.F., Shah, P., Haque, O., 2004. Computer-mediated measurement and subjective ratings of white matter hyperintensities in vascular dementia: relationships to neuropsychological performance. Clin. Neuropsychol. 18, 50–62.
Gibson, E., Gao, F., Black, S.E., Lobaugh, N.J., 2010. Automatic segmentation of white matter hyperintensities in the elderly using FLAIR images at 3T. J. Magn. Reson. Imaging 31, 1311–1322.
He, J., Iosif, A.M., Lee, D.Y., Martinez, O., Chu, S., Carmichael, O., Mortimer, J.A., Zhao, Q., Ding, D., Guo, Q., Galasko, D., Salmon, D.P., Dai, Q., Wu, Y., Petersen, R.C., Hong, Z., Borenstein, A.R., DeCarli, C., 2010. Brain structure and cerebrovascular risk in cognitively impaired patients: Shanghai Community Brain Health Initiative—pilot phase. Arch. Neurol. 67, 1231–1237.
Jack, C.R., O'Brien, P.C., Rettman, D.W., Shiung, M.M., Xu, Y., Muthupillai, R., Manduca, A., Avula, R., Erickson, B.J., 2001. FLAIR histogram segmentation for measurement of leukoaraiosis volume. J. Magn. Reson. Imaging 14, 668–676.
Jenkinson, M., Bannister, P., Brady, M., Smith, S., 2002. Improved optimization for the robust and accurate linear registration and motion correction of brain images. Neuroimage 17, 825–841.
Jokinen, H., Gouw, A.A., Madureira, S., Ylikoski, R., Van Straaten, E.C.W., Van Der Flier, W.M., Barkhof, F., Scheltens, P., Fazekas, F., Schmidt, R., Verdelho, A., Ferro, J.M., Pantoni, L., Inzitari, D., Erkinjuntti, T., 2011. Incident lacunes influence cognitive decline: the LADIS study. Neurology 76, 1872–1878.
Jokinen, H., Kalska, H., Ylikoski, R., Madureira, S., Verdelho, A., Gouw, A., Scheltens, P., Barkhof, F., Visser, M.C., Fazekas, F., Schmidt, R., O'Brien, J., Hennerici, M., Baezner, H., Waldemar, G., Wallin, A., Chabriat, H., Pantoni, L., Inzitari, D., Erkinjuntti, T., 2009. MRI-defined subcortical ischemic vascular disease: baseline clinical and neuropsychological findings. Cerebrovasc. Dis. 27, 336–344.
Joshi, M., Cui, J., Doolittle, K., Joshi, S., Van Essen, D., Wang, L., Miller, M.I., 1999. Brain segmentation and the generation of cortical surfaces. Neuroimage 9, 461–476.
Kapeller, P., Barber, R., Vermeulen, R.J., Adèr, H., Scheltens, P., Freidl, W., Almkvist, O., Moretti, M., Del Ser, T., Vaghfeldt, P., Enzinger, C., Barkhof, F., Inzitari, D., Erkinjuntti, T., Schmidt, R., Fazekas, F., 2003. Visual rating of age-related white matter changes on magnetic resonance imaging: scale comparison, interrater agreement, and correlations with quantitative measurements. Stroke 34, 441–445.
Marquine, M.J., Attix, D.K., Goldstein, L.B., Samsa, G.P., Payne, M.E., Chelune, G.J., Steffens, D.C., 2010. Differential patterns of cognitive decline in anterior and posterior white matter hyperintensity progression. Stroke 41, 1946–1950.
Ota, M., Nemoto, K., Sato, N., Yamashita, F., Asada, T., 2009. Relationship between white matter changes and cognition in healthy elders. Int. J. Geriatr. Psychiatry 24, 1463–1469.
Pantoni, L., 2002. Pathophysiology of age-related cerebral white matter changes. Cerebrovasc. Dis. 13, 7–10.
Prins, N.D., Van Straaten, E.C.W., Van Dijk, E.J., Simoni, M., Van Schijndel, R.A., Vrooman, H.A., Koudstaal, P.J., Scheltens, P., Breteler, M.M.B., Barkhof, F., 2004. Measuring progression of cerebral white matter lesions on MRI: visual rating and volumetrics. Neurology 62, 1533–1539.
Ramirez, J., Gibson, E., Quddus, A., Lobaugh, N.J., Feinstein, A., Levine, B., Scott, C.J.M., Levy-Cooperman, N., Gao, F.Q., Black, S.E., 2011. Lesion Explorer: a comprehensive segmentation and parcellation package to obtain regional volumetrics for subcortical hyperintensities and intracranial tissue. Neuroimage 54, 963–973.
Ropper, A.H., Brown, R.H., 2005. Adams and Victor's principles of neurology, 8th ed. McGraw-Hill Professional.
Sasaki, M., Hirai, T., Taoka, T., Higano, S., Wakabayashi, C., Matsusue, E., Ida, M., 2008. Discriminating between silent cerebral infarction and deep white matter hyperintensity using combinations of three types of magnetic resonance images: a multicenter observer performance study. Neuroradiology 50, 753–758.
Schneider, J.A., Wilson, R.S., Bienias, J.L., Evans, D.A., Bennett, D.A., 2004. Cerebral infarctions and the likelihood of dementia from Alzheimer disease pathology. Neurology 62, 1148–1155.
Schwarz, C., Fletcher, E., DeCarli, C., Carmichael, O., 2009. Fully-automated white matter hyperintensity detection with anatomical prior knowledge and without FLAIR. Proceedings of the Information Processing in Medical Imaging Conference, 21, 239–251.
Scully, M., Anderson, B., Lane, T., Gasparovic, C., Magnotta, V., Sibbitt, W., Roldan, C., Kikinis, R., Bockholt, H.J., 2010. An automated method for segmenting white matter lesions through multi-level morphometric feature classification with application to lupus. Front. Hum. Neurosci. 4, 27.
Shiee, N., Bazin, P.L., Ozturk, A., Reich, D.S., Calabresi, P.A., Pham, D.L., 2010. A topology-preserving approach to the segmentation of brain images with multiple sclerosis lesions. Neuroimage 49, 1524–1535.
Sled, J.G., Zijdenbos, A.P., Evans, A.C., 1998. A nonparametric method for automatic correction of intensity nonuniformity in MRI data. IEEE Trans. Med. Imaging 17, 87–97.
Smith, S.M., 2002. Fast robust automated brain extraction. Hum. Brain Mapp. 17, 143–155.
Stokking, R., Vincken, K.L., Viergever, M.A., 2000. Automatic morphology-based brain segmentation (MBRASE) from MRI-T1 data. Neuroimage 12, 726–738.
Tripathi, R., Wang, K., Mysore, P., Spencer, D.C., 2011. The influence of lacunes on cognitive function. Neurology 76, e111.
Uchiyama, Y., Yokoyama, R., Ando, H., Asano, T., Kato, H., Yamakawa, H., Hara, T., Iwama, T., Hoshi, H., Fujita, H., 2007. Improvement of automated detection method of lacunar infarcts in brain MR images. Conf Proc IEEE Eng Med Biol Soc, 1599–1602.
Van Straaten, E.C.W., Fazekas, F., Rostrup, E., Scheltens, P., Schmidt, R., Pantoni, L., Inzitari, D., Waldemar, G., Erkinjuntti, T., Mäntylä, R., Wahlund, L.O., Barkhof, F., 2006. Impact of white matter hyperintensities scoring method on correlations with clinical data: the LADIS study. Stroke 37, 836–840.
Vannorsdall, T.D., Waldstein, S.R., Kraut, M., Pearlson, G.D., Schretlen, D.J., 2009. White matter abnormalities and cognition in a community sample. Arch. Clin. Neuropsychol. 24, 209–217.
Wahlund, L.O., Barkhof, F., Fazekas, F., Bronge, L., Augustin, M., Sjögren, M., Wallin, A., Ader, H., Leys, D., Pantoni, L., Pasquier, F., Erkinjuntti, T., Scheltens, P., 2001. A new rating scale for age-related white matter changes applicable to MRI and CT. Stroke 32, 1318–1322.
Warfield, S.K., Kaus, M., Jolesz, F.A., Kikinis, R., 2000. Adaptive, template moderated, spatially varying statistical classification. Med. Image Anal. 4, 43–55.
Wen, W., Sachdev, P., 2004. The topography of white matter hyperintensities on brain MRI in healthy 60- to 64-year-old individuals. Neuroimage 22, 144–154.
Wen, W., Sachdev, P.S., Li, J.J., Chen, X., Anstey, K.J., 2009. White matter hyperintensities in the forties: their prevalence and topography in an epidemiological sample aged 44–48. Hum. Brain Mapp. 30, 1155–1167.
Wu, Y., Warfield, S.K., Tan, I.L., Wells III, W.M., Meier, D.S., van Schijndel, R.A., Barkhof, F., Guttmann, C.R.G., 2006. Automated segmentation of multiple sclerosis lesion subtypes with multichannel MRI. Neuroimage 32, 1205–1215.
Xavier, A.R., Qureshi, A.I., Kirmani, J.F., Yahia, A.M., Bakshi, R., 2003. Neuroimaging of stroke: a review. South. Med. J. 96, 367–379.
Yokoyama, R., Zhang, X., Uchiyama, Y., Fujita, H., Hara, T., Zhou, X., Kanematsu, M., Asano, T., Kondo, H., Goshima, S., Hoshi, H., Iwama, T., 2007. Development of an automated method for the detection of chronic lacunar infarct regions in brain MR images. IEICE Transactions on Information and Systems E90-D, 943–954.
Young, V.G., Halliday, G.M., Kril, J.J., 2008. Neuropathologic correlates of white matter hyperintensities. Neurology 71, 804–811.
Yushkevich, P.A., Piven, J., Hazlett, H.C., Smith, R.G., Ho, S., Gee, J.C., Gerig, G., 2006. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. Neuroimage 31, 1116–1128.
Zijdenbos, A.P., Dawant, B.M., Margolin, R.A., Palmer, A.C., 1994. Morphometric analysis of white matter lesions in MR images: method and validation. IEEE Trans. Med. Imaging 13, 716–724.
Zijdenbos, A.P., Forghani, R., Evans, A.C., 2002. Automatic “pipeline” analysis of 3-D MRI data for clinical trials: application to multiple sclerosis. IEEE Trans. Med. Imaging 21, 1280–1291.