NeuroImage 56 (2011) 2038–2046
NeuroImage
journal homepage: www.elsevier.com/locate/ynimg
Technical Note
Manual, semi-automated, and automated delineation of chronic brain lesions: A comparison of methods
Marko Wilke a,b,⁎, Bianca de Haan c, Hendrik Juenger a,d, Hans-Otto Karnath c,e
a Department of Pediatric Neurology and Developmental Medicine, Children's Hospital, University of Tübingen, Germany
b Experimental Pediatric Neuroimaging, Children's Hospital & Dept. of Neuroradiology, University of Tübingen, Germany
c Center of Neurology, Division of Neuropsychology, Hertie-Institute for Clinical Brain Research, University of Tübingen, Germany
d Department of Pediatrics, Klinikum rechts der Isar, Technische Universität Munich, Germany
e Center for Advanced Brain Imaging, Georgia Institute of Technology, Atlanta, GA, USA
Article info
Article history: Received 11 November 2010; Revised 1 April 2011; Accepted 7 April 2011; Available online 14 April 2011
Abstract
The exact delineation of chronic brain lesions is a crucial step when investigating the relationship between brain structure and (dys-)function. For this, manual tracing, although very time-consuming, is still the gold standard. In order to assess the possible contributions from other methods, we compared manual tracing of lesion boundaries with a newly developed semi-automated approach and a fully automated approach for lesion definition in a sample of chronic stroke patients (n = 11, 5 male, median age 12 years, range 10–30 years). Manual tracing requires substantially more human input (4.8–9.6 h/subject) than semi-automated (24.9 min/subject) and automated processing (1 min/subject). When compared with manual tracing as the gold standard, both the semi-automated approach (tested with 4 different smoothing filters) and the automated approach performed on an acceptable level, with a median Dice similarity index of .53–.60 (semi-automated) and .49 (automated). With all semi-automated and automated approaches, larger lesions were identified with significantly higher performance than smaller lesions, and central voxels with higher performance than peripheral voxels, indicating that the surface-to-volume ratio explains this trend. The automated approach failed to identify two lesions. In several cases, indirect lesion effects (such as enlarged ventricles) were detected by the semi-automated or the automated approach. We conclude that manual tracing remains the gold standard for exact lesion delineation, but that semi-automated and automated approaches may be alternatives for larger lesions and/or larger studies. The detection of indirect lesion effects may be another application of such approaches in the future. © 2011 Elsevier Inc. All rights reserved.
⁎ Corresponding author at: Department of Pediatric Neurology and Developmental Medicine, Children's Hospital, University of Tübingen, Hoppe-Seyler-Str. 1, 72076 Tübingen, Germany. Fax: +49 7071 29 5473. E-mail address: [email protected] (M. Wilke).
1053-8119/$ – see front matter © 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.neuroimage.2011.04.014
Introduction
A prominent method used in cognitive neuroscience to advance our understanding of brain function is the investigation of individuals suffering from brain damage. The correlation between behavior and anatomy allows insights into how function depends upon structure (Fellows et al., 2005). The precise delineation of lesions in brain imaging data becomes even more important when this relationship is investigated with recent statistical approaches such as voxelwise lesion behavior mapping (VLBM; Bates et al., 2003; Rorden et al., 2007, 2009). These techniques complement brain activation methods (such as functional MRI or PET) because they allow one to identify the brain areas that are necessary for a given function, namely regions that are injured in patients with impairment
yet spared in those with little or no impairment (Fellows et al., 2005; Rorden and Karnath, 2004).
The most commonly used approach to date is manual tracing. Here, experienced raters delineate the lesions manually on the MR images or computed tomography (CT) scans of an individual patient (Borovsky et al., 2007; Karnath and Perenin, 2005; Moro et al., 2008; Mort et al., 2003). While this procedure yields exact regions of interest, manual delineation in current high-resolution images is also very labor-intensive and thus time-consuming (Ashton et al., 2003; Seghier et al., 2008), and efforts to speed up manual rating may impair performance, reducing its benefit (Elsheikh et al., 2010). An alternative is the automated detection of "abnormalities" combined with manual editing (Achiron et al., 2002; Ashton et al., 2003). More recently, fully automated approaches have been suggested that identify "unusual" voxels in the process of tissue segmentation (Seghier et al., 2008).
In this context, it should be remembered that a chronic brain lesion, broadly defined, may have different aspects. First and most obviously, the lesion occupies space originally taken up by healthy brain tissue; therefore, the direct lesion effect is the destruction and/or
displacement of healthy tissue. However, there are also (more or less obvious) indirect effects of chronic lesions, which may be either adjacent to the lesion or remote from it. An example of the former is a secondary enlargement of the lateral ventricles in the case of white matter damage (Dyet et al., 2006; Wilke et al., 2009). Remote effects must be expected to be more subtle. They may be brought about by a gain of function (as in the reorganization of function; Desmurget et al., 2007; Staudt, 2010) or a loss of function (as in the case of neuronal denervation; Banati, 2002; Henselmans et al., 2000). Importantly, even comparatively minor functional changes can now be shown to be accompanied by structural changes in the brain (Draganski et al., 2004; Scholz et al., 2009). Manual tracing approaches, aimed at describing the original lesion, are not well suited to detecting these indirect effects. Although such effects are difficult to detect and to ascribe to the original lesion, detecting them would allow the mechanisms underlying neural plasticity and reorganization to be investigated in much more detail, as other parts of a new network could be identified.
In light of these different approaches and the different effects a lesion can have on the brain, it is not clear which approach is best suited to detect which aspect of a brain lesion. The present study therefore compares three different approaches (manual, semi-automated, and automated) in terms of performance and time requirements.
Methods and subjects
Subjects and imaging data
MR imaging data from eleven subjects (5 male, median age 12 years, range 10–30 years) with chronic, perinatally acquired unilateral middle cerebral artery (MCA) stroke (4 right-sided, 7 left-sided) and resulting hemiparesis were used (Table 1). Subjects were recruited as part of an ongoing study on reorganization following early brain lesions (Juenger et al., 2007; Kuhnke et al., 2008; Walther et al., 2009; Wilke et al., 2009).
These data serve as a model to establish and compare the performance of the different methods. All procedures were in accordance with local institutional review board requirements; all adults and the parents of all underage subjects gave written informed consent, and all children gave oral assent prior to scanning. All subjects were imaged on a 1.5 T clinical MR scanner (Avanto, Siemens Medizintechnik, Erlangen, Germany) using a standard quadrature head coil. A T1-weighted 3D dataset covering the whole head was acquired (FLASH, TR = 11 ms, TE = 4.94 ms, flip angle = 15°, final resolution 1 × 1 × 1 mm³). As the first step in data processing, all images with a lesion in the left hemisphere were flipped by modifying the header information, so that all lesions were on the same side.
Table 1
Demographic details and lesion size for all subjects. All subjects had suffered a perinatally acquired unilateral MCA stroke, and imaging was done following inclusion into the study.
Patient   Gender [M/F]   Age [years]   Lesion size [voxels]   Lesion [L/R]
1         M              20              7,343                R
2         M              12             18,147                L
3         M              30             53,840                R
4         M              12             63,774                R
5         F              19            123,241                L
6         F              11             11,812                L
7         F              16             83,570                L
8         F              10            100,528                L
9         F              16             20,150                R
10        F              12              8,616                L
11        M              12              3,413                L
Approach 1: manual delineation
Manual delineation of each brain lesion was performed by a single experienced rater (BdH). The boundary of the lesion was delineated manually, directly on the individual native-space MR image, for every
single transversal slice using MRIcron software (Rorden et al., 2007; www.mricro.com). Tracing a single slice takes 5–10 min, depending on lesion complexity. The outlined lesion is then filled automatically, and the resulting three-dimensional volume of interest (VOI), describing the direct lesion effect, is saved as an image volume in register with the original MR image.
Approach 2: semi-automated delineation
This approach is based on the premise that brain lesions are detectable as an "abnormality" in maps of tissue homogeneity, tissue composition, shape, or laterality. Hence, we generated "abnormality maps" for each of these features. We used SPM5 (Wellcome Department, UCL, UK) running in Matlab (The MathWorks, Natick, MA, USA), as well as custom functions and functionality available within the VBM5 toolbox (Gaser, 2010). The final interactive combination of feature maps (see below) was done by the same experienced rater who performed the manual delineation, in order to ensure consistent appreciation of each lesion.
Spatial normalization was achieved within the framework of unified segmentation (Ashburner and Friston, 2005), with parameters optimized for lesioned brains (Crinion et al., 2007). This involves increasing the regularization term of the non-linear warping as well as increasing the number of Gaussians used to model the non-brain class. To improve the starting estimate for the ensuing normalization step, each image was first rigidly aligned to a T1 template using the coregistration feature available within SPM (Ashburner and Friston, 1997). As larger lesions sometimes pose difficulties for the initial affine registration, an initial segmentation step was performed that writes out all three tissue classes in native space. For this first step, tissue priors are used, as they "stabilize" segmentation.
Thereafter, the native-space tissue classes (GM, WM, and CSF) are combined into a binary brain mask, which is then normalized to a standard brain mask in normalized space. This amounts to an effective form of lesion masking (Brett et al., 2001) and yields robust affine normalization parameters, which are then used to initialize the second segmentation step. This second step, which generates the final tissue classes (normalized GM, WM, and CSF) from the original T1 dataset, is done without tissue priors, in order to avoid any confound resulting from the "expected" tissue distribution (Gaser et al., 2007; Seghier et al., 2008). It also takes advantage of the fact that chronic lesions will overwhelmingly be classified as CSF, based on their voxel intensity values (Seghier et al., 2008; Wilke et al., 2009). The following features were then assessed in these "optimally normalized, optimally segmented" tissue maps: tissue homogeneity, tissue composition, shape, and laterality (see also Fig. 1).
Tissue homogeneity was assessed using a hidden Markov random field (Cuadra et al., 2005), in which the segmentation result in each voxel is related to that of its neighbors: if a voxel is classified as gray matter and all 26 neighboring voxels are also classified as gray matter, tissue homogeneity is maximal. Brain regions in which one tissue class predominates will therefore be homogeneous, while more variable regions containing different classes (such as the border between white and gray matter; see Fig. 1) will be less homogeneous. This feature is used based on the assumption that a lesion will be unusually homogeneous. It is calculated from the individual subject's segmentation results as well as from the averaged segmentation results of a control population, as available in the standard (or, potentially, custom) tissue priors.
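The intuition behind the homogeneity feature can be illustrated with a much simpler neighbour-counting proxy. This is not the hidden Markov random field of Cuadra et al. (2005) used in the study; it merely counts, for each voxel and tissue class, how many of the 26 neighbours carry that class label in a hard segmentation, scaled to 0–255. The label coding (1 = GM, 2 = WM, 3 = CSF) and the function name are hypothetical.

```python
import numpy as np
from scipy import ndimage

# Hypothetical label coding of a hard (maximum-probability) segmentation.
CLASSES = {"GM": 1, "WM": 2, "CSF": 3}


def neighbourhood_homogeneity(labels, classes=CLASSES, scale=255.0):
    """Per-class neighbour-agreement maps, scaled to [0, scale].

    For every voxel and tissue class, count how many of its 26 neighbours
    carry that class label. Voxels outside the volume count as "no class".
    This is a simplified proxy, not a hidden Markov random field.
    """
    kernel = np.ones((3, 3, 3))
    kernel[1, 1, 1] = 0.0  # exclude the centre voxel itself
    maps = {}
    for name, value in classes.items():
        onehot = (labels == value).astype(float)
        counts = ndimage.convolve(onehot, kernel, mode="constant", cval=0.0)
        maps[name] = scale * counts / 26.0
    return maps


# Usage: a voxel whose 26 neighbours are all GM scores the maximum in the GM map.
labels = np.ones((5, 5, 5), dtype=int)  # all GM
print(neighbourhood_homogeneity(labels)["GM"][2, 2, 2])
```

Computing such maps for a patient and for a control reference, class by class, gives exactly the inputs that the difference measure ΔTH below sums over.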
Fig. 1. Overview of the semi-automated processing approach: following robust spatial normalization using cost-function masking (1), the image is segmented without priors (2), and robust z-score maps of four features are generated: tissue homogeneity (3), tissue composition (4), Jacobian determinant (5), and laterality (6). These are then combined manually (7) to yield a final lesion mask. See the Methods and subjects section for more details.
The tissue homogeneity difference is scaled to a maximum of 255 and is calculated for each voxel by summing the absolute differences, as in
$$\Delta\mathrm{TH} = \lvert \mathrm{GM}_c - \mathrm{GM}_p \rvert + \lvert \mathrm{WM}_c - \mathrm{WM}_p \rvert + \lvert \mathrm{CSF}_c - \mathrm{CSF}_p \rvert \tag{1}$$
such that the tissue homogeneity difference ΔTH is calculated as the sum of the absolute difference in the homogeneity of gray matter from
controls (GM_c) and the individual patient (GM_p), plus the respective differences in white matter (WM) and cerebrospinal fluid (CSF). For an exemplary voxel, if the control group has a local tissue homogeneity of 185 in the GM class, 100 in the WM class, and 100 in the CSF class, then a patient with a lesion and values of 100/100/250, respectively, would yield a homogeneity difference of ΔTH = 235 (85 + 0 + 150).
Tissue composition is also assessed from the segmentation results; it aims at detecting shifts from one tissue class to another, again in comparison with results from a control population. ΔTC is calculated in the same way as ΔTH in Eq. (1). For an exemplary voxel, if the control group has a probability of 85% for GM, 10% for WM, and 5% for CSF (in SPM's unified segmentation framework (Ashburner and Friston, 2005), the tissue class probabilities sum to 100% within the brain), then a patient with a lesion and values of 20/20/60%, respectively, would yield a tissue class difference of ΔTC = 130 (65 + 10 + 55).
Shape differences are detected by analyzing the Jacobian determinant (JD) resulting from (symmetrical) spatial normalization (Ashburner and Friston, 1999). Increasing the regularization for non-linear warping has been shown to stabilize unified segmentation in the presence of brain lesions (Crinion et al., 2007). In order to find regions that need to be deformed more, a separate normalization step was done here that uses less regularization, allowing for more non-linear deformation. From this step, only the JD was calculated from the spatial normalization parameters. The JD is a voxelwise measure of the tissue volume gain or loss resulting from the deformations applied during spatial normalization; it constitutes the simplest case of deformation-based morphometry and is thus complementary to classical voxel-based morphometry approaches (Gaser et al., 2001; Fig. 1, step 5).
While the absolute value is dominated by the initial affine transformation, it will also reflect the effects of non-linear spatial normalization (Ashburner and Friston, 1999, 2005). In the case of a lesion, this volume change must be expected to be prominent, as the algorithm will try to match the abnormal tissue to the (normal) template by either reducing abnormal or increasing normal tissue volume (Gaser et al., 2001). As native-space reference data are (usually) not available, the Jacobian image is not compared with control data. Instead, it is used based on the assumption that abnormal tissue will have to be changed most in order to match the template image, and that the highest value (JDMax) will therefore coincide with the lesion.
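Both the difference measure of Eq. (1) and the Jacobian determinant are simple voxelwise computations. The NumPy sketch below (our choice of language; the study itself used SPM5 in Matlab) reproduces the two single-voxel examples given for ΔTH and ΔTC (85 + 0 + 150 = 235 and 65 + 10 + 55 = 130), and computes a finite-difference JD for a mapping stored as a hypothetical (3, X, Y, Z) coordinate array. Function names and array layout are illustrative assumptions, not SPM or VBM5 interfaces.

```python
import numpy as np


def abs_diff_sum(control, patient):
    """Eq. (1): sum over tissue classes of |control - patient|.

    `control` and `patient` are sequences of per-class values or maps in
    the same class order, e.g. (GM, WM, CSF).
    """
    return sum(np.abs(np.asarray(c, dtype=float) - np.asarray(p, dtype=float))
               for c, p in zip(control, patient))


# Worked single-voxel examples from the text (order: GM, WM, CSF).
delta_th = abs_diff_sum((185, 100, 100), (100, 100, 250))  # 85 + 0 + 150 = 235
delta_tc = abs_diff_sum((85, 10, 5), (20, 20, 60))         # 65 + 10 + 55 = 130


def jacobian_determinant(mapping, spacing=1.0):
    """Voxelwise Jacobian determinant of a 3-D mapping.

    `mapping` has shape (3, X, Y, Z); mapping[i] holds the i-th output
    coordinate of every voxel. Derivatives use central differences
    (one-sided at the edges). JD > 1 means local expansion, JD < 1 contraction.
    """
    jac = np.empty((3, 3) + mapping.shape[1:])
    for i in range(3):
        partials = np.gradient(mapping[i], spacing)  # d/dx, d/dy, d/dz
        for j in range(3):
            jac[i, j] = partials[j]
    # Put the 3x3 matrix axes last so that det() is evaluated per voxel.
    return np.linalg.det(np.moveaxis(jac, (0, 1), (-2, -1)))


# Usage: a uniform 10% stretch of all coordinates gives JD = 1.1**3 everywhere.
shape = (8, 8, 8)
identity = np.stack(np.meshgrid(*(np.arange(n, dtype=float) for n in shape), indexing="ij"))
print(np.allclose(jacobian_determinant(1.1 * identity), 1.1 ** 3))
```

Because the same function takes whole maps as well as scalars, ΔTH, ΔTC, and ΔLI maps can all be obtained by passing per-class arrays instead of single values.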
Laterality deviations, i.e., unusual asymmetry, were detected by deriving a lateralization index for each voxel in the segmented tissue map (Luders et al., 2004; Wilke and Lidzba, 2007; Fig. 1, step 6). This was based on the assumption that any strikingly lateralized effect in a macroscopically broadly symmetrical brain is likely a deviation from normal (Volkau et al., 2006). For these voxelwise left–right comparisons to be applicable, all spatial transformations were done using symmetrical tissue maps. The tissue maps from the control population were processed accordingly, and the voxelwise differences ΔLI were again calculated over all tissue classes according to Eq. (1).
To increase the signal-to-noise ratio, a Gaussian smoothing filter was applied to all parameter maps (tissue composition, homogeneity, and laterality) before comparing the individual with the reference values; smoothing is not necessary for the Jacobian image, as it is inherently smooth (Gaser et al., 2001; see also Fig. 1). As the smoothing width determines the spatial scale at which differences are most likely to be detected (matched filter theorem; Friston et al., 2006; Jones et al., 2005), it was varied systematically across 2, 4, 6, and 8 mm FWHM.
Calculating these features results in four image volumes per subject. To combine the results from the different features, a z-score transformation was performed. The classical z-score is calculated as z = (x − M)/SD, where, for each value x, the mean M (over all voxels) is subtracted and the result is divided by the standard deviation SD (over all voxels). However, both the mean and the standard deviation are vulnerable to outliers when the data do not conform to the normality assumption (Micceri, 1989).
As this assumption cannot be expected to hold in the presence of severely abnormal brain tissue, we standardized each feature map of each subject individually by computing a robust z-score (z_R; Frederix and Pauwels, 1999), which uses the median (Md) and the mean absolute deviation (MAD) instead and is thus calculated as
$$z_R = \frac{x - \mathrm{Md}}{\mathrm{MAD}} \tag{2}$$
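A minimal NumPy sketch of Eq. (2), including a bootstrapped estimate of Md and MAD along the lines of the subsampling scheme described in the next paragraph (100 subsamples of 10% of the voxels by default). Drawing the subsamples with replacement, and taking MAD as the mean absolute deviation from the median as worded in the text, are our assumptions; the function name is hypothetical.

```python
import numpy as np


def robust_z(x, n_boot=100, frac=0.10, rng=None):
    """Robust z-score, Eq. (2): z_R = (x - Md) / MAD.

    MAD is taken as the mean absolute deviation from the median, following
    the wording of the text (many robust z-scores use the median absolute
    deviation instead). With n_boot > 0, Md and MAD are the medians of the
    per-subsample values over n_boot random subsamples of frac * N voxels,
    drawn with replacement (an assumption).
    """
    x = np.asarray(x, dtype=float)
    values = x.ravel()
    if n_boot > 0:
        rng = np.random.default_rng(rng)
        k = max(1, int(round(frac * values.size)))
        mds, mads = np.empty(n_boot), np.empty(n_boot)
        for b in range(n_boot):
            sample = rng.choice(values, size=k, replace=True)
            mds[b] = np.median(sample)
            mads[b] = np.mean(np.abs(sample - mds[b]))
        md, mad = np.median(mds), np.median(mads)
    else:
        md = np.median(values)
        mad = np.mean(np.abs(values - md))
    if mad <= 0:
        raise ValueError("MAD is zero; robust z-score undefined")
    return (x - md) / mad


# Usage: standardize a (hypothetical) skewed feature map with a fixed seed.
feature_map = np.random.default_rng(1).gamma(2.0, size=(20, 20, 20))
z_map = robust_z(feature_map, rng=0)
```

Setting `n_boot=0` falls back to computing Md and MAD once over all voxels, which makes the effect of the subsampling easy to compare.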
In order to assess the true underlying distribution of the data, we implemented a bootstrapping approach (Davison et al., 2003), as previously used to generate robust means (Wilke and Schmithorst, 2006). This approach generates several (default: 100) subsamples of 10% of the original size, from each of which median and MAD values are
generated; the medians of these 100 values are then entered into Eq. (2). This ensures that even a severely skewed distribution (in the presence of a large lesion) is appropriately sampled.
Following the independent conversion of each feature into a robust z-score image, these results are combined during a final interactive step in which the "weight" (z-threshold) of each feature can be set independently in order to achieve the best possible delineation of the lesion (Fig. 1, step 7). Additionally, an extent threshold can be set that excludes smaller clusters. Upon each modification, the updated result is shown, overlaid on the respective T1 image. This allows the rater to judge the overlap of the resulting lesion mask with the actual lesion. If the result is suboptimal, each parameter (z-score threshold for each feature and the cluster extent) can be modified independently to achieve a better overlap. This is the main step requiring user intervention in the semi-automated approach, effectively converting the individual lesion z-score maps into a joint, binary lesion map. Once the result is judged final, the lesion thus defined is written out as an image volume.
Approach 3: fully automated delineation
This approach is based on the premise that brain lesions can be identified as outlier voxels during segmentation, using a fuzzy clustering approach with fixed prototypes ("Automatic Lesion Identification", ALI; Seghier et al., 2008). It aims at identifying atypical voxels that do not belong to one of the expected tissue types and assigns them to a new class, which effectively represents the lesioned tissue as detected by the algorithm. The algorithm's presets were left at their defaults.
Comparison of approaches
Each approach resulted in an image volume representing the lesion mask for each subject.
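For Approach 2, producing such a binary mask amounts to thresholding each robust z-score map and removing small clusters. In the study this was done interactively by the rater; the sketch below only illustrates the mechanics. The text does not specify how the thresholded feature maps are merged, so union versus intersection is exposed as a hypothetical `mode` parameter; one-sided thresholds and 26-connectivity for clusters are likewise assumptions.

```python
import numpy as np
from scipy import ndimage


def combine_feature_maps(zmaps, thresholds, min_cluster=0, mode="any"):
    """Binary lesion mask from per-feature robust z-score maps.

    zmaps      : dict feature name -> 3-D z-score array
    thresholds : dict feature name -> z-threshold (one-sided, z >= threshold)
    min_cluster: clusters smaller than this many voxels are discarded
    mode       : "any" (union of thresholded maps) or "all" (intersection)
    """
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    flags = np.stack([zmaps[name] >= thr for name, thr in thresholds.items()])
    mask = flags.any(axis=0) if mode == "any" else flags.all(axis=0)
    if min_cluster > 0:
        structure = np.ones((3, 3, 3), dtype=bool)  # 26-connectivity
        labels, _ = ndimage.label(mask, structure=structure)
        sizes = np.bincount(labels.ravel())  # sizes[0] counts background voxels
        keep = sizes >= min_cluster
        keep[0] = False  # never keep the background label
        mask = keep[labels]
    return mask
```

Only features listed in `thresholds` take part, so a feature can be switched off by leaving it out, mimicking how a rater might ignore an uninformative map.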
As the manual delineation must still be considered the gold standard, results from the other two approaches were evaluated with reference to the manual results. We decided to compare the results in normalized space, as the most likely application of any (semi-)automated approach is the analysis of larger cohorts, for which spatial normalization will usually be necessary. To achieve this, the spatial normalization parameters generated for each subject were applied to the native-space manual lesion mask, defining the mask in normalized space. The manual mask and the semi- and fully automatically generated masks were then compared on a voxel-by-voxel basis using the Dice similarity index (DSI; Dice, 1945), as done previously (Seghier et al., 2008). This index ranges from 0 to 1 and can be considered an indicator of the performance of an approach, as it takes into account both false positives and false negatives; it is calculated as
$$\mathrm{DSI}_{A,B} = \frac{2\,\lvert A \cap B \rvert}{\lvert A \rvert + \lvert B \rvert} \tag{3}$$
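Eq. (3) takes only a few lines for binary masks. The NumPy sketch below (hypothetical function names) also gives the equivalent form in terms of true positives, false positives, and false negatives, 2TP/(2TP + FP + FN).

```python
import numpy as np


def dice(reference, estimate):
    """Dice similarity index, Eq. (3): 2|A ∩ B| / (|A| + |B|)."""
    a = np.asarray(reference, dtype=bool)
    b = np.asarray(estimate, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return float("nan")  # undefined when both masks are empty
    return 2.0 * np.logical_and(a, b).sum() / denom


def dice_from_counts(tp, fp, fn):
    """Same index expressed with true/false positives and false negatives."""
    return 2.0 * tp / (2.0 * tp + fp + fn)
```

An estimate that misses the reference lesion entirely scores 0, which is how a failed automated detection enters the DSI range of the automated approach.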
Here, the DSI between two datasets A and B is calculated from twice the overlap between both maps in relation to their summed size. When comparing partially overlapping image volumes (where A is the reference [i.e., the manually defined lesion] and B is the approximation [i.e., the semi- or fully automated lesion definition]), the index becomes twice the number of true positives divided by the sum of the voxels in the reference and the voxels in the approximation, i.e., 2TP/(2TP + FP + FN). The Dice index can be compared with the kappa statistic (Zou et al., 2004), where values of .6–.8 are considered good and values exceeding .8 excellent; in previous similar comparisons, values exceeding .7 were considered "high" (Seghier et al., 2008).
To assess the regional variability of results, the final volumes of interest were divided into central and peripheral voxels by smoothing the lesion masks with a Gaussian filter of FWHM = 8 mm; as a result, peripheral voxels (close to many voxels with a value of 0) will have a lower intensity, while voxels near
the center of the lesion will retain a high intensity. Thresholding this smoothed map at .5 thus automatically divides the lesion mask into central and peripheral voxels. These maps were also compared with the respective central and peripheral voxels from the manual maps, again using the Dice coefficient. Results were compared using the non-parametric Mann–Whitney U test, and correlations were assessed using the non-parametric Kendall rank correlation. Significance was assumed at p ≤ .05.
Results
Processing time
Manual delineation: the lesion was present on a median of 58 (range, 27–91) transversal slices. Assuming 5 or 10 min/slice, this corresponds to a median time for manual tracing of 4.8–9.6 h/subject (range, 2.25–15.2 h).
Semi-automated delineation: loading the images into the algorithm required about 1 min/subject. Unattended processing time was roughly 30 min/subject. Interactive lesion definition required an average of 24.9 (range, 9–50) min/subject.
Fully automated delineation: loading the images into the algorithm required about 1 min/subject. Unattended processing time was roughly 30 min/subject. No further interactive step was required.
Performance
Manual delineation: an overview of the manually delineated lesion maps, overlaid on the respective T1 dataset, is shown in Fig. 2.
Semi- and fully automated delineation: when compared with the gold standard, the median Dice similarity index for the semi-automated delineation was .53 (range, .33–.78) for 2 mm smoothing, .55 (range, .29–.81) for 4 mm smoothing, .59 (range, .29–.84) for 6 mm smoothing, and .60 (range, .34–.82) for 8 mm smoothing. For the fully automated delineation, the median DSI was .49 (range, 0–.87; see Fig. 3). None of the differences between the approaches was significant (all p > .05, Mann–Whitney U test).
For all approaches, there was a significant correlation between performance and lesion size (all p < .02, Kendall rank correlation), with higher performance for bigger lesions. When dividing the lesions into 6 smaller and 5 larger lesions (patients 1, 2, 6, 9, 10, and 11 vs. patients 3, 4, 5, 7, and 8), larger lesions were identified with significantly higher performance by all semi- and fully automated approaches (all p < .05, Mann–Whitney U test; Fig. 4). For both the semi-automated and the fully automated approach, central voxels (constituting on average 49.61 ± 19.93% of the lesion) were identified with much higher performance than peripheral voxels. Exemplary results for the semi-automated approach (FWHM = 6 mm) and the fully automated approach are shown in Fig. 5. The fully automated delineation failed to identify 2 lesions: in one case (subject 9), it detected an adjacent indirect effect (an enlarged lateral ventricle; Fig. 6), while no lesion was detected in subject 11 (Fig. 7).
Fig. 2. Representative axial slices of the brain lesions in all 11 patients, including an overlay of the manual lesion mask, in native space.
Discussion
This manuscript compares manual versus semi- and fully automated approaches to delineating chronic brain lesions on high-resolution MR images.
Manual delineation
Manual tracing of lesion boundaries on the individual high-resolution MR scans is the traditional approach to lesion analysis and must still be considered the gold standard. In chronic lesions, lesion boundaries (i.e., direct lesion effects) are typically very clear, and an expert rater can easily distinguish a lesion from unimpaired tissue as
well as non-brain (Rorden and Karnath, 2004). This does not mean that the manual approach is without drawbacks: while intraclass correlation coefficients of .86–.95 were reported, these still corresponded to absolute volume differences of 12–18% (Fiez et al., 2000), similar to the interrater disagreement described for manual delineation of white matter lesions (Filippi et al., 1995). This demonstrates that a lesion may present with inherent ambiguities that make definite tracing difficult even for a skilled rater. Therefore, while comparing a new approach with the current gold standard is good scientific practice (Knottnerus et al., 2002), ambiguities about how to label a complex lesion will remain even with a standardized procedure. Hence, perfect agreement of the semi- or fully automated approaches with manual tracing is not expected; indeed, it may not even be desirable. However, a further, systematic assessment of whether the disagreements argue for or against using one or the other approach would require establishing a new gold standard, which (if achievable) was felt to be beyond the scope of the current study.
Fig. 3. Box–whisker plot of the performance of the semi-automated (with 4 smoothing widths, left) and the automated approach (right), when compared with manual tracing.
Moreover, manual delineation is also time-consuming: for the 11 subjects investigated in the present study (see Fig. 2), an experienced rater spent an estimated 4.8–9.6 h/subject. For the whole
group, this amounts to between 2 and 3 weeks of concentrated tracing, which, compared with the cheap processing time of current computer workstations, is a substantial investment.
Semi-automated approach
Compared with manual delineation of lesion boundaries, the performance of the semi-automated approach is satisfactory (Fig. 3), although the strong influence of lesion size was remarkable: larger lesions were delineated significantly better than smaller ones (Fig. 4). The lack of a strong influence of the smoothing width (2/4/6/8 mm) is somewhat surprising in this context, as we expected a systematic effect of the width of the smoothing filter (Friston et al., 2006; Jones et al., 2005), in that larger lesions should profit from more smoothing. However, this was not the case: while the overall performance of the approaches using more smoothing was slightly better, the difference was marginal. We believe this is explained by the final manual fine-tuning, in which such effects (such as smaller clusters with less smoothing) were effectively masked by manual optimization.
Similarly, comparing performance for central versus peripheral voxels shows a much higher performance for central voxels (see Fig. 5), underlining that the main problem in robustly identifying a lesion lies at the lesion margins. By definition, smaller lesions have a higher surface-to-volume ratio, which can easily be demonstrated in our data: in the 6 smaller lesions, central voxels constitute 35.4% of all voxels, while in the 5 larger lesions, they make up 66.7% of the lesion volume. This effect therefore very likely explains at least part of the better delineation of larger lesions, as the more easily delineated core is relatively bigger. This is consistent with the impression of the rater combining the maps that, apart from indirect effects, the definition of exact borders was the main problem.
Fig. 4. Influence of lesion size: performance of the semi-automated and the automated approach for the 6 smaller vs. the 5 larger lesions. All differences significant at p < .05.
An example of the algorithm's output is shown in Fig. 8, where the manual and the final lesion maps are compared with the contributing feature maps (tissue homogeneity and composition, laterality, and shape). With regard to the different features used in the semi-automated approach, tissue homogeneity is strongly abnormal in larger lesions and thus well suited to delineating the bulk of a lesion (see Fig. 1 for an example). It is of less use in brain regions showing high inhomogeneity in both the healthy and the diseased state, such as a lesion at the interface between gray and white matter, which may be highly variable due to individual sulcation patterns (Rademacher et al., 1993). In this regard, however, it is complemented by the tissue composition feature, as this is sensitive to shifts of tissue classes, especially at the interface between them. As standard segmentation
approaches will try to segment brain tissue into a number of predefined tissue classes (classically GM, WM, and CSF; Ashburner and Friston, 2000, 2005), a lesion will be classified as one of these tissues. Therefore, what is "lost" in one class is "gained" in another, and this shift will show up as a strong difference in tissue composition. In contrast to approaches aimed at formally assessing the significance of a voxel being an outlier (Seghier et al., 2008; Wilke et al., 2003), individual imaging data from a control population are not necessary, as a population average (in the form of tissue class priors) is sufficient. This feature also capitalizes on the fact that chronic lesions will most likely be segmented as CSF (Seghier et al., 2008; Wilke et al., 2009). In our opinion, investigating the shift in tissue classes is also promising because combined analyses have been shown to increase the sensitivity of voxel-based analyses aimed at lesion detection (Bruggemann et al., 2007). However, it should be noted that the segmentation of such severely abnormal brains will only be successful if the prior assumptions about tissue localization and distribution (in the form of tissue priors describing the expected spatial distribution in a control population; Ashburner and Friston, 2005) are carefully considered. We decided to use a "priorless" segmentation approach as available in a toolbox for SPM (Gaser et al., 2007; Gaser, 2010), resulting in a segmentation based entirely on voxel intensity, without assuming a certain spatial tissue distribution.
The Jacobian determinant from the low-dimensional spatial normalization procedure implemented in unified segmentation (Ashburner and Friston, 2005) must currently be considered a somewhat crude measure, as its effective spatial resolution is low. Consequently, it will
mainly corroborate abnormalities found in the other features. However, there are two scenarios that would make this a potentially much more powerful approach: first, if high-dimensional warping is performed, as available, for example, in the DARTEL approach (Ashburner, 2007); and second, if native-space data from a reference population are available, as the deformation fields from individual spatial normalization (which are known to encode a substantial part of the “abnormality”; Gaser et al., 2001; Wilke et al., 2003) could then be generated and compared between the individual subject and the control population, as done for the other features. For these reasons, the “shape” feature was included in the final algorithm despite its current shortcomings. A deviation from the usual pattern of laterality may be used as another indicator of abnormality. While several disorders may present with symmetrical lesions (Barkhof and Scheltens, 2002; Lim, 2009), a strong difference in lateralization between a single subject and data from a control group is, as a matter of course, highly suspicious. One obvious drawback is the lack of directionality: a brain region is detected as being different in lateralization with regard to its homotopic region in the opposite hemisphere. This deviation is therefore by definition bilateral (see also Fig. 1, step 6), and one or more of the other features must be used in order to identify the pathological side. As with the shape feature, it therefore seems most useful for corroborating findings from the more sensitive features. This impression is also reflected in the thresholds used for the delineation in the semi-automated approach, where, on average, the lowest threshold was used for the shape feature (z = 1.91 ± .79), followed by the lateralization index (z = 3.00 ± 1.05), the tissue homogeneity feature (z = 3.18 ± .72) and the tissue composition feature (z = 4.18 ± 1.47; all values from the approach using 6 mm smoothing). This suggests that these features become increasingly specific, and thus useful, for defining the lesion in general and for spatially restricting it.
Fig. 5. Performance of the semi-automated (illustrated for one smoothing width, FWHM = 6 mm) and the automated approach for central vs. peripheral voxels: note the much better performance for voxels in the center of the lesion. Both differences are significant at p < .05.
Fig. 6. Individual performance of all approaches in patient 9. Note the inhomogeneous, difficult-to-delineate lesion; with larger smoothing, the semi-automated approach also identifies an indirect adjacent lesion effect (enlarged ventricle; arrow), whereas the fully automated approach identifies only this indirect adjacent effect.
Fig. 7. Individual performance of all approaches in patient 11: smallest lesion. Note that in the semi-automated approach, the indirect adjacent lesion effect (enlarged ventricle) is identified and cannot be separated from the lesion. No lesion was detected in the fully automated approach.
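The statement that a lateralization deviation is bilateral by definition can be made concrete with a minimal NumPy sketch. This is not the lateralization index used in this study (which builds on Wilke and Schmithorst, 2006); it assumes a symmetric template space in which flipping axis 0 maps every voxel onto its homotopic counterpart, uses a simple voxelwise asymmetry index (x - x_mirror)/(x + x_mirror), and runs on synthetic data. The helper names `asymmetry_index` and `laterality_z`, and the threshold of z = 3, are hypothetical.

```python
import numpy as np


def asymmetry_index(img, eps=1e-6):
    """Voxelwise asymmetry index (x - x_mirror) / (x + x_mirror).

    Assumption: axis 0 is the left-right axis of a symmetric template space,
    so np.flip(img, axis=0) maps each voxel onto its homotopic counterpart.
    """
    mirror = np.flip(img, axis=0)
    return (img - mirror) / (img + mirror + eps)


def laterality_z(subject, controls, eps=1e-6):
    """z-score of the subject's asymmetry index relative to a control group."""
    ai_controls = np.stack([asymmetry_index(c) for c in controls])
    mu = ai_controls.mean(axis=0)
    sd = ai_controls.std(axis=0, ddof=1)
    return (asymmetry_index(subject) - mu) / (sd + eps)


# Synthetic data (hypothetical): 10 controls and one subject with a
# hypointense "lesion" in one hemisphere (low indices along axis 0).
rng = np.random.default_rng(0)
shape = (20, 16, 16)
controls = 1.0 + 0.05 * rng.standard_normal((10, *shape))
subject = 1.0 + 0.05 * rng.standard_normal(shape)
subject[3:7, 6:10, 6:10] = 0.1

z = laterality_z(subject, controls)
mask = np.abs(z) > 3.0  # arbitrary threshold for this sketch

# The asymmetry index flips sign under mirroring, so |z| (and thus the mask)
# is mirror-symmetric: the lesion and its healthy homotopic region are both flagged.
print(mask[3:7, 6:10, 6:10].all(), mask[13:17, 6:10, 6:10].all())
```

Because the index changes sign under mirroring, any threshold on |z| flags both homotopic regions; in this sketch, as in the text, a second feature such as tissue composition would be needed to decide which side is pathological.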
Automated approach
The overall performance of the automated lesion definition approach was comparable with that of the semi-automated approaches, although numerically slightly worse. However, in contrast to the semi-automated approach, 2 out of 11 lesions were not correctly identified by the automated approach.¹ These two cases merit closer inspection, as they may present distinct features. In subject 9 (Fig. 6), the lesion is rather inhomogeneous and is also not easily delineated in the semi-automated approaches. Here, while missing the lesion, the automated approach correctly identifies the enlarged ventricle as abnormal, which corresponds to an adjacent, indirect lesion effect. Interestingly, this was also detected as abnormal in the semi-automated approach and, in the approach using 8 mm smoothing, could no longer be separated from the main lesion, as both abnormal clusters are connected (white arrow in Fig. 6). This underlines the sensitivity (or, in this case, vulnerability) of both the semi- and the fully automated approach to indirect lesion effects, which may be more prominent than the lesion itself. Hence, manual interaction or, at least, visual assessment is required in each case in order to validate the automatically generated results. This was not done here, but it would have to be taken into account when comparing the automated approach with other approaches. The lesion in subject 11 (the smallest lesion in this group) was also not identified automatically (Fig. 7), again demonstrating that smaller lesions are less amenable to automated assessment.
¹ To be fair, it should be noted that standard options were used and no attempt was made to investigate possible optimizations for the automated approach, as we felt this would be beyond the scope of this study.
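How broader smoothing can fuse a lesion with an adjacent indirect effect into one connected cluster can be reproduced in a toy setting. The sketch below assumes that a binary abnormality map is smoothed with a Gaussian kernel, re-thresholded, and then split into connected clusters; this is a simplification for illustration, not the processing pipeline used here, and the geometry, the 2 mm comparison kernel, and the threshold of 0.5 are hypothetical. It relies on `scipy.ndimage.gaussian_filter` and `scipy.ndimage.label`.

```python
import numpy as np
from scipy import ndimage


def fwhm_to_sigma(fwhm_mm, voxel_mm=1.0):
    """Convert a Gaussian kernel's FWHM (mm) into a standard deviation in voxels."""
    return fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / voxel_mm


def count_clusters(abnormal, fwhm_mm, threshold=0.5):
    """Smooth a binary abnormality map, re-threshold it, and count connected clusters."""
    smoothed = ndimage.gaussian_filter(abnormal.astype(float),
                                       sigma=fwhm_to_sigma(fwhm_mm), mode="constant")
    _, n_clusters = ndimage.label(smoothed > threshold)
    return n_clusters


# Hypothetical 2D slice with 1 mm voxels: a lesion and an adjacent enlarged
# ventricle, separated by a 2-voxel strip of normal tissue (columns 30-31).
abnormal = np.zeros((40, 64), dtype=bool)
abnormal[10:30, 10:30] = True   # direct lesion effect
abnormal[10:30, 32:52] = True   # indirect adjacent effect (enlarged ventricle)

for fwhm in (2, 8):
    print(f"FWHM = {fwhm} mm: {count_clusters(abnormal, fwhm)} cluster(s)")
```

With these toy settings, the narrow kernel keeps two separate clusters, whereas the 8 mm kernel raises the two-voxel gap above threshold and yields a single cluster. Counting clusters alone therefore cannot tell a direct lesion from an attached indirect effect, which is consistent with the need for visual validation noted above.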
Advantages and disadvantages of semi- and fully automated approaches
With neural plasticity now being observable in gray (Draganski et al., 2004) and white matter (Scholz et al., 2009) in response to a few months of training, it has become clear that functional changes are accompanied by structural changes that are detectable using advanced neuroimaging methods. While such indirect effects may appear to be a nuisance factor (making the delineation of the direct lesion effect more difficult), they may in fact be the very focus of a study investigating the remote effects a lesion has on the neural layout of the brain (Juenger et al., 2007; Riecker et al., 2010; Staudt et al., 2002; Wilke et al., 2009). Therefore, the focus of the investigation itself may suggest different methodological approaches, as only direct lesion effects are easily amenable to manual tracing. While indirect adjacent effects may be traced manually, indirect remote effects must be considered too small to be traceable in the individual subject. In the future, approaches that automatically assess the whole brain may therefore be used to detect and describe such effects.
When comparing the semi- and the fully automated approach, the semi-automated approach was able to isolate all direct lesion effects. Additionally, this approach also appeared sensitive to adjacent indirect lesion effects: abnormally enlarged ventricles were almost always detected and (when using broader smoothing filters) could no longer be separated from the main lesion. Interestingly, these indirect effects were so predominant in one case that they were identified as the primary lesion by the fully automated approach, underlining a potential vulnerability of unattended processing. However, this also points towards an opportunity, as detecting and differentiating direct and indirect lesion effects is of potentially great interest for understanding the nature of neuronal reorganization in the chronic phase of a brain lesion. For this aspect, a purely manual approach is not well suited. On the other end of the spectrum, the exact delineation of the direct effect of a small lesion is still the domain of manual delineation.
Fig. 8. Semi-automated approach, output in a single subject (P2): manual lesion delineation (A, red) versus the final combined lesion mask (B, blue) and the four contributing feature maps: tissue homogeneity (C, purple), tissue composition (D, green), laterality (E, yellow), and shape (F, cyan). Note the already rather specific lesion description in C and D, the bilateral detection in E, and the rather coarse detection in F. The individual z-thresholds used were 5, 5, 4, and 3, respectively.
Possible limitations of this study
While more suitable for lesioned brains, the priorless segmentation approach (Gaser et al., 2007) requires high-quality, high-contrast imaging data and may be more susceptible to image artifacts, as the “stabilizing” influence of tissue priors is lacking. The image quality of the input data must therefore be checked carefully. Also, the question of how applicable the current algorithm would be in the acute setting was not investigated here. In the current implementation, it works well for chronic brain lesions, which will overwhelmingly be T1-hypointense (Seghier et al., 2008). However, the performance of this approach in the presence of only mild or no T1-hypointensity has not been tested. Although the number of subjects was small, the sample size seems sufficient to detect the general trends in the differences between the approaches explored in this work. It should also be noted that, owing to sample size considerations, no formal assessment of the interrater reliability in the manual or semi-automated approach was attempted here. Finally, we used standard adult instead of custom-made pediatric reference data (Wilke et al., 2008) for the current, mixed population of children, young and middle-aged adults; when assessing a more homogeneous population, such more appropriate, high-resolution reference data may further increase the specificity and sensitivity of semi- and fully automated approaches.
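Why a segmentation based entirely on voxel intensity tends to label a chronic, T1-hypointense lesion as “CSF” can be illustrated with an intensity-only classifier. The sketch swaps in plain one-dimensional k-means as the simplest stand-in; it is not the priorless unified segmentation of the VBM5 toolbox (Gaser et al., 2007), and all intensities (CSF 0.2, GM 0.5, WM 0.8, lesion 0.25, noise SD 0.03) as well as the helper name `kmeans_1d` are hypothetical. Because no spatial priors are involved, class membership depends solely on intensity, which is also why contrast and artifacts matter so much for such approaches.

```python
import numpy as np


def kmeans_1d(values, n_classes=3, n_iter=100):
    """Intensity-only k-means: classes are defined by voxel intensity alone (no spatial priors).

    Returns labels ordered by class mean (0 = darkest, i.e. CSF-like on T1)
    together with the sorted class means.
    """
    centres = np.quantile(values, np.linspace(0.1, 0.9, n_classes))  # deterministic start
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
        new = np.array([values[labels == k].mean() if np.any(labels == k) else centres[k]
                        for k in range(n_classes)])
        converged = np.allclose(new, centres)
        centres = new
        if converged:
            break
    centres = np.sort(centres)
    # Re-assign against the sorted means so that label 0 is the darkest class
    labels = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
    return labels, centres


# Synthetic T1 intensities (hypothetical values, arbitrary units)
rng = np.random.default_rng(1)
noise = 0.03
csf = 0.2 + noise * rng.standard_normal(2000)
gm = 0.5 + noise * rng.standard_normal(4000)
wm = 0.8 + noise * rng.standard_normal(4000)
lesion = 0.25 + noise * rng.standard_normal(300)  # T1-hypointense chronic lesion
values = np.concatenate([csf, gm, wm, lesion])

labels, centres = kmeans_1d(values)
lesion_labels = labels[-lesion.size:]
print("class means:", np.round(centres, 2))
print("fraction of lesion voxels in the CSF-like class:", (lesion_labels == 0).mean())
```

In this toy setting, nearly all lesion voxels end up in the darkest, CSF-like class, mirroring the observation that chronic lesions are most likely segmented as “being CSF”.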
Conclusions
Manual tracing methods remain the gold standard for the exact delineation of direct lesion effects, but they are also time-consuming. For the delineation of a larger lesion, or for finding indirect lesion effects (adjacent or remote), semi- or fully automated approaches may be preferable. Not only are they much faster (thus effectively making large-scale studies possible), but they also detect abnormalities that are either unconnected with the direct lesion or too subtle to be picked up by visual inspection. Currently, expert validation of the results seems necessary in either case.
Acknowledgments
We would like to thank the participants for their time and willingness to contribute to this study. We are also grateful to Mohamed Seghier, Wellcome Trust Centre for Neuroimaging, UC London, for supplying the ALI toolbox, and to Martin Staudt, Epilepsy Center Vogtareuth, for helpful discussions. This work has been supported by the Deutsche Forschungsgemeinschaft DFG (WI3630/1-1, to MW, and KA1258/10-1, to HOK) as well as the Bundesministerium für Bildung und Forschung (BMBF-Verbund 01GW0641 “Räumliche Orientierung”).
References
Achiron, A., Gicquel, S., Miron, S., Faibel, M., 2002. Brain MRI lesion load quantification in multiple sclerosis: a comparison between automated multispectral and semiautomated thresholding computer-assisted techniques. Magn. Reson. Imaging 20, 713–720.
Ashburner, J., 2007. A fast diffeomorphic image registration algorithm. Neuroimage 38, 95–113.
Ashburner, J., Friston, K.J., 1997. Multimodal image coregistration and partitioning — a unified framework. Neuroimage 6, 209–217.
Ashburner, J., Friston, K.J., 1999. Nonlinear spatial normalization using basis functions. Hum. Brain Mapp. 7, 254–266.
Ashburner, J., Friston, K.J., 2000. Voxel-based morphometry — the methods. Neuroimage 11, 805–821.
Ashburner, J., Friston, K.J., 2005. Unified segmentation. Neuroimage 26, 839–851.
Ashton, E.A., Takahashi, C., Berg, M.J., Goodman, A., Totterman, S., Ekholm, S., 2003. Accuracy and reproducibility of manual and semiautomated quantification of MS lesions by MRI. J. Magn. Reson. Imaging 17, 300–308.
Banati, R.B., 2002. Brain plasticity and microglia: is transsynaptic glial activation in the thalamus after limb denervation linked to cortical plasticity and central sensitisation? J. Physiol. Paris 96, 289–299.
Barkhof, F., Scheltens, P., 2002. Imaging of white matter lesions. Cerebrovasc. Dis. 13 (Suppl. 2), 21–30.
Bates, E., Wilson, S.M., Saygin, A.P., Dick, F., Sereno, M.I., Knight, R.T., Dronkers, N.F., 2003. Voxel-based lesion-symptom mapping. Nat. Neurosci. 6, 448–450.
Borovsky, A., Saygin, A.P., Bates, E., Dronkers, N., 2007. Lesion correlates of conversational speech production deficits. Neuropsychologia 45, 2525–2533.
Brett, M., Leff, A.P., Rorden, C., Ashburner, J., 2001. Spatial normalization of brain images with focal lesions using cost function masking. Neuroimage 14, 486–500.
Bruggemann, J.M., Wilke, M., Som, S.S., Bye, A.M., Bleasel, A., Lawson, J.A., 2007. Voxel-based morphometry in the detection of dysplasia and neoplasia in childhood epilepsy: combined grey/white matter analysis augments detection. Epilepsy Res. 77, 93–101.
Crinion, J., Ashburner, J., Leff, A., Brett, M., Price, C., Friston, K., 2007. Spatial normalization of lesioned brains: performance evaluation and impact on fMRI analyses. Neuroimage 37, 866–875.
Cuadra, M.B., Cammoun, L., Butz, T., Cuisenaire, O., Thiran, J.P., 2005. Comparison and validation of tissue modelization and statistical classification methods in T1-weighted MR brain images. IEEE Trans. Med. Imaging 24, 1548–1565.
Davison, A.C., Hinkley, D.V., Young, G.A., 2003. Recent developments in bootstrap methodology. Stat. Sci. 18, 141–157.
Desmurget, M., Bonnetblanc, F., Duffau, H., 2007. Contrasting acute and slow-growing lesions: a new door to brain plasticity. Brain 130, 898–914.
Dice, L.R., 1945. Measures of the amount of ecologic association between species. Ecology 26, 297–302.
Draganski, B., Gaser, C., Busch, V., Schuierer, G., Bogdahn, U., May, A., 2004. Neuroplasticity: changes in grey matter induced by training. Nature 427, 311–312.
Dyet, L.E., Kennea, N., Counsell, S.J., Maalouf, E.F., Ajayi-Obe, M., Duggan, P.J., Harrison, M., Allsop, J.M., Hajnal, J., Herlihy, A.H., Edwards, B., Laroche, S., Cowan, F.M., Rutherford, M.A., Edwards, A.D., 2006. Natural history of brain lesions in extremely preterm infants studied with serial magnetic resonance imaging from birth and neurodevelopmental assessment. Pediatrics 118, 536–548.
Elsheikh, T.M., Kirkpatrick, J.L., Cooper, M.K., Johnson, M.L., Hawkins, A.P., Renshaw, A.A., 2010. Increasing cytotechnologist workload above 100 slides per day using the ThinPrep imaging system leads to significant reductions in screening accuracy. Cancer Cytopathol. 118, 75–82.
Fellows, L.K., Heberlein, A.S., Morales, D.A., Shivde, G., Waller, S., Wu, D.H., 2005. Method matters: an empirical study of impact in cognitive neuroscience. J. Cogn. Neurosci. 17, 850–858.
Fiez, J.A., Damasio, H., Grabowski, T.J., 2000. Lesion segmentation and manual warping to a reference brain: intra- and interobserver reliability. Hum. Brain Mapp. 9, 192–211.
Filippi, M., Horsfield, M.A., Bressi, S., Martinelli, V., Baratti, C., Reganati, P., Campi, A., Miller, D.H., Comi, G., 1995. Intra- and inter-observer agreement of brain MRI lesion volume measurements in multiple sclerosis. A comparison of techniques. Brain 118, 1593–1600.
Frederix, G., Pauwels, E.J., 1999. Automatic interpretation based on robust segmentation and shape-extraction. Lect. Notes Comp. Sci. 1614, 773–780.
Friston, K.J., Rotshtein, P., Geng, J.J., Sterzer, P., Henson, R.N., 2006. A critique of functional localisers. Neuroimage 30, 1077–1087.
Gaser, C., Nenadic, I., Buchsbaum, B.R., Hazlett, E.A., Buchsbaum, M.S., 2001. Deformation-based morphometry and its relation to conventional volumetry of brain lateral ventricles in MRI. Neuroimage 13, 1140–1145.
Gaser, C., Altaye, M., Wilke, M., Holland, S.K., 2007. Unified segmentation without tissue priors. Neuroimage 36 (Suppl. 1), S68.
Gaser, C., 2010. VBM5-toolbox. Available at http://dbm.neuro.uni-jena.de 2010.
Henselmans, J.M., de Jong, B.M., Pruim, J., Staal, M.J., Rutgers, A.W., Haaxma, R., 2000. Acute effects of thalamotomy and pallidotomy on regional cerebral metabolism, evaluated by PET. Clin. Neurol. Neurosurg. 102, 84–90.
Jones, D.K., Symms, M.R., Cercignani, M., Howard, R.J., 2005. The effect of filter size on VBM analyses of DT-MRI data. Neuroimage 26, 546–554.
Juenger, H., Linder-Lucht, M., Walther, M., Berweck, S., Mall, V., Staudt, M., 2007. Cortical neuromodulation by constraint-induced movement therapy in congenital hemiparesis: an FMRI study. Neuropediatrics 38, 130–136.
Karnath, H.-O., Perenin, M.-T., 2005. Cortical control of visually guided reaching: evidence from patients with optic ataxia. Cereb. Cortex 15, 1561–1569.
Knottnerus, A., van Weel, C., Muris, J.W.M., 2002. Evaluation of diagnostic procedures. BMJ 324, 477–480.
Kuhnke, N., Juenger, H., Walther, M., Berweck, S., Mall, V., Staudt, M., 2008. Do patients with congenital hemiparesis and ipsilateral corticospinal projections respond differently to constraint-induced movement therapy? Dev. Med. Child Neurol. 50, 898–903.
Lim, C.C., 2009. Magnetic resonance imaging findings in bilateral basal ganglia lesions. Ann. Acad. Med. Singapore 38, 795–798.
Luders, E., Gaser, C., Jancke, L., Schlaug, G., 2004. A voxel-based approach to gray matter asymmetries. Neuroimage 22, 656–664.
Micceri, T., 1989. The unicorn, the normal curve, and other improbable creatures. Psychol. Bull. 105, 156–166.
Moro, V., Urgesi, C., Pernigo, S., Lanteri, P., Pazzaglia, M., Aglioti, S.M., 2008. The neural basis of body form and body action agnosia. Neuron 60, 235–246.
Mort, D.J., Malhotra, P., Mannan, S.K., Rorden, C., Pambakian, A., Kennard, C., Husain, M., 2003. The anatomy of visual neglect. Brain 126, 1986–1997.
Rademacher, J., Caviness, V.S., Steinmetz, H., Galaburda, A.M., 1993. Topographical variation of the human primary cortices: implications for neuroimaging, brain mapping, and neurobiology. Cereb. Cortex 3, 313–329.
Riecker, A., Gröschel, K., Ackermann, H., Schnaudigel, S., Kassubek, J., Kastrup, A., 2010. The role of the unaffected hemisphere in motor recovery after stroke. Hum. Brain Mapp. 31, 1017–1029.
Rorden, C., Karnath, H.O., 2004. Using human brain lesions to infer function: a relic from a past era in the fMRI age? Nat. Rev. Neurosci. 5, 813–819.
Rorden, C., Karnath, H.-O., Bonilha, L., 2007. Improving lesion-symptom mapping. J. Cogn. Neurosci. 19, 1081–1088.
Rorden, C., Fridriksson, J., Karnath, H.O., 2009. An evaluation of traditional and novel tools for lesion behavior mapping. Neuroimage 44, 1355–1362.
Scholz, J., Klein, M.C., Behrens, T.E., Johansen-Berg, H., 2009. Training induces changes in white-matter architecture. Nat. Neurosci. 12, 1370–1371.
Seghier, M.L., Ramlackhansingh, A., Crinion, J., Leff, A.P., Price, C.J., 2008. Lesion identification using unified segmentation-normalisation models and fuzzy clustering. Neuroimage 41, 1253–1266.
Staudt, M., 2010. Brain plasticity following early life brain injury: insights from neuroimaging. Semin. Perinatol. 34, 87–92.
Staudt, M., Lidzba, K., Grodd, W., Wildgruber, D., Erb, M., Krägeloh-Mann, I., 2002. Right-hemispheric organization of language following early left-sided brain lesions: functional MRI topography. Neuroimage 16, 954–956.
Volkau, I., Prakash, B., Ananthasubramaniam, A., Gupta, V., Aziz, A., Nowinski, W.L., 2006. Quantitative analysis of brain asymmetry by using the divergence measure: normal-pathological brain discrimination. Acad. Radiol. 13, 752–758.
Walther, M., Juenger, H., Kuhnke, N., Wilke, M., Brodbeck, V., Berweck, S., Staudt, M., Mall, V., 2009. Motor cortex plasticity in ischemic perinatal stroke: a transcranial magnetic stimulation and functional MRI study. Pediatr. Neurol. 41, 171–178.
Wilke, M., Kassubek, J., Ziyeh, S., Schulze-Bonhage, A., Huppertz, H.J., 2003. Automated detection of gray matter malformations using optimized voxel-based morphometry: a systematic approach. Neuroimage 20, 330–343.
Wilke, M., Schmithorst, V.J., 2006. A combined bootstrap/histogram analysis approach for computing a lateralization index from neuroimaging data. Neuroimage 33, 522–530.
Wilke, M., Lidzba, K., 2007. LI-tool: a new toolbox to assess lateralization in functional MR-data. J. Neurosci. Methods 163, 128–136.
Wilke, M., Holland, S.K., Altaye, M., Gaser, C., 2008. Template-O-Matic: a toolbox for creating customized pediatric templates. Neuroimage 41, 903–913.
Wilke, M., Staudt, M., Juenger, H., Grodd, W., Braun, C., Krägeloh-Mann, I., 2009. Somatosensory system in two types of motor reorganization in congenital hemiparesis: topography and function. Hum. Brain Mapp. 30, 776–788.
Zou, K.H., Warfield, S.K., Bharatha, A., Tempany, C.M.C., Kaus, M.R., Haker, S.J., Wells, W.M., Jolesz, F.A., Kikinis, R., 2004. Statistical validation of image segmentation quality based on a spatial overlap index: scientific reports. Acad. Radiol. 11, 178–189.