NeuroImage 57 (2011) 378–390
Spatial decision forests for MS lesion segmentation in multi-channel magnetic resonance images

Ezequiel Geremia a,c,⁎, Olivier Clatz a, Bjoern H. Menze a,b, Ender Konukoglu c, Antonio Criminisi c, Nicholas Ayache a

a Asclepios Research Project, INRIA Sophia-Antipolis, France
b Computer Science and Artificial Intelligence Laboratory, MIT, USA
c Machine Learning and Perception Group, Microsoft Research Cambridge, UK
Article history:
Received 19 November 2010
Revised 24 March 2011
Accepted 29 March 2011
Available online 8 April 2011

Keywords: Multi-sequence MRI; Segmentation; Multiple Sclerosis; Random forests; MICCAI Grand Challenge 2008
Abstract

A new algorithm is presented for the automatic segmentation of Multiple Sclerosis (MS) lesions in 3D Magnetic Resonance (MR) images. It builds on a discriminative random decision forest framework to provide a voxel-wise probabilistic classification of the volume. The method uses multi-channel MR intensities (T1, T2, and FLAIR), knowledge of tissue classes and long-range spatial context to discriminate lesions from background. A symmetry feature is introduced to account for the fact that some MS lesions tend to develop in an asymmetric way. Quantitative evaluation of the proposed method is carried out on publicly available labeled cases from the MICCAI MS Lesion Segmentation Challenge 2008 dataset. When tested on the same data, the presented method compares favorably to all earlier methods. In an a posteriori analysis, we show how the features selected during classification can be ranked according to their discriminative power and reveal the most important ones.

© 2011 Elsevier Inc. All rights reserved.
Introduction

Multiple Sclerosis (MS) is a chronic, inflammatory and demyelinating disease that primarily affects the white matter of the central nervous system. Automatic detection and segmentation of MS lesions can help diagnosis and patient follow-up. It offers an attractive alternative to manual segmentation, which remains a time-consuming task that suffers from intra- and inter-expert variability. MS lesions, however, show a high variability in appearance and shape, which makes automatic segmentation challenging: they lack common intensity and texture characteristics, their shapes are variable and their location within the white matter varies across patients.

A variety of methods have been proposed for the automatic segmentation of MS lesions. For instance, in Anbeek et al. (2004) and Admiraal-Behloul et al. (2005), the authors propose to segment white matter signal abnormalities using an intensity-based k-nearest neighbors method with spatial prior and a fuzzy inference system, respectively. A similar classifier combined with a template-driven segmentation was proposed in Wu et al. (2006) to segment MS lesions into three different subtypes (enhancing lesions, T1 black holes, and T2 hyperintense lesions). A false positive reduction based on a rule-based method, a level set method and a support vector
⁎ Corresponding author at: Asclepios Research Project, INRIA Sophia-Antipolis, France. E-mail address: [email protected] (E. Geremia).
© 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.neuroimage.2011.03.080
machine classifier is presented in Yamamoto et al. (2010), along with a multiple gray-level thresholding technique. Many general-purpose brain tissue and brain tumor segmentation approaches can easily be adapted to MS lesion segmentation. In Bricq et al. (2008b), for example, the authors present an unsupervised algorithm based on hidden Markov chains for brain tissue segmentation in MR sequences. The method provides an estimate of the proportion of white matter (WM), gray matter (GM) and cerebro-spinal fluid (CSF) in each voxel, and can be extended to MS lesion segmentation by adding an outlier detector (Bricq et al., 2008a). Generative methods were proposed that perform tissue classification by means of an expectation-maximization (EM) algorithm. For instance, the method presented in Datta et al. (2006) aims at segmenting and quantifying black holes among MS lesions. The EM algorithm can be modified to be robust against lesion-affected regions; its outcome is then parsed to detect outliers which, in this case, coincide with MS lesions (Van Leemput et al., 2001). Another approach adds a partial volume model between tissue classes to the EM and combines it with Mahalanobis thresholding to highlight the lesions (Dugas-Phocion et al., 2004). Morphological postprocessing on the resulting regions of interest was shown to improve classification performance (Souplet et al., 2008). In Freifeld et al. (2009), a constrained Gaussian mixture model with no spatial prior is proposed to capture the tissue spatial layout. MS lesions are detected as outliers and then grouped in an additional tissue class. Final delineation is performed using probability-based curve evolution. Multi-scale segmentation can be combined with
discriminative classification to take into account regional properties (Akselrod-Ballin et al., 2006). Beyond the information introduced via the spatial prior atlases, these methods are limited in their ability to take advantage of long-range spatial context in the classification task. To overcome this shortcoming, we propose the use of an ensemble of discriminative classifiers. Our algorithm builds on the random decision forest framework, which has multiple applications in bioinformatics (Menze et al., 2009) and, more recently, in the image processing community (Andres et al., 2008; Yi et al., 2009; Criminisi et al., 2010). Adding spatial and multi-channel features to this classifier proved effective in object recognition (Shotton et al., 2009), brain tissue segmentation in MR images (Yi et al., 2009), myocardium delineation in 3D echocardiography (Lempitsky et al., 2009) and organ localization in CT volumes (Criminisi et al., 2010). Applying multi-channel and context-rich random forest classification to the MS lesion segmentation problem is, to our knowledge, novel. The presented classifier also exploits a specific discriminative symmetry feature which stems from the assumption that the healthy brain is approximately symmetric with respect to the mid-sagittal plane and that MS lesions tend to develop in asymmetric ways. We then show how the forest combines the most discriminative channels for the task of MS lesion segmentation.

Materials

This section describes the data, algorithms and notations referred to in the rest of the article.

MICCAI Grand Challenge 2008 dataset

The results in this article rely on a strong evaluation effort. This section presents the MICCAI1 Grand Challenge 2008 datasets, which constitute the largest publicly available dataset, and explains how our method is compared against the winner of the challenge (Souplet et al., 2008).
In the rest of the article, the MICCAI Grand Challenge 2008 on MS Lesion Segmentation will be referred to as MSGC.

Presentation

The MSGC (Styner et al., 2008a) aims at evaluating and comparing algorithms in an independent and standardized way for the task of MS lesion segmentation. The organizers make two datasets publicly available through their website: a dataset of labeled MR images which can be used to train a segmentation algorithm, and an unlabeled dataset on which the algorithm should be tested. The website offers to quantitatively evaluate segmentation results on the unlabeled dataset using the associated private ground-truth database, and to publish the resulting scores. This project is an original initiative to provide an unbiased comparison between MS lesion segmentation algorithms. In the rest of the article, the dataset for which labels are publicly available will be referred to as the public dataset, whereas the dataset for which labels are withheld will be referred to as the private dataset.
All images were co-registered and resampled to an isotropic 0.5 × 0.5 × 0.5 mm3 resolution. Both private and public datasets gather anatomical images from two different centers, CHB and UNC, and show high variability in intensity contrast (cf. Influence of preprocessing section), image noise and bias field. Both datasets contain highly heterogeneous cases and can thus be considered realistic test cases.
Evaluation

Quantitative evaluation is carried out on the private dataset using a set of known metrics defined in Styner et al. (2008a) and summarized in Table 1. The two full sets of expert segmentations were used as reference for method comparison.
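As an illustration, the voxel-wise metrics of Table 1 can be computed from binary masks roughly as follows (a minimal sketch with hypothetical function names; the surface distance SD is omitted because it requires extracting boundary voxels):

```python
import numpy as np

def segmentation_metrics(seg, gt):
    """Voxel-wise metrics from Table 1 for binary masks (SD omitted)."""
    seg, gt = seg.astype(bool), gt.astype(bool)
    tp = np.sum(seg & gt)    # true positives
    tn = np.sum(~seg & ~gt)  # true negatives
    fp = np.sum(seg & ~gt)   # false positives
    fn = np.sum(~seg & gt)   # false negatives
    return {
        "TNR": 100.0 * tn / (fp + tn),
        "TPR": 100.0 * tp / (tp + fn),
        "FPR": 100.0 * fp / (fp + tn),
        "PPV": 100.0 * tp / (tp + fp),
        "VO":  100.0 * tp / np.sum(seg | gt),  # volume overlap
        # relative absolute volume difference
        "VD":  100.0 * abs(int(seg.sum()) - int(gt.sum())) / gt.sum(),
    }

# Toy 1D "volumes" for illustration.
m = segmentation_metrics(np.array([1, 1, 0, 0]), np.array([1, 0, 1, 0]))
```

In practice these quantities are computed on full 3D masks; the 1D arrays above only illustrate the formulas.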
Top-ranked methods

The challenge results highlight four top-ranked methods, each reflecting a different approach to the task of MS lesion segmentation. A k-nearest neighbor classification of brain tissue relying on spatial location and intensity value was proposed in Anbeek et al. (2008). This method provides a voxel-wise probabilistic classification of MS lesions. In Bricq et al. (2008a), the authors present an unsupervised segmentation algorithm based on a hidden Markov chain model. The method takes into account neighborhood information, MR sequences and probabilistic priors in order to delineate MS lesions. Alternatively, the iterative method proposed in Shiee et al. (2008) and Shiee et al. (2010) jointly performs brain tissue classification and MS lesion segmentation by combining statistical and topological atlases. Finally, in Souplet et al. (2008), the authors show that a global threshold on the FLAIR MR sequence, inferred using an EM brain tissue classification, suffices to detect most MS lesions. The final segmentation is then constrained to the white matter by applying morphological operations. The method proposed in Souplet et al. (2008) won the MICCAI MS Segmentation Challenge 2008. For this specific method, the segmentation results on public and private datasets were made available by
Table 1
The evaluation metrics true negative rate (TNR), true positive rate (TPR), false positive rate (FPR) and positive predictive value (PPV) are defined using the following notations: true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). The volume overlap (VO) and the relative absolute volume difference (VD) evaluate the differences between the segmentation (Seg) and the ground truth (GT) by computing their volume (Vol). The average symmetric surface distance (SD) measures how close the segmentation and the ground truth are to each other using the Euclidean distance d on the sets of boundary voxels noted by ∂. The Best (respectively Worst) column contains the metric score of a perfect segmentation (respectively of a completely off segmentation).

Name | Definition | Unit | Best | Worst
TNR | TN / (FP + TN) | % | 100 | 0
TPR | TP / (TP + FN) | % | 100 | 0
FPR | FP / (FP + TN) | % | 0 | 100
PPV | TP / (TP + FP) | % | 100 | 0
VO | Vol(Seg ∩ GT) / Vol(Seg ∪ GT) | % | 100 | 0
VD | |Vol(Seg) − Vol(GT)| / Vol(GT) | % | 0 | ∞
SD | [Σ_{u ∈ ∂(GT)} min_{v ∈ ∂(Seg)} d(u, v) + Σ_{u ∈ ∂(Seg)} min_{v ∈ ∂(GT)} d(u, v)] / card(∂(Seg) ∪ ∂(GT)) | mm | 0 | ∞

Data

The public dataset contains 20 cases, 10 from the Children's Hospital in Boston (CHB) and 10 from the University of North Carolina (UNC), which are labeled by a CHB expert rater. The private dataset contains 25 cases, 15 from CHB and 10 from UNC. The private dataset was annotated by a single expert rater at CHB and jointly by 2 expert raters at UNC. For each case, the centers provided 3 MR volumes: a T1-weighted image, a T2-weighted image and a FLAIR image.

1 MICCAI is the annual international conference on Medical Image Computing and Computer Assisted Intervention.
the authors and will be used as reference. In the rest of the article, methods will be identified by their reference.
Data preprocessing

We sub-sample and crop the images so that they all have the same size, 159 × 207 × 79 voxels, and the same resolution, 1 × 1 × 2 mm3. Sub-sampling and cropping are intended to reduce the time spent learning the classifier. The preprocessing procedure corrects for RF acquisition field inhomogeneities (Prima et al., 2001) and performs inter-subject intensity calibration (Rey, 2002). Spatial normalization is also performed by aligning the mid-sagittal plane with the center of the images (Prima et al., 2002). Spatial prior is added by registering the MNI atlas (Evans et al., 1993) to the anatomical images, each voxel of the atlas providing the probability of belonging to WM, GM and CSF (cf. Fig. 1). The Image Fusion module of MedINRIA (INRIA, 2010) is used to perform the affine registration of the MNI atlas onto every single case.
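The sub-sampling step could be sketched as below, assuming a trilinear resampling with `scipy.ndimage.zoom`; the actual pipeline relies on the dedicated tools cited above, and the input shape and interpolation order here are our own assumptions:

```python
import numpy as np
from scipy.ndimage import zoom

def subsample(volume, target_shape=(159, 207, 79)):
    """Resample a volume to the common 159 x 207 x 79 grid (1 x 1 x 2 mm^3)."""
    # Per-axis zoom factors so that the output matches the target grid.
    factors = [t / s for t, s in zip(target_shape, volume.shape)]
    return zoom(volume, factors, order=1)  # order=1: trilinear interpolation

# Hypothetical input volume; real cases come from the MSGC data.
resampled = subsample(np.zeros((181, 217, 90), dtype=np.float32))
```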
Notations

The multi-channel aspect of the method presented in this article requires each channel to be carefully defined and named. MR images from the MSGC dataset will be noted Is, where the index s ∈ {T1, T2, FLAIR} stands for an MR sequence. Registered spatial priors will be noted Pt, where the index t ∈ {WM, GM, CSF} stands for a brain tissue class. Although they have different semantics, anatomical images and spatial priors will be treated under the unified term signal channel and denoted C ∈ {IT1, IT2, IFLAIR, PWM, PGM, PCSF}. The data consists of a collection of voxel samples described by their spatial position x = (x, y, z). Voxels can be evaluated in all available signal channels. The value of the voxel x in channel C is denoted C(x).
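In code, the signal channels can be modeled as a dictionary of 3D arrays sharing the common voxel grid; the names and random contents below are purely illustrative:

```python
import numpy as np

shape = (159, 207, 79)  # common voxel grid after preprocessing
rng = np.random.default_rng(0)
# One 3D array per signal channel: three MR sequences and three tissue priors.
channels = {name: rng.random(shape, dtype=np.float32)
            for name in ("I_T1", "I_T2", "I_FLAIR", "P_WM", "P_GM", "P_CSF")}

def C(name, x):
    """C(x): value of voxel x = (x, y, z) in the named channel."""
    return channels[name][x]

v = C("I_FLAIR", (80, 100, 40))
```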
Methods

This section describes our adaptation of random decision forests to the segmentation of MS lesions and illustrates the visual features employed.

Context-rich decision forest

Our detection and segmentation problem can be formalized as a binary classification of voxel samples into either background or lesions. This classification problem is addressed by a supervised method: the discriminative random decision forest, an ensemble classifier using decision trees as base classifiers. Decision trees are discriminative classifiers which are known to suffer from over-fitting (Breiman et al., 1984). A random decision forest (Amit and Geman, 1997) achieves better generalization by growing an ensemble of many independent decision trees on a random subset of the training data and by randomizing the features made available to each node during training (Breiman, 2001).

Forest training

The training data consists of a set of labeled voxels T = {x_k, Y(x_k)} where the label Y(x_k) is given by an expert. When asked to classify a new image, the classifier aims to assign every voxel x in the volume a label y(x). In our case, y(x) ∈ {0, 1}, 1 for lesion and 0 for background. The forest has T components, with t indexing each tree. During training, all observations (voxels) x_k are pushed through each of the trees. Each internal node applies a binary test (Shotton et al., 2009; Yi et al., 2009; Lempitsky et al., 2009; Criminisi et al., 2010) as follows:

t^{τ_low, τ_up, θ}(x_k) = true, if τ_low ≤ θ(x_k) < τ_up; false, otherwise    (1)
where θ is a function identifying the visual feature extracted at position xk. There are several ways of defining θ, either as a local
Fig. 1. Case CHB07 from the public MSGC dataset. From top to bottom: three axial slices of the same patient. From left to right: preprocessed T1-weighted (IT1), T2-weighted (IT2) and FLAIR MR images (IFLAIR), the associated ground truth GT and the registered white matter atlas (PWM).
intensity-based average, local spatial prior or context-rich cue. These are investigated in more detail in the next section. The value of the extracted visual feature is thresholded by τ_low and τ_up. The voxel x_k is then sent to one of the two child nodes based on the outcome of this test. During training, each node p is optimized using the partition of the training data T_p it receives as input. At the end of the training process, each node p is assigned the optimal binary test t^{λ*_p}, where λ*_p = (τ*_low, τ*_up, θ*)_p. The optimality criterion is the information gain, denoted by IG, as defined in Quinlan (1993):

IG(λ, T_p) = H(T_p) − H(T_p | t^λ(x_k))    (2)

where T_p ⊂ T and H denotes the entropy. More precisely, the term H(T_p | t^λ(x_k)) measures the error made when approximating the expert labeling Y by the binary test t^λ. The optimal parameter λ*_p maximizes the information gain:

λ*_p = arg max_λ IG(λ, T_p)    (3)
for node p. As a result, the optimal binary test is the one that best discriminates lesion from background voxels, i.e. maximizes the information gain. Only a randomly sampled subset Θ of the feature space is available at each node for optimization, while the threshold space is uniformly discretized. The optimal λ* = (τ*_low, τ*_up, θ*) is found by exhaustive search jointly over the feature and threshold space. Random sampling of the features leads to increased inter-node and inter-tree variability, which improves generalization (Breiman, 2001). Trees are grown to a maximum depth D. At the node level, a leaf node is generated when the information gain is below a minimal value IG_min. As a result of the training process, each leaf node l_t of every tree t receives a partition T_{l_t} of the training data. The partition T_{l_t} can be
divided into two sets respectively containing background and lesion voxels, defined as

T^b_{l_t} = {(x, Y(x)) ∈ T_{l_t} | Y(x) = b}    (4)

where b ∈ {0, 1} stands for the background and lesion class, respectively. Subsequently, the following empirical posterior probability is defined

p_{l_t}(Y(x) = b) = |T^b_{l_t}| / |T_{l_t}|    (5)
and stored at the leaf node. Fig. 2 illustrates how the decision trees partition the data in the feature space and how the resulting probabilities are stored in leaf nodes.

Prediction

When applied to a new test volume T_test = {x_k}, each voxel x_k is propagated through all the trees by successive application of the relevant binary tests. When reaching the leaf node l_t in all trees t ∈ [1 … T], the posteriors p_{l_t}(Y(x) = b) are gathered in order to compute the final posterior probability defined as follows:

p(y(x) = b) = (1 / T) Σ_{t = 1}^{T} p_{l_t}(Y(x) = b)    (6)
which is a mean over all the trees in the forest. A posterior map P_b is obtained by applying the same prediction procedure to all voxels: for every voxel x, P_b(x) = p(y(x) = b) is the posterior probability of belonging to class b. This probability map may be thresholded at a fixed value τ_posterior if a binary segmentation is required; choosing τ_posterior = 0.5 is equivalent to selecting b* = arg max_b P_b(x).
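The node optimization of Eqs. (2) and (3) and the forest posterior of Eq. (6) can be sketched as follows; this is a simplified single-feature version with an illustrative threshold grid, not the full implementation:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a set of binary labels."""
    if labels.size == 0:
        return 0.0
    p = np.bincount(labels, minlength=2) / labels.size
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def optimize_node(theta_values, labels, n_thresholds=10):
    """Exhaustive search over a discretized (tau_low, tau_up) grid, Eq. (3)."""
    taus = np.linspace(theta_values.min(), theta_values.max(), n_thresholds)
    h_parent = entropy(labels)
    best_ig, best_taus = -np.inf, None
    for i, t_low in enumerate(taus):
        for t_up in taus[i + 1:]:
            inside = (theta_values >= t_low) & (theta_values < t_up)  # Eq. (1)
            # Weighted entropy of the two child partitions.
            h_split = (inside.sum() * entropy(labels[inside])
                       + (~inside).sum() * entropy(labels[~inside])) / labels.size
            ig = h_parent - h_split  # information gain, Eq. (2)
            if ig > best_ig:
                best_ig, best_taus = ig, (t_low, t_up)
    return best_ig, best_taus

def forest_posterior(leaf_posteriors, tau_posterior=0.5):
    """Average the per-tree leaf posteriors (Eq. (6)) and threshold them."""
    p_lesion = np.mean(leaf_posteriors, axis=0)  # mean over the T trees
    return p_lesion, (p_lesion >= tau_posterior).astype(np.uint8)

# Perfectly separable toy node: the best test isolates the background voxels.
ig, taus = optimize_node(np.array([0.0, 0.1, 0.2, 1.0, 1.1, 1.2]),
                         np.array([0, 0, 0, 1, 1, 1]))
# Lesion posteriors of two voxels under three trees.
posterior, mask = forest_posterior(np.array([[0.9, 0.2], [0.7, 0.4], [0.8, 0.0]]))
```

In the toy node the information gain of the best split equals the parent entropy (1 bit), since a perfect separation exists on the threshold grid.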
Fig. 2. Decision trees encode feature space partitions. (a) A decision tree of depth D = 2 is considered in this example. Decision node 1 and leaf nodes 5 and 6 are colored to track the partitions of the training data in the feature space. The black cross stands for an unseen sample (voxel) which is classified while propagated down the tree. (b) A zoom on node 2 shows that its binary test, denoted by t^{τ_low,2, τ_up,2, θ_2}, is optimized over a partition of the training data, denoted by T_2 = {v_{k,2}, Y(v_{k,2})}. The leaf node 6 encloses the class distribution of the set of voxels reaching it during training. The classes are background (blue circles) and lesion (red triangles). (c) The dots stand for the training voxels and are colored according to their class. The black cross denotes a voxel from an unseen volume considered for prediction. Every decision node in the forest applies an axis-aligned feature test. Here we focus on decision nodes 0 and 2 using features θ_0 and θ_2, respectively.
Advantages

The probabilistic random forest framework presented in this section shows considerable advantages over other classifiers, e.g. support vector machines (SVMs). Indeed, it combines efficient probabilistic classification with transparent feature selection, as detailed below. When a new case is presented to the classifier, every voxel goes through a sequence of decisions on different channels. As a result, the posterior probability assigned to each voxel measures the confidence of this voxel being an MS lesion in the multi-channel space inferred from training. Trees from the same forest are all independent from one another. The posterior map can thus be computed in parallel: each tree computes its own posterior map, and these are then combined to form the final result (cf. Prediction section). The training of the random forest can also be parallelized in a trivial manner by learning each tree independently of the others. Moreover, unlike more "black-box" supervised methods such as SVMs or neural networks, the random forest framework enables us to inspect the learned trees and identify the most discriminative features. In the Analysis of feature relevance section, we take advantage of this property to draw a detailed analysis of the most discriminative visual features for the task of MS segmentation. The following properties motivate the use of the random forest framework for the task of MS lesion segmentation: 1) when applying the random forest, the binary trees can be evaluated extremely fast in prediction; 2) thanks to parallelism, training on MR volumes is fast; and 3) as a result of the training process, an optimal sequence of decisions, including the most informative channels and visual features, is effortlessly available (cf. Most discriminative channels section). In many applications, random forests have been found to generalize better than SVMs or boosting (Yin et al., 2010). The generalization power increases monotonically with the forest size.
Unlike SVMs and boosting, random forests also estimate the confidence of the prediction as a by-product of the training process. Fig. 3 shows the feature space of an exemplary segmentation problem. It illustrates that the random forest also generalizes well in regions of the feature space with sparse data support, which is beneficial. The uncertainty increases in those areas of the feature space which have little data support and on the boundary between classes. This is an important expected behavior, similar to that of Gaussian Processes (Bishop, 2006). Another interesting property of random forests lies in the way they separate the feature space. In Fig. 3, probability maps show the highest uncertainty values on voxels equidistant from the two classes. This is a feature the random forest classifier shares with maximum-margin classifiers such as SVMs (Schölkopf and Smola, 1999).

Visual features

This section presents the visual features and the motivation behind them. Two kinds of visual features are computed:

1. local features:

θ^loc_C(x) = C(x)    (7)
where C is an intensity or a prior channel, and C(x) is the value of channel C at position x;

2. context-rich features comparing the voxel of interest with distant regions. The first context-rich feature compares the local voxel value in channel C1 with the mean values in channel C2 over two 3D boxes R1 and R2 within an extended neighborhood:

θ^cont_{C1, C2, R1, R2}(x) = C1(x) − (1 / Vol(R1)) Σ_{x′ ∈ R1} C2(x′) − (1 / Vol(R2)) Σ_{x′ ∈ R2} C2(x′)    (8)
where C1 and C2 are both intensity or prior channels. The regions R1 and R2 are sampled randomly in a large neighborhood of the voxel x (cf. Fig. 4). The sum over these regions is efficiently computed using integral volume processing (Shotton et al., 2009). The random sampling of the features is part of the random forest framework and was used in previous work (Criminisi et al., 2009, 2010; Yi et al., 2009). The second context-rich feature compares the voxel of interest at x with its symmetric counterpart with respect to the mid-sagittal plane, noted S(x):

θ^sym_C(x) = C(x) − C(S(x))    (9)
where C is an intensity channel. A new version of the symmetry feature loosens the hard symmetric constraint as the size of the neighborhood increases, in order to take into account the fact that the brain is not perfectly
Fig. 3. Posterior maps learned from two distinct synthetic training sets. In both cases, the training data consists of two classes, green and red, and is used to learn a large forest, here T = 350. The posterior map is obtained by classifying a dense grid in the feature space and is then overlaid with the associated training data (shown as points). Larger opacities indicate larger probability of a pixel belonging to a class, while uncertain regions are indicated by less saturated colors. The white line plots the locus of points for which p(y(x) = green) = p(y(x) = red). We observe that 1) forest posteriors mimic the maximum-margin behavior, and 2) uncertainty increases when moving away from the training data.
Fig. 4. 2D view of context-rich features. (a) A context-rich feature depicting two regions R1 and R2 with constant offset relative to x. (b–d) Three examples of randomly sampled features in an extended neighborhood. (e) The symmetric feature with respect to the mid-sagittal plane. (f) The hard symmetric constraint. (g–i) The soft symmetry feature considering neighboring voxels in a sphere of increasing radius. See text for details.
symmetric. Instead of comparing with the exact symmetric voxel S(x), the minimal difference value within a close neighborhood S of S(x) is chosen. Three different sizes are considered for this neighborhood: 6, 26 and 32 neighbors, respectively (cf. Fig. 4). We obtain a softer version of the symmetric feature, which reads:

θ^sym_{C, S}(x) = min_{x′ ∈ S} {C(x) − C(x′)}    (10)
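The context-rich and soft symmetry features, together with the integral-volume box sums they rely on, can be sketched as follows (an illustrative implementation; the function names, box encodings and toy channel are our own):

```python
import numpy as np

def integral_volume(C):
    """Summed-volume table: box sums in O(1) lookups (Shotton et al., 2009)."""
    return C.cumsum(0).cumsum(1).cumsum(2)

def box_sum(iv, lo, hi):
    """Sum of the channel over the box lo <= (x, y, z) < hi by inclusion-exclusion."""
    total = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                idx = tuple(h - 1 if use_hi else l - 1
                            for use_hi, l, h in zip((dx, dy, dz), lo, hi))
                if min(idx) < 0:  # corner outside the volume contributes 0
                    continue
                total += (-1) ** (3 - dx - dy - dz) * iv[idx]
    return total

def theta_cont(C1, C2, iv2, x, R1, R2):
    """Eq. (8): local value minus the means of C2 over two remote boxes."""
    def box_mean(R):
        lo, hi = R
        return box_sum(iv2, lo, hi) / np.prod(np.subtract(hi, lo))
    return C1[x] - box_mean(R1) - box_mean(R2)

def theta_sym_soft(C, x, s_x, offsets):
    """Eq. (10): minimal difference within a neighborhood of the mirrored voxel S(x)."""
    return min(C[x] - C[tuple(np.add(s_x, o))] for o in offsets)

# Sanity check on a small toy channel.
C2 = np.arange(27, dtype=np.float64).reshape(3, 3, 3)
iv2 = integral_volume(C2)
s = box_sum(iv2, (1, 1, 1), (3, 3, 3))
v = theta_cont(C2, C2, iv2, (0, 0, 0), ((1, 1, 1), (3, 3, 3)), ((0, 0, 0), (1, 1, 1)))
d = theta_sym_soft(C2, (0, 0, 0), (2, 2, 2), [(0, 0, 0), (0, 0, -1)])
```

In a real implementation the integral volume of each channel is precomputed once per case, so that each box sum costs only eight lookups.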
To summarize, three kinds of visual features are introduced: local, neighborhood and symmetry features. They can be thought of as meta-features to be used on top of standard image filters, such as local moments, gradients or textures, rather than replacing them. In our case, visual features are evaluated on raw images and spatial priors, but any other channel could be added, e.g. the intensity gradient as described in Yi et al. (2009). The presented features can be applied individually to every voxel, unlike e.g. geometric moments. This is an essential property which enables voxel-wise image classification. The presented method not only provides a voxel-wise classification, but also integrates neighborhood information in the classification process thanks to context-rich features. The use of 3D boxes to integrate neighborhood information is motivated by the fact that they are extremely efficient through integral volume processing (Shotton et al., 2009).

Experiments and results

The results presented in this section aim at evaluating the segmentation quality and comparing the context-rich random forest approach to methods presented during the challenge (Souplet et al., 2008; Anbeek et al., 2008; Bricq et al., 2008a; Shiee et al., 2008). The experiments described here are discussed in the Discussion section. Exhaustive segmentation results are available for both public and private datasets under the following url: ftp://ftp-sop.inria.fr/asclepios/Published-Material/Ezequiel.Geremia/.

Results on the public MSGC dataset

For quantitative evaluation, the 20 available cases from the public dataset are classified and compared to other methods (Souplet et al., 2008; Anbeek et al., 2008; Bricq et al., 2008a; Shiee et al., 2008). A three-fold cross-validation is carried out on this dataset: the forest is trained on 2/3 of the cases and tested on the other 1/3; this operation is repeated 3 times in order to collect test errors for each case.
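The three-fold cross-validation protocol can be sketched as follows (the case identifiers and random split are illustrative):

```python
import numpy as np

def three_fold_splits(case_ids, seed=0):
    """Yield (train, test) case splits: train on 2/3, test on the remaining 1/3."""
    rng = np.random.default_rng(seed)
    ids = np.array(case_ids)
    rng.shuffle(ids)
    folds = np.array_split(ids, 3)
    for k in range(3):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(3) if j != k])
        yield train, test

# Hypothetical identifiers for the 20 public cases (10 CHB + 10 UNC).
cases = [f"CHB{i:02d}" for i in range(1, 11)] + [f"UNC{i:02d}" for i in range(1, 11)]
splits = list(three_fold_splits(cases))
```

Every case appears in exactly one test fold, so a test error is collected for each of the 20 cases.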
The binary classification is evaluated using two measures, true positive rate (TPR) and positive predictive value (PPV), both equal to 1 for a perfect segmentation (cf. Table 1). Forest parameters are fixed to the following values: number of random regions |Θ| ≃ 950, number of trees T = 30, tree depth D = 20, lower bound for the information gain IG_min = 10^−5, and posterior threshold τ_posterior = 0.5. Parameters T and D are set here to maximum values; the Influence of forest parameters section explains how these parameters can be optimized in order to improve segmentation results. Tables in the supplemental material report extensive results allowing comparison on every case of the MSGC public dataset. They show that the learned context-rich random forest achieves better TPR in all cases (cf. top bar plot), and better PPV in 70% of the cases (cf. center bar plot). Computed p-values for the pair-sample t-test show that these improvements are significant for both TPR (p = 1.3 ⋅ 10^−7) and PPV (p = 0.0041) scores.

Results on the private MSGC dataset

A context-rich random forest was learned on the whole public dataset from the MS Lesion Challenge, i.e. 20 labeled cases, with the same forest parameters: number of random regions |Θ| ≃ 950, number of trees T = 30, tree depth D = 20, lower bound for the information gain IG_min = 10^−5, and posterior threshold τ_posterior = 0.5. The considerations that led to these parameter values are detailed in the Influence of forest parameters section. The MSGC website carried out a complementary and independent evaluation of our algorithm on the previously unseen private dataset. The results, reported in Table 3, confirm a significant improvement over Souplet et al. (2008). The presented spatial random forest achieves, on average, slightly larger true positive (TPR) rates, which is beneficial (cf. Table 1), and comparable false positive (FPR) rates, but lower volume difference (VD) and surface distance (SD) values (cf. Table 3).
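The pair-sample t-tests used throughout this section can be reproduced with `scipy.stats.ttest_rel`; the per-case scores below are synthetic placeholders, not the challenge data:

```python
import numpy as np
from scipy.stats import ttest_rel

# Synthetic per-case TPR scores for two methods evaluated on the same 20 cases.
rng = np.random.default_rng(42)
baseline = rng.normal(19.2, 5.0, size=20)         # placeholder baseline scores
ours = baseline + rng.normal(20.0, 3.0, size=20)  # consistent improvement

# Paired (pair-sample) t-test: each case is its own control.
t_stat, p_value = ttest_rel(ours, baseline)
significant = p_value < 0.05
```

The paired design is what allows per-case comparison: the test operates on the score differences rather than on the two pooled samples.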
Pair-sample p-values were computed for the t-test on the private dataset. The results show a significant improvement over the method presented in Souplet et al. (2008) on SD (p = 4.2 ⋅ 10^−6) for the CHB rater, and on SD (p = 6.1 ⋅ 10^−3) for the UNC rater.

Discussion

Interpreting segmentation results

Quantitative evaluation of segmentation results, for both public (cf. Table 2) and private (cf. Tables 3–6) datasets, shows that the presented random forest framework compares favorably to top-
Table 2
Comparison of context-rich random forests with the method presented in Souplet et al. (2008) on the public dataset. The relative improvement over Souplet et al. (2008), defined as RI = (score_RF − score_other) / score_other, is significant for both TPR (p = 1.3 ⋅ 10^−7) and PPV (p = 0.0041) scores. Significant improvements over Souplet et al. (2008) are highlighted in bold.

Metric [%] | Souplet et al. (2008) | Context-rich RF | RI [%] | p-value
TPR | 19.21 ± 13.68 | 39.39 ± 18.40 | 105 | 1.3 ⋅ 10^−7
PPV | 29.55 ± 16.26 | 39.78 ± 20.19 | 35 | 0.0041
Table 4
Average results computed by the MSGC on the private dataset and compared to the method presented in Anbeek et al. (2008). The relative mean improvement over the algorithm presented in Anbeek et al. (2008) on the private dataset is defined as RI = (score_RF − score_Anbeek) / score_Anbeek; the associated p-values are reported alongside. Independent quantitative evaluation confirms the improvement over the algorithm presented in Anbeek et al. (2008); boldface highlights significant improvements. The spatial random forest achieves, on average, better results on UNC than on CHB labels: a higher true positive rate (TPR), which is beneficial, a lower false positive rate (FPR), and lower volume difference (VD) and surface distance (SD) values.

Rater | Metric [%] | Anbeek et al. (2008) | Context-rich RF | RI [%] | p-value
CHB | VD | 46.93 ± 50.41 | 52.94 ± 28.63 | +12.8 | 0.62
CHB | SD | 7.85 ± 11.00 | 5.27 ± 9.54 | −32.8 | 5.9 ⋅ 10^−3
CHB | TPR | 59.14 ± 21.79 | 58.08 ± 20.03 | −1.80 | 0.84
CHB | FPR | 78.51 ± 20.56 | 70.01 ± 16.32 | −10.8 | 8.6 ⋅ 10^−2
UNC | VD | 100.7 ± 132.3 | 50.56 ± 41.41 | −49.8 | 9.1 ⋅ 10^−2
UNC | SD | 9.77 ± 9.00 | 5.6 ± 6.67 | −42.7 | 3.6 ⋅ 10^−4
UNC | TPR | 48.86 ± 21.64 | 51.35 ± 19.98 | +5.10 | 0.38
UNC | FPR | 83.19 ± 18.59 | 76.81 ± 11.70 | −7.68 | 0.11
ranked methods. More specifically, they show a significant improvement over the algorithm presented in Souplet et al. (2008). Exhaustive results available in the supplemental material allow a case-by-case comparison with other methods. The MSGC website (Styner et al., 2008b) gathers the results of the methods presented during the MSGC in 2008 as well as those of more recent methods whose results were submitted directly to the website. The resulting ranking assigns a score to each method, relating its performance to the expected inter-expert variability, which is known to be high for MS lesion segmentation. A score of 90 would equal the accuracy of a human rater. The method presented in Bricq et al. (2008a) and our context-rich random forest approach rank first and second, respectively, with very close scores: 82.1354 and 82.0755. A score of 82 places the accuracy of the method just below that of a human expert. In addition, the reliability of automatic methods is generally higher than that of human experts, which makes approaching the performance of a human rater all the more interesting. Although segmentation results include most MS lesions delineated by the expert (cf. Figs. 5 and 6), we observe that some MS lesions are missed. Missed MS lesions lie in specific locations that are not represented in the training data, e.g. in the corpus callosum (cf. Fig. 5, slice 38). This is a limitation of the supervised approach. In this very case, however, the posterior map highlights the missed lesion in the corpus callosum as belonging to the lesion class with high uncertainty. The low confidence (or high uncertainty) reflects the incorrect spatial prior inferred from an incomplete training set: indeed, the training set contains no example of MS lesions appearing in the corpus callosum. On the contrary, the random forest is able to detect suspicious regions with high certainty.
Suspicious regions are visually very similar to MS lesions and widely represented in the training data, but they are not delineated by the expert, e.g. the left frontal lobe lesion again in Fig. 5, slice 38. The appearance model and spatial prior implicitly learned from the training data indicate that hyper-intense regions in the FLAIR MR sequence which lie in the white matter (cf. Analysis of feature relevance section) can be considered as MS lesions with high confidence. Recent histopathological studies have shown that gray matter regions are also heavily affected by the MS disease (Geurts and Barkhof, 2008). In our case, the public dataset does not show any MS lesion in the gray matter of the brain. Consequently, the decision forest learns that MS lesions preferentially appear in the white matter. Adding new cases showing gray matter MS lesions to the training set would allow the forest to automatically adapt the segmentation to include this kind of lesion. This observation stresses the need to gather large and heterogeneous datasets for training purposes. When focusing on quantitative measures, we observe that cases UNC01 and UNC06 from the public dataset show surprisingly low scores (cf. Table 2). The labels by the CHB expert for these two cases are abnormal: the ground truth is mirrored with respect to the anatomical images. This may be considered a labeling error and explains the low scores for these two specific cases. The MSGC website confirmed this observation and subsequently corrected the online database. We also observe that learning on the whole public dataset and testing on the private dataset (cf. Table 3) produces better average results than the three-fold cross-validation carried out on the public dataset (cf. Table 2). Again, this illustrates the benefit of training the classifier on datasets large enough to capture the variability of the data.
Table 3
Average results computed by the MSGC on the private dataset, compared to the method presented in Souplet et al. (2008). The relative mean improvement over the algorithm of Souplet et al. (2008) is defined as RI = (score_RF − score_Souplet)/score_Souplet; the associated p-values are reported alongside. Independent quantitative evaluation confirms the improvement over the algorithm presented in Souplet et al. (2008); boldface highlights significant improvements. The spatial random forest achieves, on average, slightly larger true positive rates (TPR), which is beneficial, comparable false positive rates (FPR), and lower volume difference (VD) and surface distance (SD) values.

Rater | Metric [%] | Souplet et al. (2008) | Context-rich RF | RI [%] | p-value
CHB   | VD         | 86.48 ± 104.9         | 52.94 ± 28.63   | −38.7  | 0.094
CHB   | SD         | 8.20 ± 10.89          | 5.27 ± 9.54     | −35.7  | 4.2 ⋅ 10⁻⁶
CHB   | TPR        | 57.45 ± 23.22         | 58.08 ± 20.03   | +1.0   | 0.90
CHB   | FPR        | 68.97 ± 19.38         | 70.01 ± 16.32   | +1.5   | 0.70
UNC   | VD         | 55.76 ± 31.81         | 50.56 ± 41.41   | −9.4   | 0.66
UNC   | SD         | 7.4 ± 8.28            | 5.6 ± 6.67      | −24.3  | 6.1 ⋅ 10⁻³
UNC   | TPR        | 49.34 ± 15.77         | 51.35 ± 19.98   | +3.9   | 0.54
UNC   | FPR        | 76.18 ± 17.07         | 76.81 ± 11.70   | +0.1   | 0.83

Table 5
Average results computed by the MSGC on the private dataset, compared to the method presented in Bricq et al. (2008a). The relative mean improvement over the algorithm of Bricq et al. (2008a) is defined as RI = (score_RF − score_Bricq)/score_Bricq; the associated p-values are reported alongside. Independent quantitative evaluation confirms the improvement over the algorithm presented in Bricq et al. (2008a); boldface highlights significant improvements. The spatial random forest achieves, on average, slightly larger true positive rates (TPR), which is beneficial, but also slightly larger false positive rates (FPR), and lower volume difference (VD) and surface distance (SD) values.

Rater | Metric [%] | Bricq et al. (2008a) | Context-rich RF | RI [%] | p-value
CHB   | VD         | 73.03 ± 78.80        | 52.94 ± 28.63   | −27.5  | 0.19
CHB   | SD         | 6.65 ± 6.55          | 5.27 ± 9.54     | −20.8  | 0.20
CHB   | TPR        | 46.70 ± 19.94        | 58.08 ± 20.03   | +24.4  | 1.0 ⋅ 10⁻³
CHB   | FPR        | 51.06 ± 25.23        | 70.01 ± 16.32   | +37.1  | 8.3 ⋅ 10⁻⁵
UNC   | VD         | 51.33 ± 27.00        | 50.56 ± 41.41   | −1.50  | 0.92
UNC   | SD         | 6.61 ± 5.23          | 5.6 ± 6.67      | −15.2  | 0.17
UNC   | TPR        | 39.50 ± 16.06        | 51.35 ± 19.98   | +30.0  | 1.2 ⋅ 10⁻³
UNC   | FPR        | 60.80 ± 22.75        | 76.81 ± 11.70   | +26.3  | 8.6 ⋅ 10⁻⁴
Table 6
Average results computed by the MSGC on the private dataset, compared to the method presented in Shiee et al. (2008). The relative mean improvement over the algorithm of Shiee et al. (2008) is defined as RI = (score_RF − score_Shiee)/score_Shiee; the associated p-values are reported alongside. Independent quantitative evaluation confirms the improvement over the algorithm presented in Shiee et al. (2008); boldface highlights significant improvements. The spatial random forest achieves, on average, slightly larger true positive rates (TPR), which is beneficial, but also slightly larger false positive rates (FPR), and lower volume difference (VD) and surface distance (SD) values.

Rater | Metric [%] | Shiee et al. (2008) | Context-rich RF | RI [%] | p-value
CHB   | VD         | 84.17 ± 120.8       | 52.94 ± 28.63   | −37.1  | 0.22
CHB   | SD         | 7.95 ± 16.65        | 5.27 ± 9.54     | −33.5  | 9.0 ⋅ 10⁻³
CHB   | TPR        | 55.40 ± 23.60       | 58.08 ± 20.03   | +4.83  | 0.64
CHB   | FPR        | 68.85 ± 23.75       | 70.01 ± 16.32   | +1.69  | 0.76
UNC   | VD         | 69.63 ± 115.2       | 50.56 ± 41.41   | −27.4  | 0.42
UNC   | SD         | 7.10 ± 8.93         | 5.6 ± 6.67      | −21.1  | 0.15
UNC   | TPR        | 49.79 ± 24.54       | 51.35 ± 19.98   | +3.14  | 0.76
UNC   | FPR        | 74.28 ± 20.08       | 76.81 ± 11.70   | +3.41  | 0.50
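For readers wishing to reproduce the kind of scores reported in Tables 3–6, the sketch below illustrates the underlying voxel-level definitions. Note this is a simplified, hypothetical implementation: the MSGC itself computes lesion-wise TPR/FPR and average surface distances, not the voxel-wise counts shown here.

```python
import numpy as np

def voxelwise_scores(seg, gt):
    """Simplified voxel-wise versions of the metrics in Tables 3-6.
    The MSGC computes lesion-wise TPR/FPR and surface distances; this
    sketch only illustrates the underlying voxel-level definitions."""
    seg, gt = np.asarray(seg, bool), np.asarray(gt, bool)
    tp = np.count_nonzero(seg & gt)    # correctly detected lesion voxels
    fp = np.count_nonzero(seg & ~gt)   # false detections
    fn = np.count_nonzero(~seg & gt)   # missed lesion voxels
    return {
        "TPR": tp / max(tp + fn, 1),   # sensitivity
        "PPV": tp / max(tp + fp, 1),   # precision
        "VD": abs(int(seg.sum()) - int(gt.sum())) / max(int(gt.sum()), 1),
    }
```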
Influence of preprocessing

Data normalization is critical to ensure that features are evaluated in a coherent way across all the images presented to the forest.
The evaluation of context-rich features, θcont, is sensitive to rotation: spatial normalization is therefore performed using rigid registration (Prima et al., 2001). In the same way, the evaluation of intensity-based features requires inter-case intensity calibration. Classification results for cases from the CHB (cf. Fig. 5) and UNC (cf. Fig. 6) centers are obtained with the same forest. It is mandatory to apply the same preprocessing as during training (cf. Data preprocessing section). By doing so, cases from different datasets, e.g. the T1-weighted and FLAIR images in Figs. 5 and 6, show very similar intensity values for a given brain tissue and MR sequence. However, we observe that the contrast in the T1-weighted and FLAIR images is more marked in case CHB05 (cf. Fig. 5) than in case UNC02 (cf. Fig. 6). Despite these contrast changes, classification results remain coherent. This illustrates the stability of our method, i.e. the random forest framework together with its preprocessing step, to slight inter-image contrast variations. The trees are generated in parallel on 30 nodes and gathered to form the forest. Cropping and sub-sampling the training images reduces, by a factor larger than 10, the time needed to learn a single tree. On IBM e325 dual-Opteron 246 machines at a maximum frequency of 2 GHz, learning a tree on 20 sub-sampled images with the parameters fixed in the Results on the private MSGC dataset section takes, on average, 8 h on a single CPU.
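The training setup above (T = 30 trees of depth D = 20, grown in parallel) can be mimicked on a single machine with an off-the-shelf library. This is an illustrative analogue, not the authors' implementation: scikit-learn's `n_jobs` parallelizes over local cores rather than cluster nodes, and the toy feature matrix stands in for the per-voxel local, context-rich and symmetry features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Illustrative single-machine analogue of the training setup described above.
# X stands in for per-voxel feature vectors; y for lesion/background labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))            # toy feature matrix
y = (X[:, 0] + X[:, 1] > 1).astype(int)    # toy lesion/background labels
forest = RandomForestClassifier(n_estimators=30, max_depth=20,
                                n_jobs=-1, random_state=0).fit(X, y)
posterior = forest.predict_proba(X)[:, 1]  # voxel-wise lesion posterior
```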
Fig. 5. Segmenting case CHB05 from the public MSGC dataset. From left to right: preprocessed T1-weighted (IT1), T2-weighted (IT2) and FLAIR MR images (IFLAIR) overlayed with the associated ground truth GT, the posterior map Posterior = (Plesion(vk))k displayed using an inverted gray scale, and the FLAIR sequence overlayed with the segmentation (Seg = (Posterior ≥ τposterior) with τposterior = 0.5). Segmentation results show that most lesions are detected. Although some lesions are not detected, e.g. the peri-ventricular lesion in slice 38, they appear enhanced in the posterior map. Moreover, the segmentations of slices 38 and 42 show peri-ventricular regions, visually very similar to MS lesions, but not delineated in the ground truth.
Fig. 6. Segmenting case UNC02 from the public MSGC dataset. From left to right: preprocessed T1-weighted (IT1), T2-weighted (IT2) and FLAIR MR images (IFLAIR) overlayed with the associated ground truth GT, the posterior map Posterior = (Plesion(vk))k displayed using an inverted gray scale and the FLAIR sequence overlayed with the segmentation (Seg = (Posterior ≥ τposterior) with τposterior = 0.5).
Fig. 7. Influence of forest parameters on segmentation results. Both curves were plotted using mean results from a 3-fold cross validation on the public dataset. Left: the figure shows the influence of forest parameters on the area under the precision–recall curve. Right: the figure shows the influence of forest parameters on the area under the ROC curve. The ideal classifier would ensure area under the curve to be equal to 1 for both curves. We observe that 1) for a fixed depth, increasing the number of trees leads to better generalization; 2) for a fixed number of trees, low depth values lead to underfitting while high values lead to overfitting; and 3) overfitting vanishes by increasing the number of trees.
Forest parameters were indeed selected in a safety area with respect to under- and overfitting. The safety area corresponds to a sufficiently flat region in the evolution of the areas under the ROC and precision–recall curves. As shown in Fig. 8, increasing the number of trees tends to benefit the generalization power of the classifier. We also observe that the performance of the classifier stabilizes for large enough forests.

Analysis of feature relevance

During training, the features considered for node optimization form a large and heterogeneous set (cf. Visual features section). Unlike other classifiers, random forests provide an elegant way of ranking these features according to their discriminative power. In this section, we aim at better understanding which channels and visual cues (local, context-rich or symmetric) are the most discriminative in the classification process.
Fig. 8. Influence of the number of trees on segmentation results. Both curves were plotted using mean results from a 3-fold cross-validation on the public dataset. Top: the figure shows the influence of the number of trees on the area under the precision–recall curve. Bottom: the figure shows the influence of the number of trees on the area under the ROC curve. We observe that, for a fixed depth D = 14, increasing the number of trees improves generalization as stated in Breiman (2001). The increase in performance stabilizes around the value T = 30.
Influence of forest parameters

The number of trees and their depth, respectively denoted by T and D, characterize the generalization power and the complexity of the non-parametric model learned by the forest. This section aims at understanding the contribution of each of these meta-parameters. A 3-fold cross-validation on the public dataset is carried out for each parameter combination. Segmentation results are evaluated for each combination using two different metrics: the area under the receiver operating characteristic (ROC) curve and the area under the precision–recall curve. The ROC curve plots TPR vs. FPR scores computed on the test data for every value of τposterior ∈ [0,1]; the precision–recall curve plots PPV vs. TPR scores for the same range of thresholds. Results are reported in Fig. 7. We observe that 1) for a fixed depth, increasing the number of trees leads to better generalization; 2) for a fixed number of trees, low depth values lead to underfitting while high depth values lead to overfitting; and 3) overfitting is reduced by increasing the number of trees. This analysis was carried out a posteriori: tuning the meta-parameters of the forest on the training data is not a valid practice, and using out-of-bag samples for forest parametrization would be preferable. Because little training data is available for the MS lesion class, however, all available labeled data was used to train the forest. From this perspective, the forest parameters were set to arbitrary but high enough values to avoid under- and overfitting: T = 30 and D = 20.
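The two evaluation metrics above can be sketched as follows: sweep τposterior over [0, 1] on the voxel-wise lesion posterior and integrate the ROC and precision–recall curves. The labels and posterior values below are synthetic stand-ins for real test voxels.

```python
import numpy as np
from sklearn.metrics import auc, precision_recall_curve, roc_auc_score

# Synthetic voxel labels and a posterior that separates the classes well.
rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=5000)  # ground-truth voxel labels (toy)
posterior = np.clip(0.6 * labels + rng.normal(0.2, 0.25, size=5000), 0, 1)

area_roc = roc_auc_score(labels, posterior)   # area under the ROC curve
prec, rec, _ = precision_recall_curve(labels, posterior)
area_pr = auc(rec, prec)                      # area under the PR curve
```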
Most discriminative visual features

The first approach consists in counting the nodes in which a given feature type was selected. We observe that local features were selected in 24% of the nodes, context-rich features in 71% of the nodes, and symmetry features in 5% of the nodes (cf. Fig. 9). In this case, no distinction is made regarding the depth at which a given feature was selected. Context-rich features exhibit high variability (900 of them are randomly sampled at every node). This variability, combined with their ability to highlight regions which differ from their neighborhood, explains why they were chosen. Together with local features, context-rich features learn a multi-channel appearance model conditioned by tissue spatial priors. Symmetry features are under-represented in the forest and thus prove to be the least discriminative ones. This is due to the fact that a large proportion of peri-ventricular MS lesions tend to develop in a symmetric way. Nevertheless, symmetry features appear in the top levels of the trees (up to the third level), which indicates that they provide an alternative to local and context-rich features when these two fail. A finer estimation of feature importance consists in weighting the counting process. For a given feature, instead of only counting the nodes in which it appears, we also take into account the proportion of lesion voxels it helps to discriminate: the larger this proportion, the larger the weight of the feature. This leads us to define, for a fixed depth value d, the importance of a given feature type α ∈ {loc, cont, sym}, denoted IFT(α), as:

IFT(α) = (1/|T₁|) Σₚ |T₁ᵖ| ⋅ χ_α(θₚ)    (11)

where p indexes the nodes in layer d, T₁ is the training set of lesion voxels, T₁ᵖ ⊂ T₁ is the partition of T₁ which reached node p, and χ_α is the indicator function such that

χ_α(θₚ) = 1 if θₚ is of type α, and 0 otherwise.    (12)
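A minimal sketch of Eq. (11): for each feature type α, sum the fractions of lesion training voxels |T₁ᵖ|/|T₁| over the nodes p of a layer that selected a feature of that type. The node statistics below are toy values, not taken from the paper.

```python
# Sketch of Eq. (11); node statistics are illustrative toy values.
def feature_importance(layer_nodes, total_lesion_voxels):
    """layer_nodes: (feature_type, n_lesion_voxels_reaching_node) pairs for
    the nodes of one tree layer. Returns IFT per feature type."""
    ift = {"loc": 0.0, "cont": 0.0, "sym": 0.0}
    for feature_type, n_voxels in layer_nodes:
        ift[feature_type] += n_voxels / total_lesion_voxels
    return ift

# Toy layer: two context-rich nodes and one local node, |T1| = 1000.
layer = [("cont", 500), ("loc", 250), ("cont", 250)]
ift = feature_importance(layer, 1000)  # {'loc': 0.25, 'cont': 0.75, 'sym': 0.0}
```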
The feature importance evaluates to 21.1% for local features, 76.6% for context-rich features and 2.3% for symmetry features. These results are comparable to those obtained by simply counting the features selected in the forest, but the real advantage of this measure is that it allows a depth-by-depth feature importance analysis in a normalized way. The feature importance as a function of tree depth is reported in Fig. 10. The presented results are averaged over a forest containing T = 30 trees. Again, we observe that context-rich features are predominantly selected as the most discriminative, which confirms the trend reported in Fig. 9. However, as shown in Fig. 10, the preponderance of context-rich features is not uniform throughout
Fig. 9. Ranking features according to the proportion of nodes in which they appear. Context-rich features are selected in 71% of the nodes, local features are selected in 24% of the nodes whereas symmetry features are selected in 5% of the nodes.
the tree. Indeed, local features are the most discriminative in layers 0 and 2. A careful analysis of the selected channels helps to understand why local features are selected in the top layers of the tree (cf. Most discriminative channels section). The selected context-rich features show high variability. More specifically, the long-range regions are distributed all over the neighborhood; depth-by-depth analysis does not show any specific pattern in the position of the regions with respect to the origin voxel. In addition, the volume of the regions also shows high variability. The observed heterogeneity of selected context-rich features helps cope with the variability of MS lesions in shape, location and size. The symmetry feature is under-represented in the forest; its discriminative power is thus very low compared to local and context-rich features. This observation suggests two complementary interpretations of why symmetry features are the least significant: 1) most MS lesions appear in peri-ventricular regions and in a symmetric way, and 2) most MS lesions can be clearly identified by their signature across MR sequences and their relative position in the white matter of the brain. However, in deeper layers of the trees, the symmetry feature is more significant and tends to classify ambiguous
Fig. 10. Type of feature selected by layer of the tree. For a fixed depth, the red circle stands for the importance of the context-rich feature (θcont), while the green circle stands for the importance of the local feature (θloc). For clarity, symmetry features (θsym) are omitted as they are under-represented in the forest. The blue line monitors the proportion of training samples of the lesion class which do not reside in leaf nodes, for each layer of the tree. We observe that context-rich features are predominantly selected as the most discriminative ones except in layers 0 and 2.
asymmetrical regions. When looking into the selected features, we also notice that the hard symmetric constraint is preferred over the loose symmetric constraint (cf. Visual features section). Indeed, the feature importance evaluates to 1.6% for the hard symmetric feature and to 0.7% for the loose symmetric feature. Moreover, in the rare cases where the loose constraint is selected, the 6-neighbors version predominates (cf. Visual features section). This observation supports the idea that considering the brain hemispheres as symmetric is an accurate approximation in our specific setting (cf. Materials and Methods sections).

Most discriminative channels

The second approach focuses on the depth at which a given feature was selected. For every tree in the forest, the root node always applies a test on the FLAIR sequence (θloc FLAIR). This means that, out of all available features (local, context-rich and symmetry multi-channel features), θloc FLAIR was found to be the most discriminative. At the second level of the trees, a context-rich feature on spatial priors (θcont WM, GM) appears as the most discriminative over all trees in the forest; it aims at discarding all voxels which do not belong to the white matter. The optimal decision sequence found while training the context-rich forest can thus be thought of as a threshold on the FLAIR MR sequence followed by an intersection with the white matter mask (cf. Fig. 11). Interestingly, this sequence matches the first and second steps of the pipeline proposed by the challenge-winning method (Souplet et al., 2008); note that in our case it is automatically generated during the training process. Deeper layers in the trees then refine the segmentation of MS lesions by applying more accurate decisions. The feature importance (cf. Eq. (11)) can be extended in a straightforward way to be parametrized not only by the type of feature (local, context-rich, and symmetric) but also by the channel.
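The learned depth-2 decision sequence described above can be written out explicitly: threshold the FLAIR intensities, then intersect with the white-matter mask. The threshold value and arrays below are illustrative only, not values from the paper.

```python
import numpy as np

# Explicit form of the learned depth-2 decision sequence (toy values).
flair = np.array([[10., 80., 90.],
                  [20., 85., 15.]])          # toy FLAIR intensities
wm_mask = np.array([[1, 1, 0],
                    [0, 1, 1]], dtype=bool)  # toy WM prior mask
tau_flair = 50.0                             # hypothetical threshold

candidate = (flair > tau_flair) & wm_mask    # hyper-intense FLAIR voxels in WM
```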
When looking globally at the selected channels (cf. Fig. 12), we notice that their importance varies throughout the tree: the first layers, as mentioned before, favor the detection of bright spots in the white matter by successively testing the FLAIR MR sequence, the spatial priors on WM and GM tissues and finally the T2 MR sequence; deeper layers take other modalities into account to adjust the segmentation.

Conclusion

We demonstrated the power of the random forest formulation applied to the difficult task of MS lesion segmentation in multi-channel MR images. We presented three kinds of 3D features based on multi-channel intensity, prior and context-rich information. These features are part
Fig. 11. Combination of features and channels learned by the forest to discriminate MS lesions. The first layer of all trees in the forest performs a threshold on the FLAIR MR sequence. The second one discards all voxels which do not belong to the white matter. The posterior map is obtained by using a forest with trees of depth 2 and thus highlights hyper-intense FLAIR voxels which lie in peri-ventricular regions.
of a context-rich random decision forest classifier which demonstrated improved results over one of the state-of-the-art algorithms on the public MS challenge dataset. In addition, the random decision forest framework provided a means to automatically select the most discriminative features to achieve the best possible segmentation. Future work could include the use of more sophisticated features to further reduce the preprocessing requirements. The context-rich
random forest framework presented in this article is generic, which is an additional strength of the method. It can be applied as is to any other segmentation task, e.g. brain tumor segmentation in multi-sequence MR images of the brain. Finally, one could investigate an extension of the proposed approach to larger multi-class problems in order to simultaneously segment brain tissues (WM, GM, and CSF) along with MS lesions.

Appendix A. Supplementary data

Supplementary data to this article can be found online at doi:10.1016/j.neuroimage.2011.03.080.

References
Fig. 12. Channel importance as a function of the depth of the tree, for both local (top) and long-range features (bottom). For a fixed depth, only the most discriminative channel is depicted. Note how successive layers of the tree test complementary channels: the first layer performs a local test on the FLAIR MR sequence in order to detect bright spots; the second discards all voxels which do not belong to the white matter by using context-rich information over the WM and GM channels. Note that a large spectrum of available channels is tested throughout the tree.
Admiraal-Behloul, F., van den Heuvel, D., Olofsen, H., van Osch, M., van der Grond, J., van Buchem, M., Reiber, J., 2005. Fully automatic segmentation of white matter hyperintensities in MR images of the elderly. Neuroimage 28 (3), 607–617. Akselrod-Ballin, A., Galun, M., Basri, R., Brandt, A., Gomori, M.J., Filippi, M., Valsasina, P., 2006. An integrated segmentation and classification approach applied to multiple sclerosis analysis. CVPR '06. IEEE, pp. 1122–1129. Amit, Y., Geman, D., 1997. Shape quantization and recognition with randomized trees. Neural Comput. 9 (7), 1545–1588. Anbeek, P., Vincken, K., Viergever, M., 2008. Automated MS-lesion segmentation by k-nearest neighbor classification. The MIDAS Journal — MS Lesion Segmentation (MICCAI 2008 Workshop). Anbeek, P., Vincken, K., van Osch, M., Bisschops, R., van der Grond, J., 2004. Probabilistic segmentation of white matter lesions in MR imaging. Neuroimage 21 (3), 1037–1044. Andres, B., Köthe, U., Helmstaedter, M., Denk, W., Hamprecht, F.A., 2008. Segmentation of SBFSEM volume data of neural tissue by hierarchical classification. DAGM-Symposium, pp. 142–152. Schölkopf, B., Burges, C.J.C., Smola, A.J. (Eds.), 1999. Advances in Kernel Methods: Support Vector Learning. MIT Press, Cambridge, MA. Bishop, C., 2006. Pattern Recognition and Machine Learning. Springer. Breiman, L., 2001. Random forests. Mach. Learn. 45 (1), 5–32. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J., 1984. Classification and Regression Trees. Wadsworth Press. Bricq, S., Collet, C., Armspach, J.-P., 2008a. MS lesion segmentation based on hidden Markov chains. 11th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), workshop "A Grand Challenge: 3D Segmentation in the Clinic", New York, September 6–10. Bricq, S., Collet, C., Armspach, J.-P., 2008b. Unifying framework for multimodal brain MRI segmentation based on hidden Markov chains. Med. Image Anal. 12 (6), 639–652.
Criminisi, A., Shotton, J., Bucciarelli, S., 2009. Decision forests with long-range spatial context for organ localization in CT volumes. MICCAI workshop on Probabilistic Models for Medical Image Analysis (MICCAI-PMMIA). Criminisi, A., Shotton, J., Robertson, D., Konukoglu, E., 2010. Regression forests for efficient anatomy detection and localization in CT studies. MICCAI Workshop on Medical Computer Vision: Recognition Techniques and Applications in Medical Imaging (MICCAI-MCV). Datta, S., Sajja, B.R., He, R., Wolinsky, J.S., Gupta, R.K., Narayana, P.A., 2006. Segmentation and quantification of black holes in multiple sclerosis. Neuroimage 29 (2), 467–474.
Dugas-Phocion, G., Ballester, M.Á.G., Malandain, G., Ayache, N., Lebrun, C., Chanalet, S., Bensa, C., 2004. Hierarchical segmentation of multiple sclerosis lesions in multi-sequence MRI. ISBI. IEEE, pp. 157–160. Evans, A.C., Collins, D.L., Mills, S.R., Brown, E.D., Kelly, R.L., Peters, T.M., 1993. 3D statistical neuroanatomical models from 305 MRI volumes. IEEE Nuclear Science Symposium and Medical Imaging Conference, pp. 1813–1817. Freifeld, O., Greenspan, H., Goldberger, J., 2009. Multiple sclerosis lesion detection using constrained GMM and curve evolution. J. Biomed. Imaging 2009, 1–13. Geurts, J.J., Barkhof, F., 2008. Grey matter pathology in multiple sclerosis. Lancet Neurol. 7 (9), 841–851. INRIA, 2010. MedINRIA. www-sop.inria.fr/asclepios/software/MedINRIA/index.php. Lempitsky, V.S., Verhoek, M., Noble, J.A., Blake, A., 2009. Random forest classification for automatic delineation of myocardium in real-time 3D echocardiography. FIMH. LNCS 5528. Springer, pp. 447–456. Menze, B.H., Kelm, B.M., Masuch, R., Himmelreich, U., Petrich, W., Hamprecht, F.A., 2009. A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data. BMC Bioinformatics 10, 213. Prima, S., Ayache, N., Barrick, T., Roberts, N., 2001. Maximum likelihood estimation of the bias field in MR brain images: investigating different modelings of the imaging process. MICCAI. LNCS 2208. Springer, pp. 811–819. Prima, S., Ourselin, S., Ayache, N., 2002. Computation of the mid-sagittal plane in 3D brain images. IEEE Trans. Med. Imaging 21 (2), 122–138. Quinlan, J.R., 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann. Rey, D., 2002. Détection et quantification de processus évolutifs dans des images médicales tridimensionnelles : application à la sclérose en plaques. Thèse de sciences, Université de Nice Sophia-Antipolis (in French). Shiee, N., Bazin, P., Pham, D., 2008.
Multiple sclerosis lesion segmentation using statistical and topological atlases. The MIDAS Journal — MS Lesion Segmentation (MICCAI 2008 Workshop).
Shiee, N., Bazin, P.-L., Ozturk, A., Reich, D.S., Calabresi, P.A., Pham, D.L., 2010. A topology-preserving approach to the segmentation of brain images with multiple sclerosis lesions. Neuroimage 49 (2), 1524–1535. Shotton, J., Winn, J.M., Rother, C., Criminisi, A., 2009. TextonBoost for image understanding: multi-class object recognition and segmentation by jointly modeling texture, layout, and context. Int. J. Comp. Vis. 81 (1), 2–23. Souplet, J.-C., Lebrun, C., Ayache, N., Malandain, G., 2008. An automatic segmentation of T2-FLAIR multiple sclerosis lesions. The MIDAS Journal — MS Lesion Segmentation (MICCAI 2008 Workshop). Styner, M., Lee, J., Chin, B., Chin, M., Commowick, O., Tran, H., Markovic-Plese, S., Jewells, V., Warfield, S., 2008a. 3D segmentation in the clinic: a grand challenge II: MS lesion segmentation. MIDAS Journal, 1–5. Styner, M., Warfield, S., Niessen, W., van Walsum, T., Metz, C., Schaap, M., Deng, X., Heimann, T., van Ginneken, B., 2008b. MS Lesion Segmentation Challenge 2008. http://www.ia.unc.edu/MSseg/index.php. Van Leemput, K., Maes, F., Vandermeulen, D., Colchester, A.C.F., Suetens, P., 2001. Automated segmentation of multiple sclerosis lesions by model outlier detection. IEEE Trans. Med. Imaging 20 (8), 677–688. Wu, Y., Warfield, S., Tan, I., Wells III, W.M., Meier, D., van Schijndel, R., Barkhof, F., Guttmann, C., 2006. Automated segmentation of multiple sclerosis lesion subtypes with multichannel MRI. Neuroimage 32 (3), 1205–1215. Yamamoto, D., Arimura, H., Kakeda, S., Magome, T., Yamashita, Y., Toyofuku, F., Ohki, M., Higashida, Y., Korogi, Y., 2010. Computer-aided detection of multiple sclerosis lesions in brain magnetic resonance images: false positive reduction scheme consisting of rule-based, level set method, and support vector machine. Comput. Med. Imaging Graph. 34 (5), 404–413. Yi, Z., Criminisi, A., Shotton, J., Blake, A., 2009. Discriminative, semantic segmentation of brain tissue in MR images. LNCS 5762.
Springer, pp. 558–565. Yin, P., Criminisi, A., Winn, J., Essa, I., 2010. Bilayer segmentation of webcam videos using tree-based classifiers. IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI) 33.