Rapid fully automatic segmentation of subcortical brain structures by shape-constrained surface adaptation

Rapid fully automatic segmentation of subcortical brain structures by shape-constrained surface adaptation

Accepted Manuscript Rapid fully automatic segmentation of subcortical brain structures by shape-constrained surface adaptation Fabian Wenzel, Carsten...

2MB Sizes 0 Downloads 88 Views

Accepted Manuscript

Rapid fully automatic segmentation of subcortical brain structures by shape-constrained surface adaptation Fabian Wenzel, Carsten Meyer, Thomas Stehle, Jochen Peters, Susanne Siemonsen, Christian Thaler, Lyubomir Zagorchev, for the Alzheimer’s Disease Neuroimaging Initiative PII: DOI: Reference:

S1361-8415(18)30066-5 10.1016/j.media.2018.03.001 MEDIMA 1346

To appear in:

Medical Image Analysis

Received date: Revised date: Accepted date:

30 May 2017 23 February 2018 8 March 2018

Please cite this article as: Fabian Wenzel, Carsten Meyer, Thomas Stehle, Jochen Peters, Susanne Siemonsen, Christian Thaler, Lyubomir Zagorchev, for the Alzheimer’s Disease Neuroimaging Initiative, Rapid fully automatic segmentation of subcortical brain structures by shape-constrained surface adaptation, Medical Image Analysis (2018), doi: 10.1016/j.media.2018.03.001

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

Highlights • Rapid segmentation of clinically relevant sub-cortical brain structures is

CR IP T

proposed • Proposed approach does not require pre-processing, such as bias-field correction, and requires about 30 seconds per scan

• Extensive evaluation demonstrates superior performance (accuracy, reproducibility) of the proposed approach over FSL FIRST or FreeSurfer

AN US

• Segmentation results of hippocampus highly coincide with ADNI-EANM

AC

CE

PT

ED

M

ground truth

1

AC

CE

PT

ED

M

AN US

CR IP T

ACCEPTED MANUSCRIPT

2

ACCEPTED MANUSCRIPT

Rapid fully automatic segmentation of subcortical brain structures by shape-constrained surface adaptation

CR IP T

Fabian Wenzela,∗, Carsten Meyera , Thomas Stehlea , Jochen Petersa , Susanne Siemonsenc , Christian Thalerc , Lyubomir Zagorchevb , for the Alzheimer’s Disease Neuroimaging Initiative1 a Philips

Research Hamburg, Röntgenstraße 24-26, 22305 Hamburg, Germany Research North America, 2 Canal Park, Cambridge, MA 02141, USA c Department of Diagnostic and Interventional Neuroradiology, University Medical Center Hamburg-Eppendorf (UKE), Martinistraße 12, 20251 Hamburg, Germany

AN US

b Philips

Abstract

This work presents a novel approach for the rapid segmentation of clinically relevant subcortical brain structures in T1-weighted MRI by utilizing a shapeconstrained deformable surface model. In contrast to other approaches for segmenting brain structures, its design allows for parallel segmentation of individual

M

brain structures within a flexible and robust hierarchical framework such that accurate adaptation and volume computation can be achieved within a minute

ED

of processing time. Furthermore, adaptation is driven by local and not global contrast, potentially relaxing requirements with respect to preprocessing steps such as bias-field correction. Detailed evaluation experiments on more than

PT

1000 subjects, including comparisons to FSL FIRST and FreeSurfer as well as a clinical assessment, demonstrate high accuracy and test-retest consistency of

CE

the presented segmentation approach, leading, for example, to an average segmentation error of less than 0.5 mm. The presented approach might be useful

AC

in both, research as well as clinical routine, for automated segmentation and ∗ Corresponding

author used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_ to_apply/ADNI_Acknowledgement_List.pdf. 1 Data

Preprint submitted to Medical Image Analysis

March 8, 2018

ACCEPTED MANUSCRIPT

volume quantification of subcortical brain structures in order to increase confidence in the diagnosis of neuro-degenerative disorders, such as Alzheimer’s disease, Parkinson’s disease, Multiple Sclerosis, or clinical applications for other

CR IP T

neurologic and psychiatric diseases. Keywords: Model-based segmentation, shape-constrained deformable models, subcortical brain segmentation, hippocampus, volume quantification,

AC

CE

PT

ED

M

AN US

T1-weighted MRI, MP-RAGE

4

ACCEPTED MANUSCRIPT

1. Introduction Accurate and robust segmentation of subcortical brain structures from structural Magnetic Resonance Imaging (MRI) has gained attention in recent years,

5

CR IP T

since it supports the diagnosis and follow-up assessment of many neuro-degenerative diseases, such as Alzheimer’s disease (AD), multiple sclerosis (MS) or Parkinson’s Disease (PD), mild traumatic brain injury (mTBI), as well as

other Neurological an Psychological disorders, including depression and epilepsy (González-Villà et al., 2016).

Many techniques have been proposed, based on different methodological

paradigms. They include Bayesian approaches using prior probability maps

AN US

10

(Fischl et al., 2002; Scherrer et al., 2008), statistical shape and appearance models (Babalola et al., 2008; Patenaude et al., 2011), multi-atlas registration and label fusion (Heckemann et al., 2006; Klein et al., 2005; Lötjönen et al., 2011; Murphy et al., 2014; Wang et al., 2014; Wolz et al., 2010; Yan et al., 2013; Wang and Yushkevich, 2013), and other methods (Chupin et al., 2009; Corso

M

15

et al., 2007; Morra et al., 2008; Tu et al., 2008). Surveys on subcortical MR brain segmentation algorithms as well as comparative studies have been pubet al., 2015).

ED

lished as well (Babalola et al., 2008; Dolz et al., 2015; Grimm et al., 2015; Tang Even though all of these algorithms have been proposed for scientific use,

20

PT

and a few of them (including FreeSurfer and FSL FIRST) are used in several clinical studies, their use in clinical routine is still limited, partly due to long

CE

processing time, which might for example be caused by the Bayesian approach as implemented in FreeSurfer (Fischl et al., 2002) or sequential multi-atlas registration.

AC

25

This work introduces a novel approach for the rapid segmentation of clin-

ically relevant subcortical brain structures in T1-weighted MRI by utilizing a shape-constrained deformable surface model. Subcortical regions not only include gray matter structures like the putamen, globus pallidus, thalamus, hip-

30

pocampus, amygdala, or caudate nucleus and accumbens. The presented ap5

ACCEPTED MANUSCRIPT

proach is also able to segment adjacent structures like lateral ventricles, third ventricle, cerebellum, brainstem and pons, fornix and septum pellucidum, where some of these structures have primarily been introduced to increase segmenta-

35

CR IP T

tion robustness and to clearly separate adjacent structures (e.g. fornix, septum pellucidum).

The following paragraphs review related work on subcortical brain segmen-

tation and summarize the contributions of the present work. In contrast to a

related recent publication on young subjects suffering from mild traumatic brain injury (Zagorchev et al., 2015), experiments in the present work includes healthy

elderly subjects as well as patients suffering from Alzheimer’s disease or multiple

AN US

40

sclerosis. This work further illustrates the technology of the segmentation framework in more detail (Section 2), and focuses on validation experiments utilizing six different databases with, overall, 1189 scans from 1072 subjects (Section 3). Results of the proposed algorithm are compared to segmentations obtained with 45

the FreeSurfer (Fischl, 2012) and FSL FIRST (Patenaude et al., 2011) software,

M

obtaining comparable or more accurate results which were obtained in less time. Section 4 discusses results and concludes the present work.

ED

1.1. Subcortical brain segmentation: Related work 1.1.1. Voxel-wise, Bayesian segmentation One of the main categories of techniques for segmenting brain regions uses

PT

50

prior probability maps in a Bayesian approach. The widely used FreeSurfer software, for example, utilizes registered prior probability maps per structure,

CE

and segments structures voxel-wise, by utilizing neuroanatomically plausible Markov-Random-Field regularization (Fischl, 2012; Fischl et al., 2002). More specifically, FreeSurfer is based on the likelihood to observe the image voxel

AC

55

intensities given a hypothesis on the class labels for each voxel, and prior information on the class labels at each image position. This prior information is encoded in an atlas in Talairach space, which is registered to the current image using an affine transformation estimated on a set of sample points dis-

60

tributed in the image. The prior information is modeled as an inhomogeneous, 6

ACCEPTED MANUSCRIPT

anisotropic Markov random field and involves the probability for a class label at each location as well as a spatial regularization term containing class transition

1.1.2. Statistical shape and appearance models

CR IP T

probabilities for neighboring voxels.

Statistical shape and appearance models build a second category of segmen-

65

tation techniques, of which the FSL FIRST software is a widely used example (Patenaude et al., 2011). In most implementations of this approach, structures

are represented by surface meshes. In FSL FIRST, segmentation is performed

using active shape and appearance models in a Bayesian framework. More specifically, the segmentation of a structure in a new image is parameterized in

AN US

70

terms of the eigenvalues and eigenvectors of the vertex coordinates of meshes obtained from manually labeled training data after registration to MNI152 space. Image intensities are sampled along intensity profiles at each vertex. Then, the parameterized shape probability given the image intensity values can be expressed in terms of the mean and covariance of the conditional values observed

M

75

in a training set. Hence, a new image is linearly registered to MNI152 space in a first adaptation step, and the inverse transform is applied to the model to

ED

perform segmentation in native space. The parameters of the shape model given the intensity values are determined by maximizing the posterior shape probability distribution using a conjugate gradient descent search scheme, using some

PT

80

computational simplification. The work suggested in this paper, called model-based brain segmentation

CE

(MBS), likewise builds on statistical shape and appearance models. However, it differs from existing approaches in several aspects: (i) Each element of a surface

85

mesh is equipped with an individual boundary detection function that has been

AC

identified during a training phase (Peters et al., 2010). (ii) Model-based brain segmentation — which is based on the shape-constrained deformable model framework (Weese et al., 2001) — is deformable and not constrained to the shape space of eigenmodes. (iii) Moreover, the adaptation process can be performed

90

in a hierarchical configuration that allows to reduce processing time on recent 7

ACCEPTED MANUSCRIPT

computer hardware to about 30 seconds without post-processing steps for the hippocampus and amygdala, see section 2.6. 1.1.3. Multi-atlas registration and label fusion

95

CR IP T

Another class of approaches for subcortical brain segmentation follows multiatlas segmentation, as reviewed e.g. in (Iglesias and Sabuncu, 2015). Here, a

novel test image is compared to a set of manually labeled "atlas" images, usually via registration, and labels of the atlases are propagated to the test image. Algorithms differ mainly

AN US

1. in the selection of atlases to be used to segment the novel image: Us-

ing more atlases may improve segmentation accuracy by capturing more

100

structure-specific variation between patients (Doan et al., 2010), but may lead to higher computational cost. As an extension, additional test images might be added to the set of atlases in a semi-supervised learning approach (Wolz et al., 2010).

M

2. in the registration or alignment of atlases to the novel image, which, gen-

105

erally, need to balance computation time against accuracy (Iglesias and

ED

Sabuncu, 2015; Lötjönen et al., 2010). 3. in the propagation of labels from the atlases to the novel image, where nearest neighbor interpolation (Heckemann et al., 2006) is often applied, potentially enriched by methods to ensure tissue consistency.

PT

110

4. in approaches for label fusion, where the predictions of the selected atlases

CE

are combined for predicting a voxel’s anatomical label. Here, majority voting with global (Heckemann et al., 2006) or local weights (Artaechevarria et al., 2009), probabilistic fusion methods (Warfield et al., 2004) and meth-

AC

115

ods considering correlations among atlases (Wang and Yushkevich, 2013) are often applied.

Additionally, post-processing may be used in multi-atlas segmentation tech-

niques to refine results. A major drawback of multi-atlas segmentation is the computational cost of a new segmentation. Since the registration step domi8

ACCEPTED MANUSCRIPT

120

nates overall processing time, several strategies have been suggested to reduce the computational burden (Iglesias and Sabuncu, 2015). 1.1.4. Convolutional neural networks

CR IP T

Recent approaches for medical image processing include deep learning and

convolutional neural networks. Here, for each structure, voxel-wise segmenta125

tion is done by a configurable number of convolutional steps. In each step, individual network weights and biases are learned in a training phase, solving

a large-scale optimization problem generally based on a large set of annotated ground truth images. Deep learning approaches have been successfully applied

130

AN US

to structure segmentation problems, including subcortical brain segmentation (Dolz et al., 2017). Their current downsides from an academic view is the fact that convolutional neural networks infer from examples only, so that the integration of explicit domain knowledge usually has to be performed in additional processing steps. From a practical perspective, while being considerably faster

135

M

than atlas-based approaches, deep learning usually requires dedicated hardware like GPUs and slows down on standard CPU implementations. (Dolz et al., 2017) report a processing time of 2 - 3 minutes per subject for their sub-cortical

AC

CE

PT

ED

segmentation framework on a GPU.

9

ACCEPTED MANUSCRIPT

2. Material and Methods 2.1. Material This work uses material from a number of publicly accessible neuroimag-

140

CR IP T

ing studies. Inclusion criteria were (i) availability of T1-weighted brain scans of visually acceptable scan quality, (ii) externally available ground truth (either as good quality, independent, manually delineated label images, as clinical diagnosis, or as follow-up data of healthy subjects in which structures are not 145

assumed to change). Exclusion criteria were (i) subjects with brain tumor or (ii) stroke, as visible in the T1-weighted brain scans. Table 1 summarizes datasets

AN US

which have been used for evaluation. As explained in more detail in section 3, scans differed in the scanner’s field strength, acquisition and post-processing ar-

tifacts like blurring or flow effects, and slight variations of acquisition sequence 150

parameters. However, most sequences roughly followed ADNI recommendations for T1-weighted image acquisition, which today can be considered as a

M

de-facto standard for 3D T1-weighted brain imaging (Ellingson et al., 2015). Data included in a few cohorts for the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partner-

ED

155

ship, led by Principal Investigator Michael W. Weiner, MD. The primary goal

PT

of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD). For up-

CE

160

to-date information, see www.adni-info.org. All studies are compliant with

AC

ethical policies of corresponding relevant bodies. 2.2. Shape-constrained deformable models 2.2.1. General framework

165

Shape-constrained deformable models (Weese et al., 2001, 2014) combine

properties of deformable models or active contour models (Kass et al., 1988) 10

ACCEPTED MANUSCRIPT

and active shape models (ASM) (Cootes et al., 1995). Both methods deform a contour (in 2D) or surface (in 3D) to image features in order to delineate an anatomical structure. Active contour models are flexible to describe any shape and balance an intensity-driven external energy term against a deformation-

CR IP T

170

penalty-driven internal energy term. Active shape models describe a manifold of shapes in a point distribution model (PDM), encoding a mean shape as well as

shape modes covering typical shape variations. For ASMs, the external energy

penalizes deviations of surface elements from detected boundaries, and the shape 175

mode weights are adjusted to minimize external energy. Shape-constrained de-

AN US

formable models (SCDMs) combine the flexibility of active contour models with

the prior shape knowledge embedded in ASMs. The following paragraphs reintroduce details of the utilized framework for SCDMs, as previously applied for the segmentation of cardiac structures in CT and MRI images (Ecabert et al., 180

2011, 2008, 2006; Peters et al., 2010, 2007). Subsequently, specific extensions for the segmentation of subcortical brain structures in MRI images are laid out.

M

In shape-constrained deformable models, organ surfaces are represented by a triangulated mesh of fixed topology with V vertices v k and T triangles with

185

ED

center coordinates ci . As for ASMs (Cootes et al., 1995), a shape model is created via a principal component analysis (PCA) from a set of training shapes.

PT

The resulting PDM has a mean mesh and P shape modes: ¯k+ vk = m

P X j=1

pj · mjk

(1)

CE

¯ k and mjk are the coordinates of vertex k in the mean mesh and of Here, m

the shape mode j that is weighted with pj . Additional shape variability observed

AC

in a new brain image like rotation, scaling or shearing (which is not contained

190

in the eigenmodes of the model) can be described by introducing a global linear transformation T (q), with q denoting the transformation parameters. This

transformation is applied to the mesh model and yields the current coordinates

11

ACCEPTED MANUSCRIPT

{v 1 , . . . , v V } of the segmented mesh in a new brain image: ¯k+ v k = T (q) m

P X j=1



pj · mjk 

(2)

.

CR IP T



For complex anatomies with substructures {s} that have partly independent

195

scales and orientations – like the aging or diseased brain with different degree of atrophy in different brain structures, such as the hippocampus – a piecewise linear transformation model was found to better capture typical shape variations than PDM modes (Ecabert et al., 2007). Hence, partitioning the brain into S

200

AN US

substructures and modeling the variability of part s by a linear transformation

Ts (qs ), where qs denotes the corresponding transformation parameters, leads

to a transformed model

vk =

S X s=1



¯k+ wk,s · Ts (qs ) m

P X j=1

pj ·



mjk 

.

(3)

M

If all parts would be independent, the subdivision could be described by weights wk,s = 1 if vertex k belongs to part s and wk,s = 0 otherwise. To avoid discontinuities between the individual parts, the transformations are interpolated in transition regions that depend on the border between the parts. P Practically, this is done by assigning weights 0 < wk,s < 1 with s wk,s = 1

ED

205

PT

in transition regions. It should be noted that this approach might result in

overlapping adjacent structures. This effect can be minimized if the training meshes are carefully created without these overlaps. In addition, target point candidates located inside other mesh regions are penalized, reducing the chance

CE

210

of overlaps during adaptation. In the present work, only minimal overlaps between adjacent structures — if at all — have been observed, and experimental

AC

results in section 3 did not contain additional correction steps.

215

To increase robustness of the mesh adaptation, the adaptation of a mean

anatomical mesh model to a new image is done in several steps, embedded in a hierarchical workflow, where the degrees of freedom of the allowed mesh deformation is successively increased in each step. These steps are outlined in 12

ACCEPTED MANUSCRIPT

the following sections. 2.2.2. Initialization First, the mean mesh is automatically positioned in the image according

220

CR IP T

to a localization technique known as the Generalized Hough Transform (GHT) (Ballard, 1981), considering the orientation of the scan in terms of DICOM

patient coordinates. In this step, the 3D image volume is first downsampled to 2mm voxel dimension and then transformed to an edge image (Canny, 1986). 225

The GHT algorithm operates by shifting a template of model edges across the

current edge image. Coinciding edge voxels between the current image and

AN US

the shifted template with similar edge orientation are counted and the shift parameters with maximum count are taken as solution (Ecabert et al., 2008). The edge template is composed from edge images corresponding to a variety of 230

individual organ shapes to account for (limited) shape variation in the matching process. Due to computational constraints, the GHT only estimates translation

M

¯ k }. Hence, its center {m ¯ k } is placed at the parameters for the mean mesh {m

optimal 3D location found by the GHT. Further operations like rotation, scaling or shearing are accomplished in the parametric adaptation step by applying a suitable linear transformation T (q), as outlined in the following section.

ED

235

2.2.3. Parametric adaptation

PT

Starting from this mesh initalization, boundaries are detected in the neighborhood of the mesh surface by evaluating an individual boundary detection

CE

function for each mesh triangle on a search profile of regularly spaced points

240

along the triangle normal. Points with maximum detector response are defined as target points xtarget (see Section 2.3). The corresponding optimization probi

AC

lem for the external energy penalizes squared distances of triangle centers from planes which are orthogonal to the local image gradient passing through the target points: Eext =

T X i=1

2

∇I(xtarget )



target i wi · · ci − x i

target

k∇I(xi

)k 13

.

(4)

ACCEPTED MANUSCRIPT

The weights wi reflect the reliability of a detected boundary and are con-

245

sidered as constants during energy optimization. Note that Eext is a function of the triangle centers {ci } and thus of the vertex coordinates {v k }. To in-

CR IP T

crease the robustness of the mesh adaptation, a global rigid transformation T (q) = Trigid (q) as given in eq. (2) is optimized to compensate for small rota-

250

tions first. The external energy Eext , Equation (4), now becomes a function of the transformation parameters q and the eigenmode weights {pj }, which are op-

timized to minimize the external energy, given the detected target points, using Gauss-Newton optimization. The mesh is then deformed by alternating target

255

AN US

point detection and energy minimization until convergence. Subsequently, a global affine transformation T (q) = Taf f ine (q) accounts for potential scaling or

shearing of brain structures. After that, piecewise linear transformations Ts (qs )

are allowed, equation (3), to account for structure-specific scaling, shearing and

rotation. Here, optimization is done on all transformation parameters qs and the eigenmode weights {pj } to minimize the external energy Eext , as given by

equation (4), based on the detected target points. Target point detection and

M

260

energy minimization are iterated until convergence.

ED

2.2.4. Deformable adaptation

In a final step, since the variability of point distributions in the PDM does not

265

PT

capture all possible shape variations, the shape-constrained deformable model is configured to more flexibly adapt by relaxing the constraint of allowing only linear transformations of the adapted mesh. To regularize the optimization

CE

problem, an internal energy term is introduced to penalize deviations of vertices from the manifold of shapes defined by the PDM and the transformations Ts (qs )

. This energy compares edge vectors of the adapting surface mesh with edges as defined in the shape model:

AC 270

Eint

" #

2 P X

j j ¯k −m ¯`+ = wk,s · pj · (mk − m` )

v k − v ` − Ts (qs ) m

j=1 k=1 `∈N (k) s=1 (5) V S X X X

14

ACCEPTED MANUSCRIPT

Here, N (k) lists the indices of neighbor vertices of k, and (k, `) are vertex pairs connected by mesh edges. Given the formal energy terms (4) and (5), the

E = Eint + α · Eext

.

CR IP T

deformable adaptation now minimizes a weighted combination (6)

with a parameter α balancing the contribution of both energy terms. In this 275

work, an optimal value of α has been re-used from previous segmentation tasks focusing on other anatomies (see also Section 2.5) (Ecabert et al., 2008).

As before, iteration between target point detection and energy minimization

AN US

is done until convergence. In the deformable adaptation step, however, the

minimization of the total energy (6) is performed by iterating two steps: (i) The 280

internal energy (5) is minimized by optimizing the parameters qs of the piecewise affine transformations Ts (qs ) for each s, while keeping vertex coordinates {vk } and eigenmode weights {pj } fixed. This optimization is performed as solution of

a registration problem, registering the model in an optimal way to the current

285

M

mesh. (ii) The total energy (6) is minimized by optimizing the eigenmode weights {pj } and the vertex coordinates {vk }, while keeping the transformations

ED

Ts (qs ) fixed. This is a quadratic minimization problem and is solved using the conjugate gradient method. More details can be found in (Ecabert et al., 2008).

PT

2.3. Boundary detectors

Boundary detection is crucial for robust and accurate model adaptation. 290

Section 2.2 explained how boundary detection functions are evaluated on a

CE

search profile per triangle i. This profile is centered at ci , and the profile points

+L {x−L i , . . . , xi } are placed in regular intervals along the triangle normal. The

AC

target point is defined by

295

xtarget = i

arg max {x`i |`=−L,...,+L}



Fi (x`i ) − D · (x`i − ci )2



(7)

Here, Fi is a triangle-specific boundary detection function which has been

identified in a training phase such that it yields high values at correct surface points and low values at other positions. Hence, the boundary detection 15

ACCEPTED MANUSCRIPT

functions enable the exclusion of confusing boundaries. The distance penalty factor D may be tuned such that target points are preferrably found near a corresponding triangle.

CR IP T

Typical boundary detectors are based on an image gradient ∇I(x) (but also

other options are allowed like gray value differences), which is projected onto the

triangle normal ni to suppress non-parallel edges. The result is passed through a sigmoid function limiting the response to some maximum magnitude gmax :   g max · (gmax + k∇I(x)k) Glimit proj (x) = ni · ∇I(x) · 2 gmax + k∇I(x)k2

Since the resulting value Glimit proj (x) does not consistently identify anti-parallel

AN US

300

(8)

boundaries, a factor si = ±1 ensures the proper sign. Therefore, a boundary detection function Fi can be given as

  s · Glimit (x) if Q (x) ∈ [Min , Max ] ∀ j ∈ S , i j i,j i,j i proj Fi (x) =  0 otherwise.

(9)

M

Here, each Qj denotes a local image feature involving gray value characteristics near or across boundaries, which must fall inside a corresponding acceptance interval [Mini,j , Maxi,j ]. The selection of a triangle-specific optimal set

ED

305

Si of such image features from a set of suitable candidates — e.g. the (mean) gray value intensity on one or both sides of the boundary, their difference, their

PT

slope etc. — is explained in the next section. 2.4. Boundary detection training A “Simulated Search” technique identifies suitable combinations of local im-

CE

310

age feature functions for optimal boundary detection per triangle as well as

AC

adequate acceptance intervals in a training phase (Peters et al., 2010). First, a set S = {Qj } of suitable image feature candidates is defined. The values qj of all

features Qj are recorded for all triangle centers of reference segmentations in a

315

set of training images. The responses qj are clustered, and acceptence intervals

are built for each cluster by choosing the 5% and 95% percentile of its range as limits. Then, a pool of boundary detection functions F is defined via the 16

ACCEPTED MANUSCRIPT

structure of Equation (9), where we allow to use up to three features Qj in the triangle-specific set Si . The optimal boundary detection function Fi per triangle 320

i is found by the “Simulated Search” strategy: (i) Each triangle i under inves-

CR IP T

tigation is displaced independently, (ii) boundary detection is performed for all combinations of up to three image feature functions Qj , and (iii) the boundary detection error is recorded. This is done for numerous displacements of the

ground-truth triangle positions in all training images, and all allowed bound325

ary detection function candidates. The boundary detection function Fi from F

leading to the smallest boundary detection error is assigned to the triangle i.

AN US

The “Simulated Search” is further described in (Peters et al., 2010). 2.5. Model-based brain segmentation (MBS)

This section lays out specific aspects of the model-based segmentation frame330

work described in the previous sections for segmentation of subcortical brain

2.5.1. Surface mesh model

M

structures.

The shape-constrained deformable surface model for subcortical brain seg-

335

ED

mentation represents 27 brain regions by a mesh consisting of about V = 30000 vertices and T = 60000 triangles, as illustrated in Figure 1. When appropriate, region definitions adhere to the guidelines for segmenting subcortical structures

PT

from the Center for Morphometric Analysis (CMA), Massachusetts General Hospital, with the few exceptions mentioned in Table 2. Structure-specific average

CE

lengths of a triangular surface element’s edge (“resolutions”) for subcortical re-

340

gions have been chosen according to observations in variability and complexity of shapes. As an example, the hippocampus is modeled in higher resolution since

AC

its boundary sometimes contains subtle folds at a scale of less than 1.5mm. 2.5.2. Generation of grund truth meshes

345

The generation of 96 ground truth meshes (X96 cohort, see Table 1 and

Section 3.1) has been performed by three authors (FW, CM, TS), each having between 5 and 10 years of experience in the domain. Since delineating brain 17

AN US

CR IP T

ACCEPTED MANUSCRIPT

Figure 1: Surface representation of the segmentation model for subcortical brain structures.

M

structures from scratch is a time-consuming task, a semi-automatic approach has been chosen. Here, an intermediate segmentation model trained on a subset of delineated images was used to generate initial reference segmentations for the remaining scans. In a subsequent step, these initial segmentations have been

ED

350

manually corrected and mutually reviewed by interactive manipulation of the

PT

surface mesh using in-house software tools. Triangle-specific boundary detector functions as described in Section 2.3 as well as the mean mesh model and the eigenmodes have been created in a training phase based on the ground truth meshes.

CE 355

AC

2.5.3. Brain segmentation workflow and parameters Section 2.2 has introduced a hierarchical adaptation scheme as a feature of

the model-based segmentation framework. The framework also allows boundary detection functions as well as other adaptation parameters to be modified in

360

different levels of adaptation. In model-based brain segmentation, a first adaptation phase is done on a lower-resolution representation of structures, containing 18

ACCEPTED MANUSCRIPT

about 5000 vertices and 10000 triangles only, aiming at multi-affine alignment, and using boundary detection functions that have been trained on 20 mm intensity profiles. After minimizing the external energy of the lower-resolution representation, new high-resolution vertices are added via interpolation, and

CR IP T

365

a deformable adaptation step is performed according to Section 2.2.4. Here, the boundary detection functions for each triangle have been trained on 4 mm intensity profiles, allowing the selection of more specific features in the vicinity of target boundaries. In addition to the projected gradient and projected 370

gray value differences, the “Simulated Search” uses a second order polynomial

AN US

curve fit across the boundary gray value profile for brain structure segmentation (Section 2.4). This boundary detection function has been added to target CSF surrounding the cerebellum and the brainstem. In both mesh resolutions, 5 eigenmodes have been found to sufficiently describe shape variability 1. In 375

addition to the global constant α balancing the external and the internal energy (see Section 2.2.4), structure-specific external energy weight factors αs are

M

used which have been optimized on the training data (0.2 to 6 in low-resolution, 1 to 6 in high-resolution). Piecewise affine transformations have been defined

380

ED

separately for the following structures: Brainstem, cerebellum, lateral brain structures (including thalamus, putamen, globus pallidus and caudate nucleus and accumbens), corpus callosum, and left and right lateral ventricle. In the

PT

low-resolution mesh, the inferior, anterior, and posterior horns of both ventricles as well as the hippocampus with amygdala have been assigned a separate affine transformation to account for structure-specific deformations.

CE

Figure 2 shows three example cases with the results of model-based surface

385

adaptation.

AC

2.6. Post-processing It should be noted that the model-based brain segmentation introduced in

this chapter focuses on structure boundaries and therefore is not able to exclude

390

non-tissue areas inside subcortical structures. Therefore, for the experiments in Section 3, voxels classified as cerebro-spinal fluid (CSF) inside segmented 19

CR IP T

ACCEPTED MANUSCRIPT

Figure 2: Coronal views of an example T1 brain scan with an adapted mesh. (left+middle): ADNI subjects with varying degrees of hippocampal atrophy, (right): MP2RAGE scan

AN US

regions of the cerebellum and hippocampus have been excluded. This post-

processing step is based on a freely available software for fully automatic tissue classification (“niftyseg”, UCL, (Cardoso et al., 2011)), adding about 30 more seconds of processing time on recent standard computer hardware.

AC

CE

PT

ED

M

395

20

21 780

30

120

113

20

30

96

Nscan

PT

6.74)

55 - 90 (74.12,

16.9)

28 - 88 (64.5,

2.65)

26 - 31 (29.0,

60 - 90 (75, 8.1)

3.9)

45 (47%)

Nmale

390 (50 %)

13

2 (66%)

/ /

AD MCI

NL

NL

AD / NL

group

diagnostic

Siemens,

Philips,

Siemens

Siemens

Siemens

Philips, GE,

vendor

390 (50%)

17

1 (33%)

AD / NL

AD

NL, MS,

NL

NL

GE

field

1.5T, 3T

1.5T, 3T

3T

1.5T, 3T

1.5T

1.5T

3T

strength

1 × 1 × 1.15

(UKE)

0.93 × 0.93 × 0.93

1 × 1 × 1.15 (ADNI)

1.05 × 1.05 × 1.2

1 × 1 × 1.15

1×1×1

1×1×1

1 × 1 × 1.15

Res. (mm)

CR IP T

Siemens,

Philips,

GE

Siemens,

Philips,

GE

GE

AN US

56 (50%)

12 (60%)

20 (67%)

51 (53%)

Nf emale

M

57 (50%)

8 (40%)

10 (33%)

ED

19 - 34 (23.4,

20.73)

18 - 80 (34.3,

7.6)

51 - 90 (74.6,

(mean, std.)

min - max

age (years)

Yes

(ADNI)

Partly

Yes

Yes

Yes

Yes

(ADNI)

Partly

able

avail-

Publicly

NL: healthy control, MCI: Mild cognitive impairment, MS: Multiple sclerosis.

/ female images is given as absolute number and percentage, the age as min, max, mean and standard deviation in years. AD: Alzheimer’ disease,

Table 1: Overview of cohorts used in this work. Nsubj. : Number of subjects per data set, Nscan : Number of scans per data set. The number of male

780

ADNL-ADNI

113

HarP

30

20

OASIS-TRT-20

COMPARE

30

OASIS MICCAI

3

96

X96

Nature

Nsubj.

CE

Cohort

AC

ACCEPTED MANUSCRIPT

AC

22 1.5 mm

1.5 mm 1.5 mm 1.5 mm 1.5 mm

(Right / left) lateral Ventricle

Third ventricle

Brainstem

Cerebellum and peduncles

(Right / left) caudate nucleus

M

AN US

According to CMA guidelines, joint structure.

According to CMA guidelines.

According to CMA guidelines. Pons has been delineated as an individual sub-structure.

According to CMA guidelines.

boundary of sub-structures is not well defined anatomically.

evaluations in this work, however, have been done on complete lateral ventricles, since the

CR IP T

modeled with higher resolution (0.75 mm), due to higher degree of variability. Quantitative

Sub-structures for anterior / posterior / inferior horns exist, since posterior horn has been

According to CMA guidelines.

on hippocampal segmentation harmonization (see Section 3.3).

According to consolidated structure definition as provided by the EADC-ADNI initiative

segmentation).

included in CMA protocol, in order to provide better boundary features for automated

Approximately to CMA guidelines (anterio-laterally, thalamus follows contrasted edge not

According to CMA guidelines.

According to CMA guidelines.

of secondary clinical interest. Excluded from quantitative analyses.

Not included in CMA guidelines. Included only to increase adaptation robustness. Only

ED

Therefore, it has been excluded from quantitative analyses.

in order to increase adaptation robustness and, currently, of secondary clinical interest.

WM structure, not included in CMA guidelines. Included in brain segmentation model

adaptation robustness.

titative analyses in this work. The main purpose for inclusion, currently, is increased

boudaries of the corpus callosum are not well defined, it has been excluded from quan-

White matter (WM) region, not included in CMA guidelines. Since the lateral anatomical

Region characteristics and definition

and, for a few structures, the purpose of including them. Here, resolution denotes the average edge length of triangular mesh elements.

Table 2: Brain structures included in model-based subcortical brain segmentation (MBS), including resolution in mm, remarks on region definition

and accumbens

3.0 mm

1.5 mm

(Right / left) thalamus

(Right / left) amygdala

1.5 mm

(Right / left) pallidum

1.0 mm

1.5 mm

(Right / left) putamen

(Right / left) hippocampus

1.5 mm

1.5 mm

PT

CE

1.5 mm

Resolution

Septum pellucidum

Fornix

Corpus callosum

Region

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIPT

3. Results and Discussion This chapter presents several experiments done to validate the accuracy and reproducability of the model-based brain segmentation. Table 3 describes the

400

CR IP T

evaluation measures used. 3.1. X96 cohort: Cross-validation

The "X96" cohort consists of 96 data sets which have been randomly selected

from the ADNI study (n=87) and an Alzheimer’s disease study at the Lahey Clinic, Burlington, MA (n=9). Selection has been done such that images were

405

AN US

balanced with respect to diagnosis (Alzheimer’s disease, healthy controls) and

vendor (Philips, GE, Siemens). The objective of this experiment is two-fold: (i) to assess vendor-specific and disease-specific differences of segmentation results and (ii) to analyze whether a single multi-purpose model can be used instead of a matching vendor- or disease-specific one. In order to avoid potential bias, subjects included in X96 have been removed from all following ADNI-based experiments presented in this work.

M

410

For this cohort, ground truth segmentations were available as meshes, since

ED

they have been used to train boundary detection functions. Therefore, computation of the surface-based segmentation error mean was possible, such that segmentation results did not have to be post-processed.

PT

In order separate training from test data, experiments have been performed

415

with 16-fold cross-validation, where testing data has been uniformly distributed

CE

across vendors and diagnostic groups. Table 4 indicates that structures are segmented with a mean segmentation

error of 0.6 mm or less and with a Dice similarity coefficient (DSC) ranging between 0.81 for small structures like the amygdala and 0.96 for large structures

AC

420

like the brainstem and the cerebellum. Results in Table 5 indicate that a segmentation model trained on the same

disease class or vendor results in slightly superior accuracy, which is not surprising. However, a training dataset of all available scans, i.e. across vendors

23

ACCEPTED MANUSCRIPT

Symbol

/

Name

Description

Symmetrized

Mutually averaged mean distance in mm between the triangle

mean

centers of an adapted mesh to an anatomically corresponding

mean

patch-to-

surface distance

CR IP T

Acronym

patch of the reference mesh. It should be noted that mean does not depend on the volume of a structure.

DSC

Dice similarity co-

Measure of overlap between computed segmented region S 0

efficient

and reference segmentation S 00 in terms of corresponding volumes V as

2 · V (S 0 ∩ S 00 ) . V (S 0 ) + V (S 00 ) DSC ranges from 0 (no overlap) to 1 (perfect overlap). It

AN US

DSC =

should be noted that the DSC is a sensitive measure for small structures, since, in that case, it increases the level of discretization artifacts caused by finite voxel sizes. AVD

Absolute

volume

difference

Difference (in ml) between a segmented volume V (S 0 ) and a corresponding reference volume V (S 00 ), averaged over all scans.

Intra-class correla-

Quantifies agreement between two series of measurements of

tion coefficient

two fixed “raters/judges” (Shrout and Fleiss, 1979) . In this

M

ICC(3,1)

ED

work, comparison is done between automatically segmented

Cronbach’s Alpha

AC

CE

r

PT

α

CoV

volumes and ground truth. ICC(3,1) ranges from 0 (no agreement) to 1 (perfect agreement) and has been computed with the R statistical computing environment. Reliability measure denoting the expected correlation of two sets that measure the same construct(Tavakol and Dennick, 2011). α ranges from 0 (no internal consistency) to 1 (perfect internal consistency).

Pearson’s Correla-

Linear correlation coefficient between two series of volume

tion Coefficient

measurements V (S 0 ) and V (S 00 ). r ranges from -1 (perfect anti-correlation) over 0 (no correlation) to 1 (perfect correlation) and has been computed with the R statistical computing environment.

Coefficient of vari-

CoV , also known as relative standard deviation, can be com-

ation

puted as σ/µ, for a sequence of measurements.

Table 3: Evaluation measures used in our experiments.

24

ACCEPTED MANUSCRIPT

mean

DSC

AVD (ml)

ICC(3,1)

Left amygdala

0.54

0.814

0.095

0.654

Right amygdala

0.53

0.811

0.120

0.571

Brainstem with pons

0.29

0.965

0.321

0.974

Left caudate nucleus and accumbens

0.35

0.905

Right caudate nucleus and accumbens

0.35

0.904

Cerebellum & peduncles

0.60

0.962

Left globus pallidus

0.50

0.865

Right globus pallidus

0.50

0.864

Left hippocampus

0.46

0.872

Right hippocampus

0.45

0.877

0.44

Right lateral ventricle Left putamen Right putamen Right thalamus Third ventricle

M

Left thalamus

0.826

0.199

0.877

4.649

0.957

0.125

0.647

0.130

0.583

0.183

0.908

0.201

0.899

0.943

0.761

0.997

0.47

0.938

0.754

0.996

0.41

0.906

0.178

0.913

0.41

0.906

0.186

0.907

0.48

0.924

0.248

0.889

0.45

0.929

0.249

0.899

0.37

0.893

0.152

0.956

0.44

0.90

0.54

0.86

ED

Average of all structures Table 4:

0.202

AN US

Left lateral ventricle

CR IP T

Region

X96 data set: Evaluation of MBS in terms of the mean segmentation error mean

in mm, averaged over the mesh triangles of the corresponding anatomical structure, the Dice similarity coefficient (DSC), the absolute volume difference (AVD) in ml, and the intra-class

as well as disease types, results in almost comparable results. Since scans and

CE

425

PT

correlation coefficient (ICC(3,1)); see also Table 3.

ground-truth segmentations of both testing and training data build a disjoint part of the same cohort, results cannot be generalized and need to be related to

AC

the other evaluations performed in this work. 3.2. OASIS cohorts: Comparative evaluation with independent ground truth de-

430

lineations A second experiment has been set up in order to demonstrate agreement

with externally available, independent ground truth segmentations. Two pub25

ACCEPTED MANUSCRIPT

licly available test sets (OASIS MICCAI and OASIS-TRT-20) from the MICCAI 2012 multi-atlas segmentation challenge and www.mindboggle.info, re435

spectively, provided 50 scans, overall, with ground truth segmentations as anno-

CR IP T

tation labels provided by a third-party (Neuromorphometrics, Inc., MA, USA). Two widely used software packages, FreeSurfer (5.3) and FSL FIRST (5.0),

have been chosen for comparison to MBS. A fair comparison seems possible in this context since structure definition of both software packages as well as 440

MBS — except for a slight deviation in case of the thalamus — also comply

with the CMA structure definitions of subcortical areas (http://www.cma.mgh.

AN US

harvard.edu/manuals/segmentation), as did the ground truth segmentations (http://www.neuromorphometrics.org:8080/seg). Even though acquisition parameters for FSL First’s and FreeSurfer’s training data do not seem to be 445

published explicitly, both software packages recommend T1-weighted / MPRAGE-like acquisition protocols for their use(Stevens, 2017; Jenkinson, 2014). Tables 6 and 7 show comparative results with respect to all volumetric evalu-

M

ation measures, including the results of a significance analysis, which could be done for DSC and AVD. Figures 3 and 4 furthermore illustrate DSC and AVD results as box-and-whisker plots. On the same computer hardware, computation

ED

450

of segmentations, on average, needed about 14 hours (FreeSurfer) or 30 minutes (FSL FIRST). However, it should be noted that the FreeSurfer workflow

PT

included computation of cortical labels as well. According to FreeSurfer’s website (https://surfer.nmr.mgh.harvard.edu/fswiki/recon-all), limitation 455

to subcortical regions might have reduced processing time to about 10 hours.

CE

A few remarks might be relevant for interpreting the results, also in relation

AC

to other experiments of this work:

460

• While the model-based brain segmentation approach was trained on multivendor 3T scans, subjects of the OASIS-TRT-20 and OASIS MICCAI were scanned on a single Siemens 1.5T scanner. • Scans from the OASIS-TRT-20 dataset seem to be averaged from a sequence of 4 repeat scans. Therefore, contrast appears slightly blurred, 26

ACCEPTED MANUSCRIPT

with a potential impact on boundary features. Furthermore, flow artifacts can be seen in a few dedicated brain regions, partly being close to 465

the amygdala, the hippocampus, or the pons. These artifacts are generally

CR IP T

not seen in recent brain scans, including the X96 cohort used for training. Figure 5 shows three randomly picked example brain scans, from the

X96, the OASIS MICCAI, and the OASIS-TRT-20 cohort, which reveals differences in scan characteristics. 470

• Subjects in both OASIS cohorts are younger, on average, than the ones in

the X96 cohort, which has been used for training. Since brain structures

AN US

in the X96 cohort are affected by age-related and disease-related atrophy, the shape of subcortical brain regions might systematically differ between the training and the test group, making the adaptation process of the 475

model-based segmentation a more challenging test scenario. • Ground truth labels of the OASIS cohort have been created semi-automatically,

M

slice by slice, in coronal views. Therefore, in a few areas, they are not 3D consistent, which is apparent in the snapshot shown in Figure 6. This

480

ED

aspect might have an impact on all segmentation approaches which are compared.

• An initial run of the FSL FIRST software (by running the run_first_-

PT

all script as recommended in the FSL FIRST user guide) failed in 4 scans of the OASIS-TRT-20 cohort and 8 scans of the OASIS MICCAI

CE

cohort due to misregistration of FSL’s first registration step to MNI152

485

space. Therefore, two dedicated pre-processing steps were applied (brain

AC

extraction, cropping of the scan’s field of view) in order to yield acceptable

490

registration results. These steps are were not necessary for FreeSurfer and MBS. Nevertheless, visual inspection of segmentation results revealed a few residual segmentation errors in two scans of the OASIS-TRT-20 cohort, with an impact on the results of Table 7 and Figure 4.

These aspects indicate that comparing segmentation results to external ground 27

ACCEPTED MANUSCRIPT

truth is non-trivial and details on secondary aspects like quality of scans as well as ground truth delineations, and even cohort characteristics might have an effect on evaluation results. The main purpose in this experiment, however, is a comparative evaluation with respect to other software tools, for which

CR IP T

495

boundary conditions are the same. The results in Tables 6 and 7 as well as

Figures 3 and 4 show consistently superior results for the model-based brain

segmentation approach for almost all brain structures and evaluation results. Furthermore, Figures 3 and 4 tend to reveal less variation and fewer outliers in 500

volumes across subjects in case of MBS. The slight deviation of the thalamus’

AN US

structure definition from CMA guidelines is apparent in MBS results for DSC

and AVD. However, measures like ICC(3,1), α, and r quantifying volumetric agreement/correlation between automatically segmented structures and ground truth are not affected by this aspect.

Finally, it should be mentioned that the OASIS MICCAI dataset and cor-

505

responding ground truth labels served as training and test data in a previous

M

segmentation challenge, in which 25 different approaches for multi-atlas-based brain structure segmentation took part (Landman and Warfield, 2012). In this

510

ED

competition, average DSC values for subcortical brain regions ranged from 0.79 to 0.84. However, a one-to-one comparison to the results obtained by the modelbased segmentation introduced here is difficult, due to the fact that data in the

PT

segmentation challenge was split into a training set (n=15) and a test set (n=20), which allowed dedicated training with respect to both, individual characteristics of ground truth annotations done by a single rater as well as consistent scan contrasts, similar to the X96 cross-validation experiment of section 3.1 which

CE 515

resulted in DSC values of about 0.9. Furthermore, cortical regions in the chal-

AC

lenge included 36 bi-lateral subcortical structures, only partially overlapping with the structures used in this work (see Table 2). Finally, individual results of the DSC for individual structures are not consistently reported.

28

ACCEPTED MANUSCRIPT

520

3.3. HarP cohort: Evaluation on consolidated, independent ground truth delineations of the hippocampus Section 3.2 describes a few aspects indicating that the comparison to in-

CR IP T

dependent ground truth labels might be biased by potential confounders like differences in region definitions, cohort and scan characteristics, as well as arti525

facts in ground truth.

Therefore, this section describes a second experiment for comparing to external ground truth segmentations which overcomes a few of those limitations, while being limited to the assessment of the hippocampus.

530

AN US

After an EADC-ADNI multi-site effort on consolidating different region

definitions of the hippocampus, a harmonized protocol (HarP) has been proposed.

It focuses on the characteristics of brain scans of elderly subjects,

i.e. on those containing age-related or disease-related hippocampal atrophy (http://www.hippocampal-protocol.net). The published guideline puts emphasis on excluding structures like the choroid plexus in the inferior temporal horn of the lateral ventricles, which might accidentally be included due to sim-

M

535

ilar contrast and the vicinity to the hippocampus’ head (Boccardi et al., 2015).

ED

This structure is usually not visible in brain scans of young subjects and, therefore, cannot be assessed properly when using the OASIS cohorts described in Section 3.2. Additionally, a cohort of 135 test scans with consolidated manual annotations has been made publicly available.

PT

540

From all 135 cases of the HarP cohort, 23 subjects have been excluded,

CE

since they were also contained in the X96 training data set, leaving 113 remaining independent test scans. Figure 7 compares the hippocampal volumes from the manual segmentation and the fully automatically determined volumes for healthy controls (NL), MCI subjects, and AD patients.

AC

545

Table 8 summarizes evaluation measures. Results indicate higher segmentation accuracy for the hippocampus on the

HarP cohort than corresponding results for the OASIS cohort described in Section 3.2. The difference might be explained by two aspects:

29

ACCEPTED MANUSCRIPT

1. Brain scans used in this experiment contain fewer artifacts, and their

550

contrast better matches the X96 cohort used for training, since both data sets for the most part contain scans from the ADNI study.

CR IP T

2. There is a better correspondence of ground truth delineations and struc-

ture definition of the hippocampus between HarP and X96, which is used for training boundary detection functions, than between OASIS and X96.

555

The intra-class correlation coefficient (ICC) outperforms manual inter-rater agreement of trained delineation experts, for which an ICC of 0.89 has been

AN US

published (Frisoni et al., 2015). 3.4. Nature cohort: Comparative evaluation of reliability

Another experiment focuses on the test-retest consistency and is based on

560

a publicly available dataset recommended for this purpose (Maclaren et al., 2014): Three young volunteers have been scanned twice in 20 consecutive sessions, approximately one day apart. The original publication observed signifi-

565

M

cant changes in ventricular volume between visits, assuming that there might be normal variability in certain brain structure volumes. Consistency analysis has

ED

been done with respect to the coefficient of variation (CoV ), which quantifies changes between scans of a single session (intra-session variability, CoVs ) and over time accross sessions (inter-session variability, CoVt ). Table 9 summarizes

570

PT

FreeSurfer results, as reported in the original publication, and extended with corresponding results from the model-based brain segmentation (MBS).

CE

Results demonstrate that brain structure volumes computed by the modelbased segmentation approach are more consistent than volumes computed by

AC

FreeSurfer, both between scans of a single session, and across sessions. 3.5. COMPARE cohort: Blind comparative assessment by Neuroradiologists

575

A blind scoring experiment targets visual assessment of segmentation results

by a Neuroradiologist with three years of experience (CT). For this purpose, the COMPARE cohort has been set up, consisting of n=10 randomly picked healthy controls and n=10 AD subjects from the ADNI database as well as n=10 MS 30

ACCEPTED MANUSCRIPT

subjects from the University Medical Center Hamburg Eppendorf (UKE). ADNI 580

scans did not overlap with the X96 cohort, i.e. they have not been used for MBS training. Segmented labels of 12 subcortical brain structures common to

CR IP T

FSL FIRST (FSL), FreeSurfer (FS), and the model-based brain segmentation (MBS) were overlaid on top of the originating T1-weighted scan in an interactive scoring software application, e.g. allowing opacity of overlaid structures to be 585

changed. Every segmented structure of each of the n=90 scans has been rated separately. The rater was not aware of the fact that segmentations were obtained

by different techniques. Table 10 summarizes the scoring scheme. Figure 8

AN US

illustrates scoring results for each structure and across the three methods and Table 11 reports on significance analysis by means of Student’s t-test.

It can be seen that MBS segmentation results have been significantly higher

590

scored than both, FSL FIRST and FreeSurfer (FS) for almost every individual structure. This observation also applies to the thalamus, for which the MBS region definition slightly differs from the CMA guidelines - as reflected in AVD

595

M

and DSC results of Section 3.2.

3.6. ADNL-ADNI cohort: Volumetric differences between healthy controls and

ED

Alzheimer’s subjects

A final experiment of the model-based brain segmentation investigates a clin-

PT

ical aspect, which is disease-related atrophy of different subcortical regions. For this study, 790 scans from the ADNI cohort have been used, balanced between 600

diagnostic groups (AD/NL), age, and gender. Coinciding subjects of the X96

CE

cohort have been excluded before balancing. Figure 9 illustrates structure volumes, each normalized with respect to the

AC

corresponding mean volume of all subjects, for both cohorts. It can be seen that the structure volumes in the Alzheimer’s group are smaller than in the control

605

group, except for the ventricles, which is to be expected. Due to the large sample size, Student’s t-test revealed highly significant differences for all structures. Therefore, Table 12 summarizes the absolute effect size using Cohen’s D and reveals that the volumetric difference of the hippocampus, amygdala, as well 31

ACCEPTED MANUSCRIPT

as the ventricular volumes have a higher effect size, whereas structures like the 610

brainstem and the cerebellum do not seem to be largely affected by Alzheimer’s disease. These observations in fact confirm current clinical recommendations

CR IP T

(Frisoni et al., 2013). Furthermore, the discriminative power of the multi-variate structure model, including all volumes, has been statistically assessed by MANCOVA as available 615

in SPSS. Results yield a significant overall group difference (Wilks’ lambda = 0.488983, df = 19, F = 50.547866, p = 0.000), even without including age as

AC

CE

PT

ED

M

AN US

co-variate.

32

33

0.43 0.43 0.47 0.43 0.44

Philips AD

Siemens NL

Siemens AD

GE NL

GE AD

0.961

0.958

0.962

0.956

0.962

0.967

0.968

0.44

0.45

0.47 0.961

0.45

0.44

0.48

0.46

0.53

0.43

(mm)

mean

0.962

0.959

0.958

0.952

0.955

0.963

DSC

AN US

0.956

0.961

0.956

0.961

M

0.44

0.43

0.37

DSC

0.45

0.46

0.49

0.47

0.55

0.43

(mm)

mean

0.961

0.958

0.958

0.952

0.950

0.962

DSC

both diagn. groups

d) all vendors,

X96 data set: Evaluation of MBS with respect to vendor and diagnostic group. Separate models are trained on the respective subsets of

0.40

Philips NL

(mm)

mean

ED

DSC

same diagn. group

c) all vendors,

and the Dice similarity coefficient (DSC), both averaged over the images of the respective test set.

images from Philips, Siemens and GE. Evaluation results are given in terms of the mean segmentation error mean in mm, averaged over all triangles,

of all vendors and both diagnostic groups. Evaluations are performed separately on 6 subsets with 16 images each: Healthy controls and AD patient

each vendor, combining AD and Normal patients, c) on 48 images of each diagnostic group, combining images of each vendor, and d) on 96 images

CR IP T

the X96 data, a) on 16 images of each vendor and diagnostic groups, i.e. Alzheimer’s disease (AD) and healthy controls (NL), b) on 32 images of

Table 5:

(mm)

Evaluation on

mean

both diagn. groups

same diagn. group

PT b) same vendor,

a) same vendor,

CE

Training on

AC

ACCEPTED MANUSCRIPT

FSL

DSC

34 0.77

Average (all structures)



0.82



0.88

0.88



0.8

0.81



0.91 ∗

0.72 ∗

0.83

0.84

0.57

0.85 ∗†

0.90 ∗†

0.85 ∗

0.79 †

0.86 ∗†

0.94 ∗

0.88 ∗†

ED

0.83



0.71

MBS

M 2.84

0.79

1.01

0.79

1.57

2.13

0.60

0.18

17.62

0.89

3.01

0.68

FS



0.42



0.45

0.7

1.54

0.38

0.66 †

1.07 †

0.14 ∗†



0.66



0.82

0.8

0.87

0.81

0.88

0.90

0.95

0.99

0.74

0.84

0.97

0.89

0.96

0.59

MBS

0.86

0.78

0.93

0.94

0.85

1.00

0.81

0.81

0.98

0.79

0.97

0.50

FS

CR IP T

0.78

0.66

0.87

0.89

0.74

0.99

AN US

0.68



0.82



0.17

FSL





0.83

0.68

0.96

0.66

0.94

0.33

FS

0.65

0.58 ∗†

0.09

10.15

0.32 ∗

1.40 ∗

0.11 ∗†

MBS

ICC(3,1)

0.68

0.39

0.25 ∗†



0.46



0.29

FSL

AVD (ml)



0.76



0.9

0.89



0.79

0.81



0.9



0.29

FSL

α

0.93

0.89

0.93

0.95

0.97

1.00

0.85

0.91

0.98

0.94

0.98

0.74

MBS

AVD) to be statistically different from FSL FIRST (p < 0.05, Student’s t-test).

∗ marks MBS results (DSC, AVD) to be statistically different from FreeSurfer (p < 0.05, Student’s t-test). A dagger † marks MBS results (DSC,

Cronbach’s Alpha α and Pearsons’s correlation coefficient r as listed in Table 3. "—" refers to structures not supported by FSL FIRST. An asterisk

FSL FIRST, in terms of the Dice similarity coefficient (DSC), absolute volume difference (AVD) in ml, intra-class correlation coefficient (ICC(3,1)),

Table 6: OASIS MICCAI data set: Comparison of the segmentation performance of the model-based brain segmentation (MBS), FreeSurfer (FS) and

0.76

Average (FSL structures)

0.81

Lateral ventricle

0.53

0.79

Hippocampus

Third ventricle

0.74

Globus pallidus

0.83

0.91

Cereb. & ped.

Thalamus

0.81

Caud. nucl. & acc.

0.78

0.85

Brainstem

Putamen

0.58

PT

FS

CE

Amygdala

AC

0.80

0.70

0.91

0.89

0.82

0.99

0.68

0.68

0.96

0.66

0.96

0.44

FS



0.68



0.82

0.89



0.69

0.69



0.82



0.19

FSL

r

0.88

0.82

0.96

0.91

0.95

0.99

0.74

0.85

0.97

0.90

0.97

0.60

MBS

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIPT

DP\JGDOD EUDLQVWHP

CR IP T

FDXGDWHQXFOHXV DQGDFFXPEHQV FHUHEHOOXP DQGSHGXQFOHV

UHJLRQ

JOREXVSDOOLGXV KLSSRFDPSXV ODWHUDOYHQWULFOH

WKDODPXV WKLUGYHQWULFOH 





AN US

SXWDPHQ















'LFHVLPLODULW\FRHIILFLHQW '6&

DP\JGDOD

M

EUDLQVWHP FDXGDWHQXFOHXV DQGDFFXPEHQV

ED

UHJLRQ

JOREXVSDOOLGXV KLSSRFDPSXV

PT

ODWHUDOYHQWULFOH SXWDPHQ

CE

WKDODPXV

0HWKRG

0%6 )UHHVXUIHU )6/)LUVW

UHJLRQ

AC

WKLUGYHQWULFOH













FHUHEHOOXP DQGSHGXQFOHV 













$EVROXWHYROXPHGLIIHUHQFH $9'PO

Figure 3: OASIS MICCAI dataset: Comparative box-and-whisker plots for DSC and AVD. Whiskers start from the 25th and 75th percentile and range up to 1.5·IQR, where IQR denotes the inter-quartile range. A different AVD range justifies a separate plot for the cerebellum and peduncles.

35

FSL

DSC

36 0.77

Average (all structures)



0.66



0.71

0.7



0.63

0.66



0.90 ∗

0.73 ∗†

0.82

0.88

0.55 ∗

0.86 †

0.89 ∗†

0.81 ∗

0.78 †

0.86 ∗†

0.95 ∗

0.88 ∗†

ED

0.67



0.57

MBS

M 2.56

0.88

1.20

0.83

2.03

1.72

0.46

0.33

14.10

0.92

3.34

0.70

FS

0.32 ∗†

0.09 ∗†

7.46

0.39 ∗

2.21

0.10 ∗†

MBS

0.68

0.68

0.97

0.89

0.97

0.40

FS



0.51



0.59

1



1.31

0.34

0.57 ∗

0.89 †

0.24 ∗†

0.81

0.77

0.72

0.42

0.83

0.85

0.99

0.87

0.82

0.80

0.85

0.96

0.99

0.73

0.80

0.96

0.90

0.98

0.66

MBS

0.85

0.83

0.59

0.91

0.92

1.00

0.81

0.81

0.98

0.94

0.99

0.57

FS

α



0.68



0.73

0.73



0.55

0.79



0.9



0.36

FSL

CR IP T



0.54



0.58

0.58



0.38

0.66



0.82



0.22

FSL

ICC(3,1)

AN US

0.58

0.23



0.4



0.29

FSL

AVD (ml)

0.92

0.89

0.89

0.92

0.98

0.99

0.85

0.89

0.98

0.95

0.99

0.79

MBS

0.79

0.75

0.48

0.83

0.90

0.99

0.68

0.71

0.98

0.90

0.98

0.48

FS

AVD) to be statistically different from FSL FIRST (p < 0.05, Student’s t-test).

∗ marks MBS results (DSC, AVD) to be statistically different from FreeSurfer (p < 0.05, Student’s t-test). A dagger † marks MBS results (DSC,

Cronbach’s Alpha α and Pearsons’s correlation coefficient r as listed in Table 3. "—" refers to structures not supported by FSL FIRST. An asterisk

FSL FIRST, in terms of the Dice similarity coefficient (DSC), absolute volume difference (AVD) in ml, intra-class correlation coefficient (ICC(3,1)),

Table 7: OASIS-TRT-20 data set: Comparison of the segmentation performance of the model-based brain segmentation (MBS), FreeSurfer (FS) and

0.78

Average (FSL structures)

0.78

Lateral ventricle

0.45

0.78

Hippocampus

Third ventricle

0.80

Globus pallidus

0.86

0.93

Cereb. & ped.

Thalamus

0.83

Caud. nucl. & acc.

0.77

0.86

Brainstem

Putamen

0.62

PT

FS

CE

Amygdala

AC



0.61



0.6

0.77



0.48

0.68



0.82



0.28

FSL

r

0.87

0.82

0.83

0.85

0.96

0.99

0.73

0.80

0.97

0.91

0.98

0.68

MBS

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIPT

DP\JGDOD EUDLQVWHP

CR IP T

FDXGDWHQXFOHXV DQGDFFXPEHQV FHUHEHOOXP DQGSHGXQFOHV

UHJLRQ

JOREXVSDOOLGXV KLSSRFDPSXV ODWHUDOYHQWULFOH

AN US

SXWDPHQ WKDODPXV WKLUGYHQWULFOH 



















'LFHVLPLODULW\FRHIILFLHQW '6&

DP\JGDOD

M

EUDLQVWHP

UHJLRQ

JOREXVSDOOLGXV KLSSRFDPSXV

PT

ODWHUDOYHQWULFOH

ED

FDXGDWHQXFOHXV DQGDFFXPEHQV

SXWDPHQ

CE

WKDODPXV

0HWKRG

0%6 )UHHVXUIHU )6/)LUVW

UHJLRQ

AC

WKLUGYHQWULFOH





















FHUHEHOOXP DQGSHGXQFOHV

$EVROXWHYROXPHGLIIHUHQFH $9'PO

Figure 4: OASIS-TRT-20 dataset: Comparative boxplots for DSC and AVD. Whiskers start from the 25th and 75th percentile and range up to 1.5 · IQR, where IQR denotes the interquartile range. A different AVD range justifies a separate plot for the cerebellum and peduncles.

37

AN US

CR IP T

ACCEPTED MANUSCRIPT

Figure 5: Example scans of different cohorts, (left) X96, (middle) OASIS MICCAI, (right) OASIS-TRT-20. The snapshots show that scans from the OASIS cohorts sometimes suffer from flow artifacts, marked in red, in areas close to brain regions of interest (hippocampus,

AC

CE

PT

ED

M

amygdala).

Figure 6: “Stack effect” in region labels used as ground truth annotations. This effect is caused by slice-wise annotations of ground truth labels in coronal view.

38

Bland Altman Plot of Hippocampus segmentation NL MCI AD

1.0

CR IP T

0.8 0.6 0.4 0.2 0.0 0.2 0.4

AN US

Difference of ground truth segmentation to volume (ml)

ACCEPTED MANUSCRIPT

0.6 0.8 0.5

1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 Average volume of manual and automatic segmentation (ml)

Figure 7: Bland-Altman plot of hippocampal volumes as obtained by the model-based brain

M

segmentation (MBS) compared to volumes of manual, consolidated ground truth annotations,

ED

for 113 cases. Dashed lines illustrate the 95% confidence interval.

Results on HarP cohort

DSC

0.84

AVD (ml)

0.16

ICC(3,1)

0.93

α

0.96

r

0.93

p

0.97

AC

CE

PT

Evaluation measure

Table 8: Evaluation results for left and right hippocampus, when comparing automated MBS segmentations to ground truth labels of the HarP cohort. As an additional reference measure, the p-value of Student’s t-test is listed, showing an insignificant difference between MBS volumes and ground truth.

39

ACCEPTED MANUSCRIPT

MBS

FreeSurfer

MBS

CoVs (%)

CoVs (%)

CoVt (%)

CoVt (%)

Amygdala

4.69

1.92

5.21

2.09

Caudate

1.54

0.85

1.58

0.85

Hippocampus

2.77

1.18

2.92

Lateral ventricles

1.58

1.13

3.40

Pallidum

5.25

1.38

5.42

Putamen

4.04

0.88

3.92

Thalamus

5.98

1.14

6.06

Average

3.69

CR IP T

FreeSurfer

1.17 2.03 1.72 1.07

AN US

1.37

1.21

4.07

1.47

Table 9: Test-retest consistency on the Nature cohort: Coefficients of variation for follow-up assessment (intra-session CoVs and inter-session CoVt ), indicating relative changes in structure volumes over time. Results for FreeSurfer are given as reported in (Maclaren et al., 2014).

M

5.5 FS MBS FSL

5.0

AC

right thalamus

left thalamus

right putamen

left putamen

right hippocampus

left hippocampus

right globus pallidus

left globus pallidus

CE

left amygdala

2.5

left caudate nucleus and accumbens

3.0

PT

3.5

right caudate nucleus and accumbens

ED

4.0

right amygdala

Score

4.5

Figure 8: Comparative evaluation results of visual, interactive scoring done by a Neuroradiologist for every subcortical structure in n=30 scans.

40

CR IP T

ACCEPTED MANUSCRIPT

Score

Description

5

The structure was not under- or over-segmented, i.e. its

4

AN US

area could be accurately determined.

The structure was either slightly over- or slightly undersegmented (but not both) with an apparent overlap error of < 10%

3

(i) The structure was both, slightly over- and undersegmented with an apparent overlap error of < 10%, or

2

M

(ii) the overlap error was between 10% and 25%. (i) The structure was both, slightly over- and under-

ED

segmented with an apparent overlap error between 10% and 25%, or (ii) the overlap error was larger than 25%. The structure was not recognized (overlap error > 50%)

PT

1

Table 10: Scoring scheme for blind comparative visual assessment of segmentation results. Obviously, quantitative thresholds for separating different levels of segmentation errors were

CE

meant as a guidance for visual scoring, in which an overlap error can only be estimated by

AC

inspection.

41

CR IP T

ACCEPTED MANUSCRIPT

MBS

FSL

FS

MBS vs. FSL

MBS vs. FS

Left amygdala

4.0

3.8

3.9

0.1351

0.4113

Left caudate nucleus and accumbens

4.7

4.0

3.8

0.0002*

8.96E-11*

Left globus pallidus

4.3

3.9

3.4

0.0415*

6.05E-7*

Left hippocampus

4.4

3.7

3.8

0.0002*

0.0018*

Left putamen

4.4

4.2

3.3

0.0975

2.96E-9*

Left thalamus

4.1

3.6

3.3

1.5E-5*

6.8E-6*

Right amygdala

4.0

3.5

3.9

0.0171*

0.4136

Right caudate nucleus and accumbens

4.6

4.0

3.7

0.0019*

4.3E-08*

Right globus pallidus

4.3

3.9

3.5

0.0134*

1.7E-7*

Right hippocampus

4.4

3.9

4.0

0.0005*

0.0024*

Right putamen

4.6

4.0

3.4

0.0010*

1.1E-12*

Right thalamus

4.1

3.7

3.5

0.0005*

9.5E-6*

PT

ED

AN US

Structure

p-Value (Student’s t-test)

M

Average Score

Table 11: Average scores and p-values of Student’s t-test between two different techniques.

AC

CE

Statistically significant differences (p<0.05) are marked with an asterisk (*).

42

AC right amygdala

left caudate nucleus and accumbens

ED

left amygdala

cerebellum and peduncles

brainstem

Normalized volume 0.8

0.6

0.4

AN US

1.0

43 third ventricle

right lateral ventricle

left lateral ventricle

1.4

right thalamus

left thalamus

right putamen

left putamen

right hippocampus

left hippocampus

1.6

CR IP T

1.2

right globus pallidus

left globus pallidus

M

right caudate nucleus and accumbens

PT

CE

ACCEPTED MANUSCRIPT

3

2

1

0 Diagnosis

1 NL AD

Figure 9: Mean relative volumes of subcortical structures for two groups of the ADNL-ADNI

cohort: Alzheimer’s disease (Green, AD, n=390) and healthy controls (Blue, NL, n=390)

Region Left hippocampus

CR IP T

ACCEPTED MANUSCRIPT

| Cohen’s D | 1.53 1.37

Left amygdala

1.21

Right amygdala

1.03

Right lateral ventricle

0.93

Left lateral ventricle

0.92

Third ventricle

0.77

Left thalamus

0.68

Right thalamus

0.64

M

AN US

Right hippocampus

0.59

Left caudate nucleus and accumbens

0.59

Right putamen

0.57

Right caudate nucleus and accumbens

0.53

Right globus pallidus

0.33

Left globus pallidus

0.32

Brainstem

0.23

Cerebellum and peduncles

0.17

CE

PT

ED

Left putamen

Table 12: Absolute effect size (Cohen’s D) on volumetric differences between n=390 Alzheimer

AC

patients and n=390 healthy controls from the ADNI database.

44

ACCEPTED MANUSCRIPT

4. Summary and Conclusion This work presents a novel approach for the rapid segmentation of subcortical 620

structures in T1-weighted MRI brain scans. It is based on a shape-constrained

CR IP T

deformable surface model which has been trained on 96 scans of healthy elderly subjects as well as patients suffering from Alzheimer’s disease, from three

different scanner manufacturers. The structure definitions of subcortical brain

regions mostly adhere to the guidelines from the center of morphometric analysis 625

(CMA), which allows a comparison to the FSL FIRST and FreeSurfer software packages, since both have reported to comply with the CMA guidelines as well.

AN US

Comprehensive evaluation of the model-based brain segmentation approach

on six different datasets, containing a total number of 1072 subjects and 1189 3D T1-weighted, MP-RAGE-type brain scans, demonstrate that a single segmenta630

tion model is not only robust across vendors and MRI field strengths but also against atrophy-related appearance changes, which might be caused by neuro-

M

degenerative diseases. In a comparative experiment using publicly available ground truth segmentations, superior segmentation performance to FSL FIRST and FreeSurfer could be shown for a number of different evaluation measures, including the Dice similarity coefficient (DSC), intra-class correlation coeffi-

ED

635

cient (ICC), and absolute volume difference (AVD). Furthermore, comparison of MBS segmentation results to consolidated, high-quality ground truth labels

PT

of the hippocampus as provided by an EANM-ADNI initiative on n=113 scans revealed insignificant volumetric differences as well as an intraclass correlation coefficient surpassing published agreement measures between trained experts.

CE

640

Test-retest consistency of the model-based brain segmentation on a set of

follow-up scans was assessed by the coefficient of variability between baseline

AC

and follow-up scans and showed a high level of consistency. Similarly encouraging results could be observed when comparing accuracy scores between differ-

645

ent techniques in a blind visual rating study performed by a Neuroradiologist: Here, the model-based brain segmentation was consistently ranked as best segmentation technique. Finally, a study focused on volumetric differences of brain 45

ACCEPTED MANUSCRIPT

structures in n=380 AD subjects and balanced, age-matched healthy controls could confirm that disease-related atrophy can be found in a number of subcor650

tical structures, with amygdala and hippocampus providing the most relevant

CR IP T

volumetric biomarkers for Alzheimer’s disease. Besides highly accurate segmentation results, a major advantage of the presented approach is processing speed, which on current hardware is as low as about 60 seconds per scan. This can be explained by an efficient mesh ini655

tialization based on the Generalized Hough Transform which only requires a few seconds of processing time and eliminates the need for initial registration

AN US

steps, as it is done in FSL FIRST or FreeSurfer. On the other hand, in order to increase robustness and processing speed, the model-based segmentation framework uses trained triangle-specific boundary detection functions and al660

lows hierarchical adaptation so that parametric adaptation is only needed for an intermediate low-resolution mesh. This way, processing time still outperforms the one of a recently reported deep learning based segmentation approach (Dolz

M

et al., 2017). The comparably low runtime not only allows fast batch-processing of large cohorts for research, it also enables analysis in a clinical environment in which, due to practical reasons, quantification results are often required to be

ED

665

available in not more than a few minutes. Finally, since the presented approach is based on the adaptation of a 3D

PT

surface mesh, it has the potential to be intuitively corrected in 3D by interactive mesh editing. This functionality might be useful in case of sub-optimal results 670

for severely abnormal cases in which fully automatic segmentation approaches

CE

generally fail.

AC

5. Acknowledgements The authors would like to thank Dr. Jürgen Weese for fruitful discussions

on this work and the anonymous reviewers for many valuable comments.

675

This project has received funding from the European Union’s Seventh Frame-

work Programme for research, technological development and demonstration 46

ACCEPTED MANUSCRIPT

under grant agreement no. 601055 (VPH-DARE@IT). Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-

CR IP T

680

12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous con-

tributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol685

Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuti-

AN US

cals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.;

Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; 690

Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal

M

Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support

695

ED

ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education,

PT

and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the

AC

CE

Laboratory for Neuro Imaging at the University of Southern California.

47

ACCEPTED MANUSCRIPT

700

References References Artaechevarria, X., Munoz-Barrutia, A., de Solorzano, C. O., Aug. 2009. Com-

CR IP T

bination strategies in multi-atlas image segmentation: Application to brain mr data. IEEE Trans. Medical Imaging 28 (8), 1266–1277. 705

Babalola, K. O., Cootes, T. F., Twining, C. J., Petrovic, V., Taylor, C. J., 2008.

3d brain segmentation using active appearance models and local regressors.

In: Metaxas, D., Axel, L., Fichtinger, G., Szekely, G. (Eds.), Proc. MICCAI.

AN US

Vol. 5241 of LNCS. Springer, pp. 401–408.

Ballard, D. H., 1981. Generalizing the Hough transform to detect arbitrary shapes. Pattern Recogn. 13 (2), 111–122.

710

Boccardi, M., Bocchetta, M., Apostolova, L. G., Barnes, J., Bartzokis, G., Corbetta, G., DeCarli, C., deToledo Morrell, L., Firbank, M., Ganzola, R.,

M

Gerritsen, L., Henneman, W., Killiany, R. J., Malykhin, N., Pasqualetti, P., Pruessner, J. C., Redolfi, A., Robitaille, N., Soininen, H., Tolomeo, D., Wang, L., Watson, C., Wolf, H., Duvernoy, H., Duchesne, S., Jack, C. R., Frisoni,

ED

715

G. B., Feb 2015. Delphi definition of the EADC-ADNI Harmonized Protocol for hippocampal segmentation on magnetic resonance. Alzheimers Dement

PT

11 (2), 126–138.

Canny, J., Nov. 1986. A computational approach to edge detection. IEEE Trans. Pattern Analysis and Machine Intelligence 8 (6), 679–698.

CE

720

Cardoso, M. J., Clarkson, M. J., Ridgway, G. R., Modat, M., Fox, N. C.,

AC

Ourselin, S., Initiative, T. A. D. N., 2011. Load: a locally adaptive cortical segmentation algorithm. NeuroImage 56, 1386–1397.

Chupin, M., Hammers, A., Liu, R. S. N., Colliot, O., Burdett, J., Bardinet, E.,

725

Duncan, J. S., 2009. Automatic segmentation of the hippocampus and the amygdala driven by hybrid constraints: method and validation. NeuroImage 46, 749–761. 48

ACCEPTED MANUSCRIPT

Cootes, T. F., Taylor, C. J., Cooper, D. H., Graham, J., Jan. 1995. Active shape models – their training and application. Computer Vision and Image Understanding 61 (1), 38–59.

730

CR IP T

Corso, J. J., Tu, Z., Yuille, A., Toga, A., 2007. Segmentation of sub-cortical structures by the graph-shifts algorithm. In: Information Processing in Medical Imaging. Vol. 4584 of LNCS. Springer, pp. 183–197.

Doan, N. T., de Xivry, J. O., Macq, B., 2010. Effect of inter-subject variation on

the accuracy of atlas-based segmentation applied to human brain structures.

735

AN US

In: Dawant, B. M., Haynor, D. R. (Eds.), Proc. SPIE 7623, Medical Imaging 2010: Image Processing. Vol. 7623. pp. 76231S–1 – 76231S–10.

Dolz, J., Desrosiers, C., Ayed, I. B., 2017. 3d fully convolutional network for subcortical segmentation in mri: A large-scale study. NeuroImage, in press. 740

Dolz, J., Massoptier, L., Vermandel, M., June 2015. Segmentation algorithms

M

of subcortical brain structurese on mri for radiotherapy and radiosurgery: A survey. IRBM 36, 200–212.

ED

Ecabert, O., Peters, J., Schramm, H., Lorenz, C., von Berg, J., Walker, M. J., Vembar, M., Olszewski, M. E., Subramanyan, K., Lavi, G., Weese, J., Sep. 2008. Automatic model-based segmentation of the heart in CT images. IEEE

745

PT

Trans. Medical Imaging 27 (9), 1189–1201. Ecabert, O., Peters, J., Walker, M., Ivanc, T., Lorenz, C., v. Berg, J., Lessick,

CE

J., Vembar, M., Weese, J., 2011. Segmentation of the heart and great vessels in CT images using a model-based adaptation engine. Medical Image Analysis 15 (6), 863–876.

AC

750

Ecabert, O., Peters, J., Walker, M., von Berg, J., Lorenz, C., Vembar, M., Subramanyan, K., Weese, J., Mar. 2007. Automatic full heart segmentation in CT images: Method and validation. In: Pluim, J. P. W., Reinhardt, J. M. (Eds.), Proc. SPIE Medical Imaging. Vol. 6512. pp. 65120G–1 – 65120G–12.

49

ACCEPTED MANUSCRIPT

755

Ecabert, O., Peters, J., Weese, J., Mar. 2006. Modeling shape variability for full heart segmentation in cardiac CT images. In: Reinhardt, J. M., Pluim, J. P. W. (Eds.), Proc. SPIE Medical Imaging. Vol. 6144. pp. 61443R–1 –

CR IP T

61443R–12. Ellingson, B. M., Bendszus, M., Boxerman, J., Barboriak, D., Erickson, B. J.,

Smits, M., Gilbert, M. R., 2015. Consensus recommendations for a standard-

760

ized brain tumor imaging protocol in clinical trials. Neuro-Oncology 17 (9), 1188—-1198.

AN US

Fischl, B., 2012. Freesurfer. NeuroImage 62, 774–781.

Fischl, B., Salat, D. H., Busa, E., Albert, M., Dieterich, M., Haselgrove, C., van der Kouwe, A., Killiany, R., Kennedy, D., Klaveness, S., Montillo, A.,

765

Makris, N., Rosen, B., Dale, A. M., January 2002. Whole brain segmentation: Automated labeling of neuroanatomical structures in the human brain.

M

Neuron 33 (3), 341–355.

Frisoni, G. B., Bocchetta, M., Chételat, G., Rabinovici, G. D., de Leon, M. J., Kaye, J., Reiman, E. M., Scheltens, P., Barkhof, F., Black, S. E., Brooks,

770

ED

D. J., Carrillo, M. C., Fox, N. C., Herholz, K., Nordberg, A., Jack, C. R., Jagust, W. J., Johnson, K. A., Rowe, C. C., Sperling, R. A., Thies, W., Wahlund, L.-O., Weiner, M. W., Pasqualetti, P., DeCarli, C., Jul. 2013. Imag-

PT

ing markers for Alzheimer disease. Neurology 81 (5), 487–500. URL http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3776529/

775

CE

Frisoni, G. B., Jack, C. R., Bocchetta, M., Bauer, C., Frederiksen, K. S., Liu, Y.,

AC

Preboske, G., Swihart, T., Blair, M., Cavedo, E., Grothe, M. J., Lanfredi, M.,

780

Martinez, O., Nishikawa, M., Portegies, M., Stoub, T., Ward, C., Apostolova, L. G., Ganzola, R., Wolf, D., Barkhof, F., Bartzokis, G., DeCarli, C., Csernansky, J. G., deToledo Morrell, L., Geerlings, M. I., Kaye, J., Killiany, R. J., Lehericy, S., Matsuda, H., O’Brien, J., Silbert, L. C., Scheltens, P., Soininen, H., Teipel, S., Waldemar, G., Fellgiebel, A., Barnes, J., Firbank, M., Gerritsen, L., Henneman, W., Malykhin, N., Pruessner, J. C., Wang, L., Watson, 50

ACCEPTED MANUSCRIPT

C., Wolf, H., deLeon, M., Pantel, J., Ferrari, C., Bosco, P., Pasqualetti, P., Duchesne, S., Duvernoy, H., Boccardi, M., Albert, M. S., Bennet, D., Cam-

785

icioli, R., Collins, D. L., Dubois, B., Hampel, H., denHeijer, T., Hock, C.,

CR IP T

Jagust, W., Launer, L., Maller, J. J., Mueller, S., Sachdev, P., Simmons, A., Thompson, P. M., Visser, P. J., Wahlund, L. O., Weiner, M. W., Winblad, B., Feb 2015. The EADC-ADNI Harmonized Protocol for manual hippocampal

segmentation on magnetic resonance: evidence of validity. Alzheimers Dement

790

11 (2), 111–125.

González-Villà, S., Oliver, A., Valverde, S., Wang, L., Zwiggelaar, R., Lladó,

AN US

X., 10 2016. A review on brain structures segmentation in magnetic resonance imaging. Artificial Intelligence in Medicine 73, 45–69. 795

Grimm, O., Pohlack, S. T., Cacciaglia, R., Winkelmann, T., Plichta, M. M., Demirakca, T., Flor, H., 09 2015. Amygdala and hippocampal volume: A comparison between manual segmentation, freesurfer and vbm. Journal of

M

Neuroscience Methods 253, 254–261.

Heckemann, R. A., Hajnal, J. V., Aljabar, P., Rueckert, D., Hammers, A., 2006. Automatic anatomical brain mri segmentation combining label propagation

ED

800

and decision fusion. NeuroImage 33, 115–126. Iglesias, J. E., Sabuncu, M. R., 2015. Multi-atlas segmentation of biomedical

PT

images: A survey. Medical Image Analysis 24 (1), 205–219.

CE

Jenkinson, M., 2014. https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FIRST. 805

Kass, M., Witkin, A., Terzopoulos, D., 1988. Snakes: Active contour models. Int. Journal of Computer Vision 1 (4), 321–331.

AC

Klein, A., Mensh, B., Ghosh, S., Tourville, J., Hirsch, J., 2005. Mindboggle: Automated brain labeling with multiple atlases. BMC Medical Imaging 5, 7.

Landman, B. A., Warfield, S. K., 2012. MICCAI 2012: Workshop on Multi-

810

Atlas Labeling. URL https://masi.vuse.vanderbilt.edu/workshop2012 51

ACCEPTED MANUSCRIPT

Lötjönen, J., Wolz, R., Koikkalainen, J., Julkunen, V., Thurfjell, L., Lundqvist, R., Waldemar, G., Soininen, H., Rueckert, D., May 2011. Fast and robust extraction of hippocampus from mr images for diagnostics of alzheimer’s disease. NeuroImage 56 (1), 185–196.

CR IP T

815

Lötjönen, J. M., Wolz, R., Koikkalainen, J. R., Thurfjell, L., Waldemar, G.,

Soininen, H., Rueckert, D., 2010. Fast and robust mutli-atlas segmentation of brain magnetic resonance images. NeuroImage 49 (3), 2352–2365.

Maclaren, J., Han, Z., Vos, S. B., Fishbein, N., Bammer, R., 2014. Reliability of brain volume measurements: A test-retest dataset. Scientific Data.

AN US

820

Morra, J. H., Tu, Z., Apostolova, L. G., Green, A. E., Avedissian, C., Madsen, S. K., Parikshak, N., Hua, X., Toga, A. W., Jr., C. R. J., Weiner, M. W., Thompson, P. M., 2008. Validation of a fully automated 3d hippocampal segmentation method using subjects with alzheimer’s disease mild cognitive impairment, and elderly controls. NeuroImage 43 (1), 59–68.

825

M

URL http://dx.doi.org/10.1016/j.neuroimage.2008.07.003 Murphy, S., Mohr, B., Fushimi, Y., Yamagata, H., Pole, I., 2014. Fast, simple,

ED

accurate multi-atlas segmentation of the brain. In: Ourselin, S., Modat, M. (Eds.), Biomedical Image Registration, 6th International Workshop, WBIR. Vol. 8545 of LNCS. Springer, pp. 1–10.

PT

830

Patenaude, B., Smith, S. M., Kennedy, D. N., Jenkinson, M., 2011. A bayesian

CE

model of shape and appearance for subcortical brain segmentation. NeuroImage 56 (3), 907–922.

AC

Peters, J., Ecabert, O., Meyer, C., Kneser, R., Weese, J., 2010. Optimizing

835

boundary detection via simulated search with applications to multi-modal heart segmentation. Medical Image Analysis 14 (1), 70–84.

Peters, J., Ecabert, O., Meyer, C., Schramm, H., Kneser, R., Groth, A., Weese, J., 2007. Automatic whole heart segmentation in static magnetic resonance

52

ACCEPTED MANUSCRIPT

image volumes. In: Ayache, N., Ourselin, S., Maeder, A. (Eds.), Proc. MICCAI. Vol. 4792 of LNCS. Springer, pp. 402–410.

840

Scherrer, B., Forbes, F., Garbay, C., Dojat, M., 2008. Fully bayesian joint

CR IP T

model for mr brain scan and structure segmentation. In: Metaxas, D., Axel, L., Fichtinger, G., Szekely, G. (Eds.), Proc. MICCAI. Vol. 5242 of LNCS. Springer, pp. 1066–1074. 845

Shrout, P. E., Fleiss, J. L., 1979. Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin 86 (2), 420–428.

AN US

Stevens, A., 2017. https://surfer.nmr.mgh.harvard.edu/fswiki.

Tang, X., Crocetti, D., Kutten, K., Ceritoglu, C., Albert, M. S., Mori, S., Mostofsky, S. H., Miller, M. I., 03 2015. Segmentation of brain magnetic resonance images based on multi-atlas likelihood fusion: testing using data with a

850

broad range of anatomical and photometric profiles. Frontiers in Neuroscience

M

9 (61), 1–13.

Tavakol, M., Dennick, R., Jun. 2011. Making sense of Cronbach’s alpha. Inter-

ED

national Journal of Medical Education 2, 53–55. URL http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4205511/

855

PT

Tu, Z., Narr, K. L., Dollar, P., Dinov, I., Thomson, P. M., Toga, A. W., April 2008. Brain anatomical structure segmentation by hybrid discrimina-

CE

tive/generative models. IEEE Trans. Medical Imaging 27 (4), 495–508. Wang, H., Yushkevich, P. A., 11 2013. Multi-atlas segmentation with joint label fusion and corrective learning—an open source implementation. Frontiers in

AC

860

Neuroinformatics 7.

Wang, J., Vachet, C., Rumple, A., Gouttard, S., Ouziel, C., Perrot, E., Du,

865

G., Huang, X., Gerig, G., Styner, M., 02 2014. Multi-atlas segmentation of subcortical brain structures via the autoseg software pipeline. Frontiers in Neuroinformatics 8, 7–. 53

ACCEPTED MANUSCRIPT

Warfield, S. K., Zou, K. H., Wells, W. M., 07 2004. Simultaneous truth and performance level estimation (staple): an algorithm for the validation of image segmentation. IEEE Trans. Medical Imaging 123 (7), 903–921.

CR IP T

Weese, J., Kaus, M. R., Lorenz, C., Lobregt, S., Truyen, R., Pekar, V., June

2001. Shape constrained deformable models for 3D medical image segmen-

870

tation. In: Insana, M. F., Leahy, R. M. (Eds.), Information Processing in Medical Imaging. Vol. 2082 of LNCS. Springer, pp. 380–387.

Weese, J., Wächter-Stehle, I., Zagorchev, L., Peters, J., 2014. Shape Anal-

AN US

ysis in Medical Image Analysis. No. 14 in Lect. Notes Comp. Vision and

Biomech. Springer, Ch. Shape-Constrained Deformable Models and Applica-

875

tions in Medical Imaging, pp. 151–184.

Wolz, R., Aljabar, P., Hajnal, J. V., Hammers, A., Rueckert, D., January 2010. Leap: learning embeddings for atlas propagation. NeuroImage 49 (2), 1316–

880

M

1325.

Yan, Z., Zhang, S., Liu, X., Metaxas, D. M., Montillo, A., April 2013. Accurate segmentation of brain images into 34 structures combining non-stationary

ED

adaptive statistical atlas and a multi-atlas with applications to alzheimer’s disease. In: Biomedical Imaging (ISBI), 2013 IEEE 10th International Sym-

885

PT

posium on. pp. 1202–1205.

Zagorchev, L., Meyer, C., Stehle, T., Wenzel, F., Young, S., Peters, J., Weese, J.,

CE

Paulsen, K., Garlinghouse, M., Ford, J., Roth, R., Flashman, L., McAllister, T., December 2015. Differences in regional brain volumes two months and one

AC

year after mild traumatic brain injury. Journal of Neurotrauma 33 (1), 29–34.

54