Segmentation of MRI brain scans using non-uniform partial volume densities

Segmentation of MRI brain scans using non-uniform partial volume densities

NeuroImage 49 (2010) 467–477 Contents lists available at ScienceDirect NeuroImage j o u r n a l h o m e p a g e : w w w. e l s e v i e r. c o m / l ...

686KB Sizes 0 Downloads 25 Views

NeuroImage 49 (2010) 467–477

Contents lists available at ScienceDirect

NeuroImage j o u r n a l h o m e p a g e : w w w. e l s e v i e r. c o m / l o c a t e / y n i m g

Segmentation of MRI brain scans using non-uniform partial volume densities Rachel M. Brouwer ⁎, Hilleke E. Hulshoff Pol, Hugo G. Schnack Rudolf Magnus Institute of Neuroscience, Department of Psychiatry, University Medical Center Utrecht, The Netherlands

a r t i c l e

i n f o

Article history: Received 30 December 2008 Revised 17 July 2009 Accepted 17 July 2009 Available online 25 July 2009 Keywords: MRI Human brain Segmentation Partial volume effect Non-uniform distribution Reliability Gray–white separation

a b s t r a c t We present an algorithm that provides a partial volume segmentation of a T1-weighted image of the brain into gray matter, white matter and cerebrospinal fluid. The algorithm incorporates a non-uniform partial volume density that takes the curved nature of the cortex into account. The pure gray and white matter intensities are estimated from the image, using scanner noise and cortical partial volume effects. Expected tissue fractions are subsequently computed in each voxel. The algorithm has been tested for reliability, correct estimation of the pure tissue intensities on both real (repeated) MRI data and on simulated (brain) images. Intra-class correlation coefficients (ICCs) were above 0.93 for all volumes of the three tissue types for repeated scans from the same scanner, as well as for scans with different voxel sizes from different scanners with different field strengths. The implementation of our non-uniform partial volume density provided more reliable volumes and tissue fractions, compared to a uniform partial volume density. Applying the algorithm to simulated images showed that the pure tissue intensities were estimated accurately. Variations in cortical thickness did not influence the accuracy of the volume estimates, which is a valuable property when studying (possible) group differences. In conclusion, we have presented a new partial volume segmentation algorithm that allows for comparisons over scanners and voxel sizes. © 2009 Elsevier Inc. All rights reserved.

1. Introduction Brain tissue segmentation from magnetic resonance (MR) scans is often complicated by the existence of partial volume voxels that contain a mixture of two or more tissue types. The assignment of these voxels to a particular tissue type may result in biased volumetric measures. Furthermore, binary segmentations may fail to detect small or local changes in the brain, especially when considering repeated scans or longitudinal data. It has been shown that an optimal binary segmentation will have approximately 20 volume percent of tissue assigned to the wrong tissue type (Niessen, 1997). Since the amount of misclassification clearly depends on voxel size, a binary segmentation is also not suited to combine samples consisting of scans with different voxel sizes. A partial volume segmentation allows voxels to belong to more than one tissue class. Such segmentations are more sensitive to local changes, and provide volumes that are more accurate, since attributing voxels to a pure tissue class in a binary segmentation can introduce rounding “errors” that may not cancel each other out. Partial volume segmentations are also able to segment structures such as the cerebellum, which is not well possible using a binary segmentation due to its fine structure (with respect to the typical voxel size of 1 mm3).

⁎ Corresponding author. Department of Psychiatry, University Medical Center Utrecht, House A.01.126, PO Box 85500, 3508 GA Utrecht, The Netherlands. Fax: +31887555443. E-mail address: [email protected] (R.M. Brouwer). 1053-8119/$ – see front matter © 2009 Elsevier Inc. All rights reserved. doi:10.1016/j.neuroimage.2009.07.041

Many algorithms have been proposed to create partial volume segmentations in the brain, (see Bach Cuadra et al. (2005) for an overview). Some (histogram based) methods incorporate separate partial volume classes as Gaussian distributions. Voxels assigned to these mixed classes are assumed to consist of 50% of both tissue types in the mixed class (Santago and Gage, 1993; Grabowski et al., 2000) or are subsequently assigned to the pure voxel classes based on the histogram (Kovacevic et al., 2002) or neighbor information (Markov priors) (Ruan et al., 2000). Others use training sets (Song et al., 2006) to estimate the tissue fraction per voxel, or templates (Ashburner and Friston, 1997) that provide probabilities that voxels belong to a certain tissue class. A large range of methods incorporate Markov Random Fields (MRF), or other types of neighbor information to estimate parameter values and/or assign estimates of the tissue fractions (Leemput et al., 2003; Choi et al., 1991; Li et al., 2005). Both Tohka et al. (2004) and Shattuck et al. (2001) first determine the partial volume class with MRF and subsequently estimate tissue fractions based on the pure tissue intensities. In Laidlaw et al. (1998), neighbor information is incorporated by fitting a local histogram to determine tissue fractions. Bricq et al. (2008) incorporate a Hidden Markov Chain model that takes neighbor information into account. Anbeek et al. (2005) introduced a multi-channel algorithm that provides probability maps for five tissue types based on K-nearestneighbor classification. Manjón et al. (2008) have a different approach and exclude voxels that are likely to contain more than one tissue type, in part determined by neighbor information, to estimate tissue parameters more accurately. In the heavily curved cortex, voxel content may change from voxel to voxel. Therefore, we aim for a

468

R.M. Brouwer et al. / NeuroImage 49 (2010) 467–477

partial volume segmentation that does not use neighbor information when assigning the tissue fractions. The method we propose incorporates two novel ideas: the implementation of a non-uniform partial volume density and a new method of estimating the pure gray and white matter intensities from the image. The non-uniform partial volume density was chosen as a more realistic approach of the real partial volume density, to improve accuracy of the segmentation. Our new parameter estimation technique was developed to be robust in the presence of partial volume effects and noise. All histogram based segmentations assume (implicitly) that the partial volume density is uniform, in the sense that all possible compositions of mixed tissues are equally likely. However, this uniformity assumption does not take into account that the interface between tissues in the brain is curved. Because of this curvature, it is very well possible that the partial volume voxels are distributed otherwise, possibly U-shaped, as has been suggested by González Ballester et al. (2002), Röll et al. (1994) and Bello et al. (1998). Here we present a histogram based partial volume segmentation algorithm that incorporates a non-uniform partial volume density. Several methods estimate tissue parameters and voxel labels simultaneously by an expectation optimization (EM) algorithm or maximum a posteriori (MAP) estimator (Bricq et al., 2008; Choi et al., 1991; Kovacevic et al., 2002, Leemput et al., 2003; Ruan et al., 2000; Shattuck et al., 2001; Zhang et al., 2001). Further, genetic (Schroeter et al., 1998; Tohka et al., 2007) and tree-annealing (Santago and Gage, 1993) algorithms have been proposed for the estimation of tissue parameters. Partial voluming and scanner noise are generally considered to be a problem for accurate estimation of tissue parameters. Cocosco et al. (2003), Tohka et al. (2004) and Manjón et al. (2008) all incorporate an outlier rejection scheme for a more accurate estimation of pure tissue parameters. Our method is different from the ones mentioned above because it uses noise properties and the existence of partial volume voxels. Further, whereas many methods first detect partial volume voxels and subsequently assign tissue fractions to voxels labeled as belonging to a mixed class, (e.g. Grabowski et al., 2000; Ruan et al., 2000; Shattuck et al., 2001; Tohka et al., 2004) we consider pure tissue as a special case of partial voluming, and provide tissue fractions between 0 and 1 for each voxel and each tissue directly. The algorithm is tested on various both real (repeated) scans and artificial images, testing reliability of the algorithm, its ability to handle scans from different scanners and different voxel sizes and the influence of cortical thickness on the accuracy of the segmentation. 2. Methods and materials The model we propose segments the intracranium on T1-weighted scans into three tissues: gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF). We assume that the scans have been corrected for field inhomogeneity. We assume that the intensities of all three pure tissue types are normally distributed with different means μc, μg and μw, and the same variance σ. The variance σ arises from natural variation within a tissue type, plus a noise component. Although the variation that arises from scanner noise is strictly speaking Rician, many authors have adopted this assumption (e.g. Kovacevic et al., 2002; Ruan et al., 2000; Schroeter et al., 1998) and it has been shown that the Rician distribution approximates a Gaussian distribution if the signal to noise ratio is large enough (Gudbjartsson and Patz, 1995), which is usually the case for T1-weighted scans. The algorithm presented in the following sections comprises several steps. In short, in the first step, the mean pure tissue intensities for gray and white matter are estimated from the image, using noise and partial volume effects present in the image. These estimators are described in Subsection 2.1. The algorithm incorporates a non-uniform partial volume density function, which is

introduced in Subsection 2.2. The intensity histogram is modelled by this partial volume density and the densities of the pure tissue intensities (Subsection 2.3). The theoretical intensity histogram will be fitted to the intensity histogram obtained from the image, providing the mean intensity of CSF, the variance of the pure tissue intensities and the parameters of the partial volume density (Subsection 2.4). Expected tissue fractions will be computed for each tissue type in each voxel (Subsection 2.3). 2.1. Estimation of the pure matter intensities In the first step of our algorithm we estimate the pure white matter intensity and pure gray matter intensity from the image. From the intensity histogram, we determine a crude range of intensities Lw, which will act as a range of estimates for the pure white matter intensity μw. We used a method similar as the one described in Schnack et al. (2001), but any (automatic) method that detects the right-most peak in the histogram can be used, around which an interval at e.g., 0.95 and 1.05 times this intensity can be defined. For each Lw in this range, we select all face-sharing, neighboring voxel pairs (i,j) in the image such that voxel i has intensity L[i] ≥ Lw and its neighbor j has intensity L[j] ≤ Lw. We consider the histograms of the absolute differences ΔL = ΔL[i,j] = L[i] − L[j] of all selected voxel pairs, for each choice of Lw. In pure white matter, we may write L[i] = μw + ɛ, with ɛ ∼N (0,σ), and similarly for voxel j. The distribution p(ΔL) of the absolute intensity differences averaged over the noise becomes 2

pðΔLÞ =

e−u 2πσ

Z

Lw − μ w σ

Lw − μ w σ

+ u

− u

e

− e2

de;

ð1Þ

ΔL , see Appendix A, Eq. (8). The symmetry of Eq. (1) in Lw with u = 2σ shows that for any ΔL, taking Lw = μw gives the highest value of p(ΔL). Equivalently, Lw = μw is the value at which p′(ΔL) = 0. In practice, for Lw close to μw, we will sample pairs of voxels such that one is pure white matter and the other is a mixture of gray and white matter. These pairs contribute large intensity differences to the histogram which will be spread out over the total range of intensity differences. Further, the pure white matter area is larger than the gray–white matter boundary. Hence, these pairs will not influence the height of the histogram around the mode value. Fig. 1 (inset, right) shows a typical example of the height at the mode of the absolute intensity differences histogram for different values of Lw. The estimation of the pure gray matter intensity follows the same intuition, but is a bit more complicated, since pure gray matter voxels are not abundant in the image. As above, we choose intensity values Lg within a reasonable range of estimates for the pure gray matter intensity μg, i.e. far from the pure white and CSF intensities. Subsequently we select neighboring voxel pairs (i,j) such that L[i] ≥ Lg and L[j] ≤ Lg. Finally, we create histograms of absolute intensity differences ΔL of all selected voxel pairs. The gray matter layer is usually too thin to assume that we can find many neighboring voxel pairs such that both have true intensity μg (plus noise). Therefore, we should not only average over the noise, as we did for the white matter, but also over “space” in the sense that we should average over all possible “true” voxel values that are a mixture of gray matter with either white matter or CSF, see Appendix A, Eq. (8) and further. It is not possible to find an analytical solution for the mode value of the distribution of intensity differences, i.e. for p′(ΔL) = 0. However, a numerical approximation shows that the mode value of the intensity difference histogram is smallest when Lg = μg + δ, where the error term δ (which depends on σ, the difference in intensity contrasts (μ g − μ c ) and (μ w − μ g ) and the cortical thickness with respect to voxel size) usually is less than 2% of the gray–white contrast, see Appendix A, Eq. (9). Fig. 1 (inset, left) shows a typical

R.M. Brouwer et al. / NeuroImage 49 (2010) 467–477

469

2.2. The density of the partial volume fractions

Fig. 1. The lower part of the figure shows a typical intensity histogram of a T1-weighted image (see Validation for scan parameters; the 33-year old healthy subject is part of test set 1). The three solid bell-shaped curves represent the three pure tissue densities, as determined by the fit algorithm. The two dashed curves represent the two partial volume densities (CSF/GM and GM/WM). Inset: on the left the mode value of the histograms of absolute intensity differences for each choice of Lg, on the right the height of the histograms of absolute intensity differences at the mode value for each choice of Lw. The upper part of the figure displays the expectation values of the three tissue fractions xc, xg, xw, for each intensity value L in the image. The solid lines represent expected fractions E(xc|L), E(xg|L) and E(xw|L) based on expectation values; the dashed lines represent fraction x̂ c ðLÞ, x̂ g ðLÞ and x̂ w ðLÞ, based on linear interpolation between pure tissue intensities.

example of the mode of the absolute intensity differences for different values for Lg. Usually there is too little pure CSF present in the image to adopt a similar approach as above for estimating the mean pure CSF intensity. The mean pure CSF intensity will be fitted from the histogram.

In our model there are two types of partial volume voxels: voxels that are a mixture of gray matter and CSF, and voxels that are a mixture of gray matter and white matter. A voxel is attributed to one of these mixtures, based on whether its intensity is smaller or larger than the mean pure gray matter intensity. For both mixtures, let xg ∈ [0,1] represent the gray matter fraction of a voxel. To model voxels on the CSF-GM interface, we propose a partial volume density function ncg(xg) with derivative n′cg(xg) ≥ 0, for the mixed CSF-GM voxels. Likewise, to model the voxels on the GMWM interface, we propose ngw(xg) with n′gw(xg) ≥ 0, for the mixed GM-WM voxels. For the three pure tissue types, let nc, ng, and nw denote the number of pure CSF, gray matter and white matter voxels. We use Dirac's delta function (Kreyszig, 1993), to model the number of voxels containing only one tissue type. The number of voxels containing only pure tissue can now be modelled as ncδ(xg), ngδ(1 − xg) and nwδ(xg) for pure CSF, pure gray matter and pure white matter, respectively. See Fig. 2 for a schematic representation of the partial volume density. To see the intuition behind this proposal, we concentrate first on the CSF-GM boundary. If the surface of the brain was flat, assuming a uniform distribution of partial volume voxels would be correct. Since the cortex is heavily curved and the gyri are close to each other, a partial volume density that takes this geometry into account may be more suitable. Assuming a usual voxel size (1–2 mm), opposite walls of adjacent gyri may fall within one voxel, see Fig. 3. Intuitively, this leads to an excess of voxels that contain a relatively large amount of gray matter, or rather a shortage of voxels that contain a large amount of CSF. In other words, the number of voxels containing a certain percentage of gray matter increases with that percentage, or the derivative n′cg(xg) ≥ 0. A similar argument holds for the GM-WM boundary. If the white matter reaching into the gray matter is “thin” or “branching”, then voxels covering that area will contain a relatively large amount of gray matter, whereas there are no voxels that contain a large amount of white matter to counterbalance. Note that the uniform distribution is a special case of our model, if n′.(xg) = 0. To summarize, we propose a “tent”-shaped (partial volume) density that has three delta functions for pure CSF, pure gray matter and pure white matter, is non-decreasing between CSF and gray matter and non-increasing between gray matter and white matter, see Fig. 2.

Fig. 2. Schematic representation of the tent-shaped partial volume density (solid line). Included are a schematic representation of a U-shaped partial volume density (dashed line) and a uniform density (dotted line). The three arrows represent the pure tissue classes. This figure combines the CSF-gray matter mixture density ncg(xg), with xg running from 0 (pure CSF) to 1 (pure gray matter) and the gray matter–white matter mixture density ngw(xg), with xg running from 1 (pure gray matter) to 0 (pure white matter), the latter being reversed so that the x-axis is in concordance with image intensity.

470

R.M. Brouwer et al. / NeuroImage 49 (2010) 467–477

gray matter in voxels having intensity L. Then we normalize by the total number of voxels having intensity L. The expected fractions for CSF and white matter are computed likewise: Eðxc jLÞ =

1 hðLÞ

Z

1 0

 xc nc δð1 − xc Þρc ðLÞ

+ ncg ð1 − xc Þpcg ð1 − xc ; LÞÞdxc ; Z 1    1 Eðxg jLÞ = x n x Þp x ; LÞ hðLÞ 0 g cg g cg g    + ng δ 1 − xg Þρg ðLÞ + ngw xg Þpgw xg ; LÞÞdxg ; Z 1  1 Eðxw jLÞ = x n ð1 − xw Þpgw ð1 − xw ; LÞ hðLÞ 0 w gw Fig. 3. Schematic 2D representation of two cortical gyri. When the gyri are close together, a voxel overlapping the sulcus will sample from the gray matter of both gyri (square). Averaging over the cortex, or equivalently moving the grid over this sulcus, will provide more voxels with a relatively large amount of gray matter than voxels with a relatively large amount of CSF. The arrow represents the “space” direction when travelling from white matter (white), through gray matter (gray), into CSF (black).

2.3. Computation of expected tissue fractions From the mean intensities of the pure tissues (μc, μg and μw) and the variance (σ = σc = σg = σw) we can compute a probability density p.(xg, L) that gives the probability that a voxel containing a fraction of gray matter xg ∈ (0,1) has intensity L. Assume that a voxel contains a fraction xg of gray matter and a fraction (1 − xg) of CSF, with intensities lg and lc, respectively. Then the intensities are related by L = xglg + (1 − xg)lc. We compute the probability density of a voxel with a fraction xg of gray matter having an intensity L (see Appendix B): Z   pcg xg ; L : =

∞ −∞

  ρc ρg lg



L − xg lg 1 − xg

1 − xg

 dlg ;

ð2Þ

where ρg(.) is a Gaussian density with mean μg and variance σ (the density of pure gray matter), and ρc(.) is a Gaussian density with mean μc and variance σ (the density of pure CSF). Similarly, for voxels containing a mixture of gray and white matter Z   pgw xg ; L : =

  L − x l   ρw 1 − xg g g ρg lg dlg ; 1 − xg −∞ ∞

ð3Þ

where ρw(.) is a Gaussian density with mean μw and variance σ (the density of pure white matter). The total amount of voxels having intensity L, here denoted by h (L), can now be written as Z hðLÞ = 0

1

ðnc δðxg Þρc ðLÞ + ncg ðxg Þpcg ðxg ; LÞ

ð4Þ

+ ng δð1 − xg Þρg ðLÞ + ngw ðxg Þpgw ðxg ; LÞ+ nw δðxgÞρw ðLÞÞdxg : The pure gray and white matter mean intensities μg and μw will be estimated from the image, see Subsection 2.1. The variance σ, the CSF mean intensity μc and partial volume parameters nc, ng, nw, ncg(xg) and ngw(xg) will be estimated by fitting the right hand side of Eq. (4) to the intensity histogram of the image, see Subsection 2.4. Once we have estimated all parameters we are able to compute expected fractions E(xc|L), E(xg|L) and E(xw|L) for each tissue type given the intensity of the voxel. For gray matter, first we multiply the probability of a voxel having intensity L and gray matter content xg ∈ [0,1] by xg times the number of voxels that have gray matter fraction xg. Integrating over xg, we obtain the expected amount of

+ nw δð1 − xw Þρw ðLÞÞdxw : Note that, in a Bayesian framework, the expected fractions are the Bayesian estimators with the mean square error as the risk function. For a typical example of expected tissue fractions, see Fig. 4. Alternatively, interpolating linearly between the pure tissue type intensity means we get estimates x̂ c ðLÞ, x̂ g ðLÞ and x̂ w ðLÞ for the fractions of CSF, gray matter and white matter as a function of the intensity L, respectively. For example, for a voxel with intensity μ − L μ c b L b μ g, x̂ c ðLÞ = μ g − μ . These estimates can be seen as the g c maximum a priori (MAP) estimators for the special case that the partial volume density is uniform. Although the MAP estimator and the MSE estimator are similar in that case, they are not equal: the MSE estimate takes the pure tissue voxels into account, which results in higher tissue fractions for voxels with intensity close to the mean pure intensities. 2.4. Model fitting Once we have estimated the pure gray and white matter intensities, we estimate the pure CSF intensity, the variance of the pure tissue intensities and the parameters of the partial volume density function. This is done by fitting the discrete-valued intensity histogram H(L) obtained from the image, against the continuous theoretical intensity histogram (right hand side of Eq. (4)) with the simplex method: first we discretise the integral on the right hand side of Eq. (4) by approximating both ncg(xg) and ngw(xg) by a histogram, interpolating between the bins using Simpson's rule (Atkinson, 1989). The number of sampling points can be chosen arbitrarily, but in this paper was set to 10 for ncg(xg) and 5 for ngw(xg). Subsequently we readjust our estimates for μc and σ using the simplex method (Nelder and Mead, 1965), computing nc, ng, nw, and the parameters for n.(xg), following the method by Portugal et al. (1994), until the right hand side of Eq. (4) fits the histogram H(L) best. For an example of a fit see Fig. 1. 2.5. Validation To test the reliability and validity of the algorithm and its output we used a variety of test sets: 1. Examples of real MR human brain images for visual inspection: healthy adults, schizophrenia patients and children. The scans were acquired on Philips NT (adult scans)/Achieva (pediatric scans) scanners operating at 1.5 T. T1-weighted 3D fast field echo (NT) and spoiled-gradient echo scans (Achieva) with 160–180 1.2mm contiguous coronal slices (parameters for both scanners: TE = 4.6 ms, TR = 30 ms, flip angle = 30°, FOV = 256 mm/80%; inplane voxel sizes 1 × 1 mm2), and T2-weighted dual echo turbo spin echo (DE-TSE) scans with 120 1.6-mm contiguous coronal slices (TE1 = 14 ms, TE2 = 80 ms, TR = 6350 ms, flip angle = 90°, FOV = 256 mm/80%; in-plane voxel sizes1 × 1 mm2), of the whole

R.M. Brouwer et al. / NeuroImage 49 (2010) 467–477

471

Fig. 4. Axial slice of a typical partial volume segmentation; the 20-year old schizophrenia patient is part of set 1. From left to right: T1-weighed image, CSF fractions, GM fractions, WM fractions. Brighter intensities represent higher tissue fractions.

head were made. Preprocessing of the scans consisted of the following steps: the scans were automatically put into Talairach orientation (Talairach and Tournoux, 1988) (no scaling) and were corrected for inhomogeneities in the magnetic field using the N3 method (Sled et al., 1998). The T2-weighted scan was used to create an intracranial mask on the T1-weighted scan, as was described in Schnack et al. (2001). 2. Real MR human brain images: repeated scans. Five healthy adults were scanned twice within three months on a Philips 1.5 T scanner, with the same protocol as used for set 1 (NT version). 3. Real MR human brain images: repeated scans with different voxel sizes. Five healthy adults (different from set 2) were scanned on both a Philips 1.5 T scanner with the same protocol as used for set 1 (Achieva version) and a Philips Achieva 3 T scanner. The scan parameters for the 3 T turbo field echo T1weighted scan were: 180 0.8-mm contiguous sagittal slices (TE = 4.6 ms, TR = 10 ms, flip angle 90°, FOV= 240 mm/100%, in-plane voxel sizes 0.75 × 0.75 mm2). 4. a) Twenty high-resolution phantom brains which were created with the MRI simulator from Brainweb (http://www.bic.mni. mcgill.ca/brainweb/; Aubert-Broche et al., 2006a, 2006b; Collins et al., 1998; Cocosco et al., 1997; Kwan et al., 1996). Of these simulated T1-weighted scans (isotropic voxels of length 0.5 mm), representing 20 different subjects, the true tissue content is available for all voxels, both in binary and in partial volume form. The images were created with a Gaussian noise in both the real and imaginary image (independently) with a standard deviation of 1% of the white matter reference tissue proton density parameter. Considering the voxelpffiffiffisize, 1% noise in our simulated images is comparable to 1k × 8≈3k noise in an image having isotropic voxels of length 1 mm. This leads to a standard deviation of the pure tissue types of approximately 0.04 times the mean white matter intensity, comparable to the estimates in true images in set

Fig. 5. Two parts of slices of a simulated “oranges” image: left, the normal part, containing oranges with white matter cores, gray matter peels, floating in a sea of CSF; right, the inverted part, containing CSF cores, gray matter peels, surrounded by white matter. The peel (“cortex”) thickness is 1.8 mm. The voxel size is 1 × 1 × 1 mm3.

1. Tissue parameters were taken as in Aubert-Broche et al. (2006a). b) In addition, we use the “traditional” MNI fuzzy phantom brain, with 3% and 5% noise levels. 5. Simulated phantom images: “oranges”. To mimic the cortex we simulated crates of closely packed balls of white matter (the orange), with a peel of gray matter layer (representing the cortex) in a sea of CSF (space between the oranges), see Fig. 5, left. This image acts as a crude approximation of the gyri of the brain. The inverse image, of closely packed balls of CSF, with surrounding layers of gray matter, in a white matter environment serves as an approximation of the deep sulci, see Fig. 5, right. Gaussian noise was added. The combination of the two produces an intensity histogram that resembles that of a real T1-weighted brain scan, see Fig. 6. The relevant parameters to simulate the “true” oranges are: core radius (rcore), peel thickness (dgm), and mean distance between the oranges (“sulcal width”, a). To simulate the MRI-like images of the oranges, we need to define mean intensities of the three tissue types (μc, μg, μw), a standard deviation (σ) representing natural intensity variation and noise and voxel size. Two tests have been carried out. First, 100 parameter sets were simulated, from which “truth” images and MRI-like images were created. The parameter mean values and standard deviations were chosen in order to cover realistic situations: rcore = 4.00 ± 0.40 mm and 1.30 ± 0.20 mm (oranges and inverted oranges respectively), d gm = 2.00 ± 0.20 mm, a = 0.02 ± 0.01 mm and − 0.10 ± 0.50 mm, μ c = 100 (a.u.), μ g = 300 ± 20, μ w = 400 ± 20, σ = 15.00 ± 1.34 and voxel size 1 × 1 × 1 mm3. For the second test we fixed all parameter values except for the peel thickness dgm, representing cortical thickness, which was varied in 20 steps between 1.50 and 3.00 mm.

Fig. 6. Intensity histogram of a simulated “orange” image. Note the resemblance to the real MRI's histogram (Fig. 1).

472

R.M. Brouwer et al. / NeuroImage 49 (2010) 467–477

These five test sets were chosen because together they cover the complete set of tests we wished to put our algorithm to. Apart from checking the segments visually and checking if the tissue volumes assume reasonable values, not much can be said about the quality from set 1. We were in the lucky situation of having some sets (2 and 3) of repeated scans at our disposal (set 2 was created as part of a multicenter calibration study (Schnack et al., 2004) and set 3 was created as part of a calibration for longitudinal studies). These two sets could be used to test the reliability of the algorithm, since analysis of different scans from the same subject should lead to the same tissue volumes. For both sets the reliability was assessed by the intra-class correlation coefficient (ICC), measuring the absolute comparability between the volumes (fixed effects for scanners and measuring absolute agreement, McGraw and Wong (1996), case 3A). The intra-class correlation coefficient takes values between −1 and 1. The value 1 represent total agreement, the lower bound represents total disagreement. Note that set 3 allows us to test the effect of changing to a different scanner with a different field strength combined with a different voxel size. In particular, the 3 T voxels are a factor 2.67 smaller in volume than the 1.5 T voxels. The partial volume effect is expected to be very different in the two scans and our algorithm should be able to deal with this. Apart from checking the comparability of tissue volumes, we could have a look at the resulting pure and partial volume densities for this set. To investigate whether the proposed nonuniform form of the partial volume density performed better than a uniform partial volume density, we re-fitted the intensity histograms of the scans in sets 2,  3 and 4a, using a uniform partial volume density, by setting n′: xg = 0. To make this comparison as fair as possible, we tried several combinations (fitting the pure tissue types versus fixing some to the estimated value of our default algorithm, estimating one σ for all tissue types versus allowing σc ≠ σg ≠ σw) for the uniform fit. To test the validity of the tissue volumes we needed images for which the ground truth is available. MNI simulated images (set 4) are frequently used for this purpose (Bach Cuadra et al., 2005; Kovacevic et al., 2002; Ruan et al., 2000; Shattuck et al., 2001; Cocosco et al., 2003; Tohka et al., 2004) because they are realistic images created by an MRI simulator on human brain ground truths. To determine the accuracy of the segments themselves rather than their volumes, we computed several error measures. We computed the mean square error of the estimated fractions and the known true volume fractions which are available from Brainweb (http://www.bic.mni.mcgill.ca/ brainweb/). For t in CSF, GM, WM RMSEt =

sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2 1 X xtest − xttrue ; jVj V

ð5Þ

where xtest represents the estimated fraction of tissue t in a voxel, xttrue represents the true fraction of tissue t in a voxel, and Ω is the intracranial mask. We also compute the similar mean absolute error: MAE =

X 1 X jxtest − xttrue j: V ta CSF; GM; WM f g jVj

ð6Þ

In some cases, when a tissue mask is needed rather than volumetric measurements, a binary segmentation is more useful than a partial volume segmentation. We transformed the three partial volume segments from set 4 into binary segments by selecting all voxels with an expected fraction of 0.5 and higher, which is the most likely label. We compared these binary segments for each of the pure tissue types with the binary ground truth using the kappa, or Dice overlap metric (Dice, 1945): κ =2

jSest \ Struth j ; ð jSest j + jStruth j Þ

ð7Þ

where Sest denotes our binary segment and Struth is the binary “truth” segment. The κ value ranges between 0 and 1, where 1 means perfect overlap. In addition, we also created the simulated “oranges” of set 5. Although lacking the realistic appearance of MR brain images, they do allow us to model all tissue and structure parameters freely. Certain parameter values lead to intensity histograms that resembled those of our own T1-weighted images (sets 1–3). Our segmentation program was applied to the 100 images in set 5, and the experimental values of μc, μg and μw as well as the CSF, GM and WM volumes were compared to the simulated true values. This tested whether the determination of the mean pure tissue intensities of gray and white matter, as described in Subsection 2.1, provided correct results. Another advantage of these simulated “oranges” is the possibility to investigate the influence of one parameter on the resulting volumes. This allows us to test for possible bias, which is an important issue in patient studies, where possible specific differences in brain morphology should not be under- or overestimated, or influence other tissue volume estimates. In MRI studies one often tries to detect differences in gray matter volume, which are thought to be due to cortical thinning. A bias, in the form of an over- or underestimation of these effects is undesirable. We chose to investigate a possible bias in volumetric results due to differences in cortical thicknesses by creating the second test set of simulated oranges. The choice for taking the expected fractions E(x.|L) instead of x̂ :ðLÞ, which linearly interpolate between the pure tissue intensities, depends on the desired purpose of the algorithm. Theoretically, the expected fractions should result in better volumetric estimates. To test this, we computed ICC values on sets 2, 3 and 4a for volumes computed with expected partial volume fractions versus linearly interpolated partial volume estimates. For set 4a, where the ground truth is known, we also computed RMSE and MAE errors. 3. Results 3.1. Volumetric reliability (sets 2, 3, 4a, 5) Examples of segmentations for visual inspection can be found in Fig. 4 and Supplementary data. ICC values were computed of measured tissue volumes from first and second scans (set 2), 1.5 T and 3 T scans (set 3) and of measured and ground truth volumes (sets 4a, 5). See Table 1 for the results. For total CSF, GM and WM the ICC values were above 0.93 for all four sets, total CSF volume in the MNI simulated scans being the exception (ICC = 0.84). This “worse” performance is largely due to the fact that the simulated scans have a separate vessel class, which is estimated as CSF by our algorithm. For comparison, the ICC value was 0.97, when we compared our estimated CSF volumes against a “fluid” class, consisting of the true CSF plus vessel volumes. Mean volumetric differences for the scan–rescan data (set 2) were −1.1 ml (−0.2%, s.d. 6.7 ml) for gray matter and −2.2 ml (−0.4%, s.d. 8.4 ml) for white matter. For the 1.5 T–3 T data (set 3) the mean difference was −12.9 ml (−1.7%, 16.1 ml) for gray matter and − 4.5 ml (−1.0%, 9.6 ml) for white matter. See Supplementary data for an example of a segmentation for one subject scanned on both the 1.5 T and the 3 T scanner. 3.2. Accuracy of the parameter estimates (sets 4a, 5) For both the MNI simulated brain images and the simulated oranges, the pure tissue mean intensities were estimated. Differences between the true and estimated values, relative to the pure tissue intensity contrasts, are reported in Table 2. The mean pure CSF and white intensity estimates deviated from the true value with

R.M. Brouwer et al. / NeuroImage 49 (2010) 467–477

473

Table 1 ICC values for CSF, GM, WM volumes on sets 2–5, for several versions of the algorithm: ICC values for the proposed algorithm with non-uniform partial volume density, for the implementation of a uniform partial volume density and for computation of the tissue fractions using linear interpolation instead of expectations, Subsections 3.1, 3.4.2 and 3.6, respectively. Methods

Set 2 Repeated

Set 3 1.5 T–3 T

Set 4a MNI phantom brains

Set 5 Oranges

1.00 0.97 1.00 –

0.98 0.93 1.00 –

0.84 0.97 0.98 0.97

0.99 0.98 0.98 –

0.94 0.85 0.96

0.99 0.80 0.004

0.91 0.91 0.98

– – –

1.00 0.98 1.00

0.98 0.87 0.97

0.63 0.91 0.92

– – –

Proposed CSF volume ICC GM volume ICC WM volume ICC “fluid” volume ICC Uniform density CSF volume ICC GM volume ICC WM volume ICC Linear interpolation CSF volume ICC GM volume ICC WM volume ICC

For the MNI phantom brains (set 4a) we included an ICC value for our CSF versus a “fluid” (CSF + vessel) volume.

no more than 2% relative to their respective contrasts with the gray matter intensity. The mean pure gray intensity estimate deviated from the true value with less than 0.3%, relative to both contrasts. For a comparison with other methods, we also computed the class mean error (CME) as X

jμ iðestÞ −μ iðtrueÞ j:

iafc;g;wg

where μ iðestÞ and μ iðtrueÞ are the estimated and true parameter values for tissue i. The mean CME for the simulated images in set 4a was 0.45 (s.d. 0.22). 3.3. Overlap measures (set 4) For all 20 high-resolution images in set 4a we computed the root mean square error (RMSE, Eq. (5)) mean absolute error (MAE, Eq. (6)) of the estimated and true tissue fractions. See Supplementary data for an example of the segmentation for one high-resolution phantom. Vessels were again included in CSF, since the methods that we wish to compare our algorithm with were applied to simulated brains that did not include a vessel class. Averaging over all 20 images, we obtained RMSEs of 0.112 for CSF, 0.105 for gray matter and 0.089 for white matter. The MAE was 0.128. Further, for all images in set 4a we computed the overlap between the binary segments created from our partial volume segments and the ground truth as the kappa or Dice coefficient (Eq. (7)). Mean Dice coefficients were above 0.94 for all tissue types. See Table 3. For the traditional MNI phantom brain of 1 mm resolution, RMSE and MAE errors, and Dice coefficients are reported in Table 4. Compared with the high-resolution phantom brains, error measures were somewhat smaller for the image with 3% noise and somewhat larger for the image with 5% noise.

Table 2 Differences between true and estimated pure tissue intensity means (Δμc, Δμg and Δμw), relative to the true pure intensity contrasts.

Δμc/(μg − μc) Δμg/(μg − μc) Δμg/(μw − μg) Δμw/(μw − μg)

Set 4a MNI phantom brains

Set 5 Oranges

− 0.009 − 0.001 − 0.002 − 0.018

0.023 − 0.003 − 0.002 − 0.012

(0.01) (0.01) (0.02) (0.01)

(0.01) (0.01) (0.01) (0.01)

Means (s.d.'s) are given over all twenty high-resolution simulated brain images in set 4a and all 100 simulated images in set 5.

3.4. Partial volume density (sets 2, 3) 3.4.1. Effects of voxel size We compared the partial volume densities of the 1.5 T and 3 T scans (set 3), the latter having smaller voxels (1.2 mm3 and 0.45 mm3, respectively). On average, the percentage of voxels classified as partial volume voxels dropped from 75.1% to 64.4% with smaller voxel size. Particularly, the fraction of voxels that were attributed to the pure matter tissue classes increased on average by 3.4 [from 2.9 to 6.3] percentage points for CSF, 8.6 [from 0.1 to 8.7] percentage points for GM and decreased by 0.3 [from 21.9 to 21.6] percentage point for WM, for the smaller voxel size. See Fig. 7 for an example of the two partial volume densities of one subject and Supplementary data for an example of a segmentation. 3.4.2. “Tent”-shaped versus uniform partial volume density (sets 2, 3, 4a) We fitted a uniform partial volume density to the images in sets 2, 3 and 4a with different combinations of fixed and free parameters: 1) fitting μc, μg and μw, and σc ≠ σg ≠ σw, 2) fitting μc, μw and σc ≠ σg ≠ σw, while keeping μg equal to the values found with our algorithm, 3) fitting μc and σc ≠ σg ≠ σw, while keeping μg and μw equal to the values found with our algorithm, 4) fitting μg, μw and σc = σg = σw, and 5) fitting μc, μg, μw and σc ≠ σg = σw. Of the possible combinations for fitting a uniform partial volume density, fitting all pure tissue types and only one σ (option 4) performed best based on the average ICC value, with ICC values 0.94/0.99/ 0.91 for CSF, 0.85/0.80/0.91 for GM and 0.96/0.004/0.98 for WM, on set 2, set 3 and set 4a respectively. These values are considerably lower than the ICC values found with our non-uniform partial volume density, for the real data sets (2 and 3) and comparable for set 4a, c.f. Table 1. The very low value of 0.004 for WM on set 3 was due to one subject for which the uniform partial volume distribution fitted particularly badly. To assess the effect of the partial volume distribution on the partial volume estimations, we computed the MAE and RMSE for the different tissue types for the best performing fit of the uniform partial volume density on the simulated brains in set 4a. Mean MAE was 0.124, range [0.100– 0.153]. Mean RMSE for white matter was 0.090 [0.080–0.109], for gray matter 0.106 [0.093–0.127] and for CSF 0.113 [0.103–0.140]. These values are almost equal to those obtained with our tentshaped partial volume density, see Table 3. The good performance of set 4a compared to the other sets may be due to the fact that the partial volume density of the simulated brains is almost uniform. Indeed, the partial volume density which we estimated with our tent-shaped density, resembles a uniform distribution.

474

R.M. Brouwer et al. / NeuroImage 49 (2010) 467–477

Table 3 Error measures for partial volume estimated fractions compared with true fractions: root mean square error (RMSE), Dice metric and mean absolute error (MAE). Methods

CSF (RMSE)

GM (RMSE)

WM (RMSE)

IC (MAE)

Proposed Uniform Linear

0.112 [0.103–0.140] 0.113 [0.103–0.140] 0.119 [0.110–0.146]

0.105 [0.093–0.110] 0.106 [0.093–0.127] 0.124 [0.118–0.135]

0.089 [0.080–0.108] 0.090 [0.080–0.109] 0.107 [0.103–0.116]

0.128 [0.105–0.149] 0.124 [0.100–0.153] 0.185 [0.178–0.197]

CSF (Dice)

GM (Dice)

WM (Dice)

0.935 [0.902–0.947]

0.968 [0.962–0.977]

0.964 [0.957–0.971]

Proposed

Means [range] over the twenty high-resolution simulated brain images in set 4a for the proposed algorithm, for the implementation of a uniform partial volume density and for computation of the tissue fractions using linear interpolation instead of expectations, see Subsections 3.3, 3.4.2 and 3.6, respectively.

3.5. The influence of cortical thickness on volume estimates (set 5) Fig. 8 shows the effect of cortical thickness on the volume estimates. The true and experimental fractions of CSF, GM and WM are shown, together with the “dynamical volumetric sensitivity”, δVgm(exp)/δVgm(truth), which gives the ratio of the effects of changing dgm on the experimental and true GM volumes, respectively, and which should lie around 1 for all thicknesses for an unbiased segmentation method. The dashed vertical lines define the window of realistic simulations, i.e., CSF volume fractions N3.3% (Gur et al., 1999) and dgm N 1.73 mm, the minimal cortical thickness that ensures that CSF and WM are not contained in the same voxel, with a voxel size of 1×1×1 mm3. 3.6. Expectation versus linear interpolation (sets 2, 3, 4a) Taking the linearly interpolated fraction as an estimate for tissue content resulted in approximately equal ICC values on set 2, and in lower ICC values for all volumes in sets 3 and 4a. In particular, ICC values for total gray matter volume were 0.98 for set 2, 0.87 for set 3 and 0.91 for set 4a, c.f. Table 1. In addition, we studied the difference between taking the expected fraction and linearly interpolating between the pure tissue intensities by computing MAE and RMSE error measures of interpolated fractions on set 4a, where the ground truth is known. Compared to taking the expected fractions, all error measures increased when taking the linearly interpolated fractions. Mean MAE was 0.185 with a range of [0.178–0.197], mean RMSE for white matter was 0.107 [0.103–0.116], mean RMSE for gray matter was 0.124 [0.118–0.135] and mean RMSE for CSF was 0.119 [0.110–0.146], c.f. Table 3.

pure gray and white matter tissue intensities were estimated accurately from the images. We have shown that variation in cortical thickness does not influence the accuracy of our algorithm, a property that is important when comparing groups of patients and healthy subjects. From a test on a set of subjects that were scanned twice on the same scanner, we found that intra-class correlation coefficient (ICC) values were above 0.97 for scan–rescan volumetric measures, with mean volumetric differences of −1.1 ml for gray matter volume and −2.2 ml for white matter volume. Other studies provided ICC values for scan–rescan volumetric measures: Chard et al. (2002) supplied ICC values of 0.95 for gray matter and 0.96 for white matter. ICC values reported in Agartz et al. (2001) were 0.98 for CSF, 0.99 for gray matter and 0.98 for white matter, averaged over the left and right hemispheres. Schnack et al. (2004) reported an ICC value of 0.91 for gray matter and 1.00 for white matter. Apart from the gray matter ICC value in Agartz et al. (2001), our ICC values were higher compared to the previous studies providing scan–rescan ICCs. ICC values were above 0.93 for repeated scans on the 1.5 T and 3 T scanners. These ICC values show that our algorithm does not suffer from the differences that arise from the difference in field strength, such as differences in

4. Discussion We presented an algorithm providing a partial volume segmentation of the brain into gray matter, white matter and cerebrospinal fluid (CSF) from T1-weighted images. The algorithm incorporates a non-uniform partial volume density that takes the curved nature of the cortex into account. This choice of partial volume density was shown to yield more reliable scan–rescan volumes than a uniform partial volume density, and allows for combining cross-sectional data using scans with different voxel sizes and different field strengths. The

Table 4 Error measures for partial volume estimated fractions compared with true fractions: root mean square error (RMSE), Dice metric and mean absolute error (MAE), for the traditional phantom brains with noise levels 3% and 5%. Noise %

CSF (RMSE)

GM (RMSE)

WM (RMSE)

IC (MAE)

3 5

0.052 0.068

0.093 0.143

0.077 0.127

0.127 0.187

Noise %

CSF (Dice)

GM (Dice)

WM (Dice)

3 5

0.954 0.940

0.955 0.930

0.964 0.939

Fig. 7. Distributions of voxels for one of the five subjects scanned on two scanners: a 1.5 T scanner (circles) with voxel size 1 × 1 × 1.2 mm3, and a 3 T scanner (squares) with voxel size 0.75 × 0.75 × 0.8 mm3. In order to compare the density functions n.(x) they have been integrated to convert them to numbers of voxels per bin (three pure bins, and two times five partial volume bins of width 0.2); then these numbers have been multiplied by the voxel volumes, 1.2 mm3 and 0.45 mm3, respectively, to represent the volume occupied by voxels of a certain constitution. The high-resolution 3 T image contains more pure voxels of all three tissue types, but the effect is largest for the pure gray matter.

R.M. Brouwer et al. / NeuroImage 49 (2010) 467–477

Fig. 8. Effect of “cortical thickness” on estimated volumes. The figure shows true (circles) and estimated (stars) CSF, GM, and WM volume fractions of the intracranial volume as a function of GM peel thickness (dgm), for a series of simulated orange images. Also shown is the dynamical GM volume sensitivity, i.e. the ratio of the experimental and true GM volume changes when peel thickness is changed (diamonds). The dashed lines mark the area of simulations with realistic parameters: CSF fractions N0.033, GM fractions b 0.62, peel (“cortical”) thickness N1.73 mm.

the inhomogeneity field or noise levels, and that it can handle the different shapes of partial volume densities that arise from different voxel sizes. Therefore, the algorithm seems useful for combining 1.5 T and 3 T data in cross-sectional studies. As expected, the estimation of the fraction of pure tissue voxels is higher in scans with a smaller voxel size. Moreover, the estimated partial volume densities for those scans have a less pronounced peak compared to the scans with larger voxel size. The comparison between the 1.5 T and 3 T data is not solely testing the effect of voxel size, but also of different field strengths. An extra comparison on a test set of repeated scans from 8 subjects on the same 1.5 T scanner (data not shown), comparing scans with voxel sizes 1.2 mm3 and 1.5 mm3 resulted in ICC values above 0.97 for all three tissue types. The mean pure tissue intensities were estimated accurately, as was shown from simulated (brain) images, where the true values are known. Our partial volume density follows a “tent”-shaped, or increasing/ decreasing pattern, whereas a (double) U-shape has been suggested in the literature (González Ballester et al., 2002; Röll et al., 1994; Bello et al., 1998). However, Röll et al. (1994) and Bello et al. (1998) have shown that this U-shape holds for convex volumes, whereas the brain is highly curved. The U-shape used in González Ballester et al. (2002) includes the pure tissue voxels, which we modelled separately. Our real MRI data has not given cause to adopt a double U-shaped density for the partial volume voxels. However, when considering histograms of subjects with very large ventricles or severe cortical atrophy, a Ushape might provide a better fit. The experiments on simulated images with different cortical thicknesses show that our method is not biased in the sense that the cortical thickness does not influence the accuracy of the volume estimates. This is extremely important when investigating psychiatric illnesses, where patients might suffer more from cortical thinning than their healthy control subjects. We have shown an absence of bias in the range of realistic cortical thicknesses, as determined from set 1, with minimal thickness 1.73 mm. While the largest part of the cortex has a thickness larger than this value, there might be cortical areas with a thickness close to this value, especially in diseased brains. From the simulation results (Fig. 8) it is seen that the absence of bias extends below and above the “realistic” thickness values.

475

Using the MNI simulated brain images, we can compare our method with others. We computed different error measures for this purpose. On the “traditional” fuzzy MNI brain images with 3% noise we computed root mean square errors of 0.077 for white matter and 0.093 for gray matter. Shattuck et al. (2001) provides root mean square errors of 0.088 for white matter, and 0.137 for gray matter. The root mean square errors of the TMCD (trimmed minimum covariance determinant) method in Tohka et al. (2004) (graphical display) seem comparable with ours. Our mean absolute error of 0.187 on the image with 5% noise is larger than the value of 0.155 mentioned in Tohka et al. (2004), which implicates that our method makes fewer large errors and more smaller errors. Because these error measures are based on one image only, we compared the methods also on the 20 improved high-resolution MNI images. Our mean absolute error of 0.128 on these images of approximately 3% noise is comparable with the TMCD method in Tohka et al. (2004) who report mean absolute errors of 0.084 for images with 1% noise and 0.155 for images with 5% noise. Our root mean square error for white matter is 0.089 and for gray matter 0.105. These errors are comparable for white matter and lower for gray matter compared to Shattuck et al. (2001). However, as the partial volume densities of the MNI images are almost uniform, and one of our main contributions is the implementation of a non-uniform partial volume density, our algorithm is not expected to perform better than the other algorithms. It must be noted that we used a newer version of the fuzzy brain phantom, with more tissue classes and a different voxel size than the above mentioned methods, for these last comparisons. Nevertheless, these results indicate that our algorithm would be competitive with these methods, on the MNI images. By assuming that all tissue types can be described by one Gaussian density we assume that tissues have the same intensity throughout the brain. In practice, tissue does vary throughout the brain as deep gray matter structures are different from cortical gray matter, see Fischl et al. (2004) for differences in the T1 and T2⁎ relaxation times throughout the brain. This may also cause the intensity variances to be different for different tissue types. Some methods therefore incorporate a separate variance for each tissue class (e.g. Ruan et al., 2000; Leemput et al., 2003; Tohka et al., 2004). Our method can be adjusted so that each pure tissue type has its own variance. However, we chose to consider tissue variability as a special case of partial volume, an assumption shared by e.g. Santago and Gage (1993), Shattuck et al. (2001) and Manjón et al. (2008). Furthermore, images usually suffer from intensity inhomogeneity, even after non-uniformity correction. A (partial) solution for all these issues might be to apply our method to different parts of the brain separately. However, one should take into account that one needs enough voxel pairs to estimate μg and μw accurately. Another issue, which we share with most other segmentation methods, is that we assume only two partial voluming types. For example, although our model assumes implicitly that there are no voxels containing white matter and CSF, around the ventricles they exist. Our method does not take this possibility into account and the voxels around the ventricles will be partly attributed to gray matter. However, these voxels take up less than 0.5% of the brain so the influence on total brain volume measures and the fit of the histogram will be neglectable. Further, since we have anatomical knowledge about their location, it will not be difficult to adjust this in a later version. Similarly, although the focus of the present work was to segment gray and white matter of the brain, it is possible to add a CSF/ background class. Another possible extension is applying this algorithm to T2-weighted, or other MRI images. As long as the contrast between the three main tissue types is sufficiently large, it should be possible to adjust the algorithm accordingly. Finally, as our algorithm was mainly designed to study a healthy population and psychiatric patients who suffer from subtle brain abnormalities, it does not include a pathological tissue class. The algorithm could be adjusted to include pathological tissue.

476

R.M. Brouwer et al. / NeuroImage 49 (2010) 467–477 ðxÞ − f ðx + 1Þ where we have set e = 2η + ΔL + f2σ in the second equality. In pure white matter f(x) = f(x + 1) = μw and we arrive at Eq. (1). For gray matter, pairs such that either f(x) or f(x + 1) is on the slope of the intensity pattern play a substantial role in the distribution of absolute intensity differences p(ΔL). Averaging Eq. (8) over space (i.e. integrating p(ΔL;f(x), f(x + 1)) over x where f(x) is as in Fig. 9) will determine the distribution p(ΔL). Subsequently, one should solve p′(ΔL) = 0 for Lt to determine the mode of the absolute intensity difference histogram. This can only be done numerically. It was found that the minimal mode is assumed for

  σ 2:7 μ − − μ σ− μ Lt − μ g μ g c w g  n ≈d gm 1 1 σ v − 2+6:5σ μ − μ + μ − μ + min μ g

c

w

g

g

1 1 − μc ; μw − μg

o ; ð9Þ

Fig. 9. The solid line f(x) represents the noiseless intensity profile when a voxel crosses the cortex from white matter through gray matter into CSF (x-axis, left to right), along the arrow in Fig. 3. Due to partial voluming, the intensity gradually decreases as the voxel samples more gray matter (left slope) or more CSF (right slope).

In conclusion, we presented a partial volume segmentation incorporating a non-uniform partial volume density that provides accurate volumetric measures and allows for comparisons between scans from different scanners with different voxel sizes. Appendix A. Determination of pure tissue intensities Let Lt = Ltest be either Lg, an estimate for the pure gray matter intensity μg, or Lw, an estimate for the pure white matter intensity μw. As was described in Subsection 2.1, we consider all intensity differences ΔL = L[i] − L[j] for neighboring, face-sharing voxel pairs (i,j) satisfying L[i] ≥ Lt and L[j] ≤ Lt. The intensity difference ΔL depends on the noise and the “true” voxel values, the latter determined by position of the voxel pair with respect to the cortex. Consider the one-dimensional intensity pattern when a voxel travels from white matter into gray matter into CSF in a noiseless image (along the arrow in Fig. 3). Fig. 9 shows a schematic representation of this intensity pattern. To keep notation simple, we set μg = 0. Because of the partial volume effect, the intensity of a voxel will first be constant (pure white), then gradually decreases as the amount of gray matter in the voxel increases, and will be constant again (pure gray) for some time, depending on the cortical thickness. This pattern is repeated (but possibly with a different slope) between pure gray matter and pure CSF. Let f(x) denote the true voxel intensity, described by the graph in Fig. 9, where x is representing the space direction in voxel units. Note that if a voxel i has intensity value f(x), its right neighbor j has voxel value f(x + 1). If a pair contributes to an absolute intensity difference histogram p(ΔL) for some Lt, then L[i] = f(x) + ηi ≥ Lt and L[j] = f(x+ 1) + ηj ≤ Lt must hold, where η. ∼ N(0,σ). Hence, averaging over the noise

where dgm is the cortical thickness and v is voxel length. The constants in this equation were found from fits to the numerical solutions of p′(ΔL) = 0. Inserting an average cortical thickness of 2.5 mm, a voxel length of 1 mm and the μ and σ values found from the scans of set 1, the error in the estimation of μg is found to be (0.0148 ± 0.0051) (μw − μg) for the adult scans, and (0.0119± 0.0028) (μw −μg) for the pediatric scans, i.e., generally less than 2% of the gray–white contrast. Note that this one-dimensional formulation is a simplified treatment of the voxel pairs on the cortex, and that the real justification of our estimation of μg comes from the tests on the simulated MNI and “oranges” images (sets 4 and 5 in Validation). Appendix B. Derivation of Eq. (2) We compute the probability density pcg(xg, L) of a voxel with gray matter fraction xg having intensity L. We form the joint probability density ρcg(lc,lg) = ρc(lc)ρg(lg), which has to be integrated over all possible (lc,lg) that satisfy L = xglg + (1 − xg)lc. This restriction determines a line S in the (lc,lg)-plane, so we define the coordinate transformation (lc,lg)→(L,s), by

lc =

L − s · cos ðα Þ 1 − xg

lg = s · sinðα Þ;  R  R where tan(α) = (1 − xg) / xg, so that ρcg lc ; lg dlc dg = ρ ˜ cg ðL; sÞjJ j dLds. The determinant of the Jacobian of this transformation is

j

0

Alc B AL B jJ j = @ Al

g

AL

1 Alc As C C = Alg A As

jj

=

2

½ΔL+η+f ðx Þ − f ðx +1Þ η 1 − 2 − 2σ e 2σ 2 e dη 2 − f ðxÞ − ΔL 2πσ

Lt − f ðxÞ Lt

  ΔL + f ðxÞ− f ðx + 1Þ 2 e− 2σ

2πσ

Z

2Lt − f ðx + 1Þ − f ðxÞ + ΔL 2σ 2Lt − f ðx + 1Þ − f ðxÞ − ΔL 2σ

e

− e2

j

Z   pcg xg ; L := ρ˜ cg ðL; sÞjJ jds Z

Z pðΔL; f ðxÞ; f ðx + 1ÞÞ=

1 1 B 1 − xg −cosðα Þ C B C = sinðα Þ : @ A 1 − xg 0 sinðα Þ

The probability density pcg(xg, L) can now be calculated as

Z =

! L sinðα Þ − s · cosðα Þ ds 1 − xg 1 − xg ! L − xg lg dlg ; 1 − xg 1 − xg

ρg ðs · sinðα ÞÞρc

=

2

0

  ρ g lg ρc

de;

ð8Þ

where we substituted s sin(α) = lg in the last equality. A similar derivation can be made for pgw(xg, L).

R.M. Brouwer et al. / NeuroImage 49 (2010) 467–477

Appendix C. Implementation The determination of the pure gray and white intensity is done with a C-program using the MINC and volume-io utility libraries (http://en.wikibooks.org/wiki/MINC). The required intracranial masks for all images in our test sets were obtained from a T2-weighted scan of the same subject, as was done in Schnack et al. (2001). All images were corrected for field inhomogeneity with the method of Sled et al. (1998) (spline distance 200 mm, maximal number of iterations 50, with intracranial mask as reference region for the inhomogeneity correction). For the estimation of pure gray matter we take into account that cortical gray matter may have a different intensity than subcortical gray matter areas. Therefore, we only select those voxel pairs that are likely to be on the cortical gray matter interface by masking out all subcortical areas. This mask was created by warping the individual brain non-linearly to a model brain, using ANIMAL (Collins et al., 1995). On this model brain, we have determined a mask containing only the cortical lobes, which was subsequently transformed back to the individual brain. Further, we consider the shared face of the selected voxel pair and the 2 × 3 × 3 box of voxels that completely surrounds this face. We select the voxel pair only if this neighborhood box supports the idea that the interface continues in a neighboring voxel pair, i.e., if this box falls apart in two connected parts, one of which contains voxels having intensities higher than Lg, the other containing voxels having intensities lower than Lg. The fitting of intensity histograms was implemented as a C-program, using gsl-routines (GNU Scientific Library, http://www.gnu.org/software/gsl/) for the implementation of the simplex method. To assure that we get a nonnegative derivative of n.(xg), we used a package that finds minimal nonnegative solutions for matrix equations (Cantarella and Piatek, 2004; Portugal et al., 1994). The implementation provides a fully automatic segmentation, for a T1-weighted image with a given intracranial mask. Typical runtime (including preprocessing) was 50 minutes on a computer with Intel Xeon CPU 2.66 GHz. Appendix D. Supplementary data Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.neuroimage.2009.07.041. References Agartz, I., Okuguwa, G., Nordström, M., Greitz, D., Magnotta, V., Sedvall, G., 2001. Reliability and reproducibility of brain tissue volumetry from segmented MR scans. Eur. Arch. Psychiatry Clin. Neurosci. 251, 255–261. Anbeek, P., Vinken, K.L., van Bochove, G.S., van Osch, M.J., van der Grond, J., 2005. Probabilistic segmentation of brain tissue in MR imaging. NeuroImage 27, 795–804. Ashburner, J., Friston, K., 1997. Multimodal image coregistration and partitioning — a unified framework. NeuroImage 6, 209–217. Atkinson, K.A., 1989. An Introduction to Numerical Analysis2nd ed. John Wiley & Sons. Aubert-Broche, B., Evans, A.C., Collins, L., 2006a. A new improved version of the realistic digital brain phantom. NeuroImage 32, 138–145. Aubert-Broche, B., Griffin, M., Pike, G.B., Evans, A.C., Collins, D.L., 2006b. Twenty new digital brain phantoms for creation of validation image data bases. IEEE Trans. Med. Imag. 25, 1410–1416. Bach Cuadra, M., Cammoun, L., Butz, T., Cuisenaire, O., Thiran, J.P., 2005. Comparison and validation of tissue modelization and statistical classification methods in T1weighted MR brain images. IEEE Trans. Med. Imag. 24, 1548–1565. Bello, F., Colchester, A.C.F., Röll, S.A., 1998. A geometry- and intensity-based partial volume correction for MRI volumetric studies. Comput. Med. Imaging Graph. 22, 123–132. Bricq, S., Collet, C., Armspach, J., 2008. Unifying framework for multimodal brain MRI segmentation based on hidden Markov chains. Med. Image Anal. 12, 639–652. Cantarella, J., Piatek, M., aug 2004. tsnnls — a sparse nonnegative least-squares solver. URL:http://www.cs.duq.edu/piatek/tsnnls/. Chard, D., Parker, G., Griffin, C., Thompson, A., Miller, D., 2002. The reproducibility and sensitivity of brain tissue volume measurements derived from an SPM-based segmentation methodology. J. Magn. Res. Imaging 15, 259–267. Choi, H.S., Haynor, D.R., Kim, Y., 1991. Partial volume tissue classification of multichannel magnetic resonance images — a mixel model. IEEE Trans. Med. Imag. 10, 395–407. Cocosco, C.A., Kollokian, V., Kwan, R.K.S., Evans, A.C., 1997. Brainweb: online interface to a 3D MRI simulated brain database. Proceedings of 3-rd International Conference on Functional Mapping of the Human Brain. Vol. InNeuroImage, Copenhagen, p. 5.

477

Cocosco, C.A., Zijdenbos, A.P., Evans, A.C., 2003. A fully automatic and robust brain MRI tissue classification method. Med. Image Anal. 7, 513–527. Collins, D.L., Holmes, C.J., Peters, T.M., Evans, A.C., 1995. Automatic 3-d model-based neuroanatomical segmentation. Hum. Brain Mapp. 3, 190–208. Collins, D.L., Zijdenbos, A.P., Kollokian, V., Sled, J.G., Kabani, N.J., Holmes, C.J., Evans, A.C., 1998. Design and construction of a realistic digital brain phantom. IEEE Trans. Med. Imag. 17, 368–463. Dice, L.R., 1945. Measures of the amount of ecologic association between species. Ecology 26, 297–302. Fischl, B., Salat, D.H., van der Kouwe, A.J., Makris, N., Ségonne, F., Quinn, B.T., Dale, A.M., 2004. Sequence-independent segmentation of magnetic resonance images. NeuroImage 23, S69–S84. González Ballester, M., Zisserman, A.P., Brady, M., 2002. Estimation of partial volume effect in MRI. Med. Image Anal. 6, 389–405. Grabowski, T.J., Frank, R.J., Szumski, N.R., Brown, C.K., Damasio, H., 2000. Validation of partial tissue segmentation of single-channel magnetic resonance images of the brain. NeuroImage 12, 640–656. Gudbjartsson, H., Patz, S., 1995. The Rician distribution of MRI data. Magn. Reson. Med. 34, 910–914. Gur, R., Turetsky, B., Matsui, M., Yan, M., Bilker, W., Hughett, P., Gur, R., 1999. Sex differences in brain gray and white matter in healthy young adults: correlations with cognitive performance. J. Neurosci. 19, 4065–4072. Kovacevic, N., Lobaugh, N.J., Bronskill, M.J., Levine, B., Feinstein, A., Black, S.E., 2002. A robust method for extraction and automatic segmentation of brain images. NeuroImage 17, 1087–1100. Kreyszig, E., 1993. Advanced engineering mathematics, 7th ed. John Wiley and Sons, pp. 286–287. Kwan, R.K.S., Evans, A.C., Pike, G.B., 1996. An extensible MRI simulator for postprocessing evaluation, Visualization in Biomedical Computing Edition. Vol. 1131 of Lecture Notes in Computer Science. Springer-Verlag, pp. 135–140. Laidlaw, D.H., Fleischer, K.W., Barr, A.H., 1998. Partial-volume Bayesian classification of material mixtures in MR volume data using voxel histograms. IEEE Trans. Med. Imag. 17, 74–86. Leemput, K., Maes, F., Vandermeulen, D., Suetens, P., 2003. A unifying framework for partial volume segmentation of brain MR images. IEEE Trans. Med. Imag. 22, 105–119. Li, X., Li, L., Lu, H., Liang, Z., 2005. Partial volume segmentation of brain magnetic resonance images based on maximum a posteriori probability. Med. Phys. 32, 2337–2345. Manjón, J.V., Tohka, J., García-Martí, G., Carbonell-Caballero, J., Lull, J.J., Martí-Bonmatí, L., Robles, M., 2008. Robust MRI brain tissue parameter estimation by multistage outlier rejection. Magn. Reson. Med. 59, 866–873. McGraw, K.O., Wong, S., 1996. Forming inferences about some intraclass correlation coefficients. Psycological Methods 1, 30–46. Nelder, J., Mead, R., 1965. A simplex method for function minimization. Comput. J. 7, 308–315. Niessen, W.J., 1997. Multiscale medical image analysis. Ph.D. thesis, University of Utrecht, the Netherlands. Portugal, L.F., Júdice, J.J., Vicente, L.N., 1994. A comparison of block pivoting and interior-point algorithms for linear least squares problems with nonnegative variables. Math. Comp. 63 (208), 625–643. Röll, A., Colchester, A.C.F., Summers, P.E., Griffin, L.D., 1994. Intensity-based object extraction from 3D medical images including a correction of partial volume errors. Proceedings of the 5th British Machine Vision Conference. BMVC Press, pp. 205–214. Ruan, S., Jaggi, C., Xue, J., Fadili, J., Bloyet, D., 2000. Brain tissue classification of magnetic resonance images using partial volume modeling. IEEE Trans. Med. Imag. 19, 1187–1479. Santago, P., Gage, H.D., 1993. Quantification of MR brain images by mixture density and partial volume modeling. IEEE Trans. Med. Imag. 12, 566–574. Schnack, H.G., Hulshoff Pol, H.E., Baaré, W.F.C., Staal, W.G., Viergever, M.A., Kahn, R.S., 2001. Automated separation of gray and white matter from MR images of the human brain. NeuroImage 13, 230–237. Schnack, H.G., van Haren, N.E., Hulshoff Pol, H.E., Picchioni, M., Weisbrod, M., Sauer, H., Huttunen, M., Murray, R., Kahn, R.S., 2004. Reliability of brain volumes from multicenter MRI acquisition: a calibration study. Hum. Brain Mapp. 22, 312–320. Schroeter, P., Vesin, J.M., Langenberger, T., Meuli, R., 1998. Robust parameter estimation of intensity distributions for brain magnetic resonance images. IEEE Trans. Med. Imag. 17, 172–186. Shattuck, D.W., Sandor-Leahy, S.R., Schaper, K.A., Rottenberg, D.A., Leahy, R.M., 2001. Magnetic resonance image tissue classification using a partial volume model. NeuroImage 13, 856–876. Sled, J.G., Zijdenbos, A.P., Evans, A.C., 1998. A nonparametric method for automatic correction of intensity nonuniformity in MRI data. IEEE Trans. Med. Imag. 17, 87–97. Song, T., Gasparovic, C., Andreasen, N., Bockholt, J., Jamshidi, M., Lee, R.R., Huang, M., 2006. A hybrid tissue segmentation approach for brain MR images. Med. Biol. Eng. Comput. 44, 242–249. Talairach, J., Tournoux, P., 1988. Co-planar Stereotaxic Atlas of the Human Brain: 3Dimensional Proportional System: an Approach to Cerebral Imaging. Thieme, New York. Tohka, J., Krestyannikov, E., Dinov, I., Graham, A., Shattuck, D., Ruosalainen, U., Toga, A.W., 2007. Genetic algorithms for finite mixture model based tissue classification in neuroimaging. IEEE Trans. Med. Imaging 26, 696–711. Tohka, J., Zijdenbos, A., Evans, A., 2004. Fast and robust parameter estimation for statistical partial volume models in brain MRI. NeuroImage 23, 84–97. Zhang, Y., Brady, M., Smith, S., 2001. Segmentation of brain MR images through a hidden Markov random field model and the expectation–maximization algorithm. IEEE Trans. Med. Imag. 20, 45–57.