Intracranial volume normalization methods: Considerations when investigating gender differences in regional brain volume



Psychiatry Research: Neuroimaging 231 (2015) 227–235

Contents lists available at ScienceDirect

Psychiatry Research: Neuroimaging journal homepage: www.elsevier.com/locate/psychresns

Intracranial volume normalization methods: Considerations when investigating gender differences in regional brain volume

Richard Nordenskjöld a,*, Filip Malmberg b, Elna-Marie Larsson a, Andrew Simmons c,d, Håkan Ahlström a, Lars Johansson a,e, Joel Kullberg a

a Department of Radiology, Uppsala University, MRT, Entrance 24, Uppsala University Hospital, SE-751 85 Uppsala, Sweden
b Centre for Image Analysis, Uppsala University, Uppsala, Sweden
c King's College London, Institute of Psychiatry, London, UK
d NIHR Biomedical Research Centre for Mental Health and NIHR Biomedical Research Unit for Dementia, London, UK
e AstraZeneca, Mölndal, Sweden

Article info

Article history: Received 29 April 2014; Received in revised form 12 October 2014; Accepted 13 November 2014; Available online 5 December 2014

Keywords: Intracranial volume; Normalization; Dimorphism; Sex difference; Volume; Human brain

Abstract

Intracranial volume (ICV) normalization of regional brain volumes (v) is common practice in volumetric studies of the aging brain. Multiple normalization methods exist and this study aimed to investigate when each method is appropriate to use in gender dimorphism studies and how differences in v are affected by the choice of method. A new method based on weighted ICV matching is also presented. Theoretical reasoning and simulated experiments were followed by an evaluation using real data comprising 400 subjects, all 75 years old, whose ICV was segmented with a gold standard method. The presented method allows good visualization of volume relation between gender groups. A different gender dimorphism in volume was found depending on the normalization method used for both simulated and real data. Method performance was also seen to depend on the slope (B) and intercept (m) from the linear relation between v and ICV ($v = B \times \mathrm{ICV} + m$) as well as gender distribution in the cohort. A suggested work-flow for selecting ICV normalization method when investigating gender related differences in regional brain volume is presented. © 2015 Published by Elsevier Ireland Ltd.

1. Introduction Volumetric assessment of human regional brain volume is important for understanding diseases associated with brain morphology, and volumetry is increasingly used for diagnostic purposes, for example hippocampal volume in Alzheimer's disease. The assessment of gender specific differences in regional brain volume may further add to the understanding of a disease and be important for setting gender specific normal cut off ranges. In regional brain volumetric studies, intracranial volume (ICV) normalization is an important step (Barnes et al., 2010). It has been used to compensate for gender differences (Scahill, 2003; Whitwell et al., 2001) and inter-subject variations in head size (Free et al., 1995; Whitwell et al., 2001). ICV can also be used as a measure of premorbid brain size (Davis and Wright, 1977). Since ICV is typically larger in males than in females (Gur et al., 1999; Nordenskjöld et al., 2013), and many regional brain volumes are

* Corresponding author. Tel.: +46 186110167. E-mail address: [email protected] (R. Nordenskjöld).

http://dx.doi.org/10.1016/j.pscychresns.2014.11.011
0925-4927/© 2015 Published by Elsevier Ireland Ltd.

associated with ICV, ICV normalization is needed before any intergender comparisons of these regional brain volumes are performed. There are multiple approaches for ICV normalization in regional brain volumetric studies investigating gender differences. The proportion method aims to express a regional brain volume as the proportion of the entire cranial cavity it occupies. The residual method aims to remove the variation in the regional brain volume associated with ICV (Jack et al., 1989). A third method uses ICV as a covariate in a linear regression model. Finally, the need for ICV compensation can be removed by ICV matching where only intergender pairs having similar ICV are used. In previous studies investigating gender associated differences in regional brain volume, different normalization methods have been used. In Sullivan et al. (2001) gender differences in corpus callosum (CC) volume were investigated using the proportion, residual, and covariate methods. Additional analysis was conducted on a subset of males and females where the subjects were matched on age, ICV mean, and ICV range between genders. Raw CC volumes were found to be larger in males. With the exception of one test using the proportion method, males were found to have larger CC than females. Ardekani et al. (2013) found females to have larger CC volume than males both when using the covariate method on the entire cohort, and


when using a subset of males and females matched by ICV and approximately by age. Gray matter (GM) and white matter (WM) differences between genders have been investigated by Gur et al. (1999). Using the covariate method as well as ICV matching, females were found to have larger GM volume than males, while males had larger WM volumes than females. In Greenberg et al. (2008) no difference in hippocampal volume was found between genders when using the proportion or covariate methods. Gender differences in GM and WM volumes (p < 0.05) were consistent with the findings by Gur et al. (1999) when using either of the methods. Murphy et al. (1996) found that age related volume loss in the hippocampus was greater in females than in males when using the proportion method. The proportion, covariate, and a residual method based on one group to normalize the entire cohort using total brain volume were compared by O'Brien et al. (2011). The theoretical background to each method was followed by a volumetric analysis where the results were found to differ between the methods. The authors concluded that visual representation of data should be used to aid selection of the best method. The residual method used in that study utilized a control group to normalize both the control and disease groups (Mathalon et al., 1993). They reasoned that if the covariate method is appropriate, it will give superior results compared to the residual method due to the entire cohort being used. In the case of gender comparisons, however, it is unclear which gender to select as the control. There are, to the authors' knowledge, no prior studies that give an in-depth explanation of the cause of differences between methods, what bias to expect, and when each method is appropriate for use in studies investigating sexual dimorphism using ICV normalized volumes. The aim of this study was threefold.
Firstly, we aimed to present a theoretical background for different ICV normalization procedures and evaluate how the procedures handle gender differences in regional brain volume using simulated data. Secondly, we aimed to apply the normalization methods to real data comprising 400 elderly subjects, all of the same age. Thirdly, we aimed to determine if and when the different methods are appropriate for analyzing gender associated regional brain volume differences.

2. Materials and methods

In this section a comprehensive introduction to ICV normalization is given. First, the basic principles and assumptions of ICV normalization are presented. This is followed by a description and theory of different normalization methods, as well as the presentation of a new method for matching females and males in terms of ICV. After this, experiments using simulated and real data are presented. In the theoretical parts of this section, the data is assumed to be perfectly linear for simplified reasoning and illustration purposes. In real data this is naturally not the case. The section concludes by summarizing the ICV normalization methods presented.

2.1. Normalization pre-requirements

A goal of ICV normalization is to enable comparison between subjects with differently sized cranial cavities, which can be achieved by expressing regional brain volume as a percentage of ICV or by determining regional brain volumes that are not associated with ICV (i.e. all variation in a regional brain volume associated with ICV has been removed). Most methods for ICV normalization rely on an assumption that the association between regional brain volume (v) and ICV is linear. A linear equation

\[ v = B \times \mathrm{ICV} + m, \tag{1} \]

where B is the slope and m is the intercept, can be used to express this association. In real data, this linear association will almost certainly not be perfect. Therefore, an error term equal to the residual of the linear fit should be added to all equations below when considering real data.

2.2. Proportion method

The aim of this method is to express a regional brain volume as the proportion of the cranial cavity it occupies. The normalized volume ($\tilde{v}$) is calculated as

\[ \tilde{v} = v / \mathrm{ICV} \tag{2} \]

If a linear relation between v and ICV is assumed, substituting v in Eq. (2) with Eq. (1) gives

\[ \tilde{v} = \frac{B \times \mathrm{ICV} + m}{\mathrm{ICV}} = B + m / \mathrm{ICV} \tag{3} \]

This shows that the normalized volume is still dependent on ICV. It also shows that the error will be introduced differently depending on the sign of m. When using this method one does therefore typically not correct or compensate for ICV, but rather extract a measure that can be used to investigate if regional volume ratios differ between compared groups. Note that this method normalizes each subject's regional volume without considering other subjects in the study cohort.

2.3. Residual method

This method estimates the linear association between v and ICV, and transforms v so that this association is removed according to Eq. (4) (Jack et al., 1989):

\[ \tilde{v} = v - B(\mathrm{ICV} - \overline{\mathrm{ICV}}) \tag{4} \]

where B is the slope of the linear association between v and ICV, and $\overline{\mathrm{ICV}}$ is the mean of all ICV measures included in the calculation of B. Multiple subjects are needed to approximate the normalization parameters, and as the number of subjects increases the linear fit can be more reliably determined. Substituting v in Eq. (4) with Eq. (1) gives

\[ \tilde{v} = (B \times \mathrm{ICV} + m) - B(\mathrm{ICV} - \overline{\mathrm{ICV}}) = B \times \overline{\mathrm{ICV}} + m = \bar{v} \tag{5} \]

which is constant for all normalized volumes, and shows that the variation in v associated with ICV has been completely compensated for in $\tilde{v}$. Unfortunately this is true only for the same cohort used when calculating the normalization variables B and $\overline{\mathrm{ICV}}$. If these variables are calculated using the entire cohort it is likely that ICV associated variations still exist in $\tilde{v}_{male}$ and $\tilde{v}_{female}$ when considered individually (Fig. 1b). ICV normalization with the residual method can also be performed for each gender group separately. This could however create a misleading comparison. Investigating the derived Eq. (5) shows that the normalized volumes are constant and dependent on $\bar{v}$. If a positive correlation between ICV and v is assumed, it is not possible for females to have larger normalized volumes than males since males have larger $\overline{\mathrm{ICV}}$ and $\bar{v}$ (Fig. 1c).

Fig. 1. Theoretical example of residual method normalization. Cyan and magenta dots represent different groups, each having their linear fit between regional brain volume (v) and intracranial volume (ICV) shown as a line having the same color. Cohort linear fit is represented by a black line. (a) Raw values. (b) Normalized using linear fit based on the entire cohort. (c) Normalized using gender specific linear fit. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

2.4. Covariate method

A third method to compensate for ICV is to include an ICV estimation as a covariate in a linear regression model. The basic regression model has the form

\[ v = \beta_0 + \beta_1 \mathrm{Gender} + \beta_2 \mathrm{ICV} + \epsilon \tag{6} \]

where Gender (females: 1, males: 0) and ICV are independent variables assumed to explain the dependent variable v, and ϵ is a residual error variable. The three β are found and represent the degree of variation in v associated with a variable in the model. For example, β2 indicates the degree of change in v as ICV changes. β0 is the offset (comparable to m in Eq. (1)). This is very similar to the calculation of B in Eq. (4), where the model only includes regional brain volume as dependent variable and ICV as independent variable. The difference is that instead of calculating the best fit line between v and ICV followed by a gender difference test, gender is already considered at the linear fit stage and β1 reveals the association between v and gender. All parameters included in the regression model compete in explaining the variance in v. If however the parameters included in the model are highly correlated, the results may be misleading since they explain the same variation in the dependent variable, and there will be multiple solutions to Eq. (6) that are equally good. From Eq. (6) it is apparent that B from the linear relation between v and ICV needs to be the same for both genders, since a single β2 is shared by the two groups. There are equations other than Eq. (6) that can be used to calculate the regression coefficients β, for instance the one used in Ardekani et al. (2013). The basic principle is however the same as described above.

2.5. ICV matching

The basic principle of ICV matching methods is to analyze gender differences in v while only including subjects so that ICV is similar between gender groups. This can be performed in a number of ways. Pairs of one male and one female with similar ICV can be created as described by Ardekani et al. (2013). The disadvantage of this method is the high sensitivity to the selection of single data points (as illustrated in Fig. 2a). A single measurement contains both systematic and random errors to some degree. Systematic errors are of little concern in these pairings as long as both groups considered for pairing are affected in the same way. The random error however makes individual measurements unreliable. Genders can also be matched as a group where the gender groups have similar ICV and ICV span (Sullivan et al., 2001). A third way is to calculate the mean v ($\bar{v}$) for each gender separately in a small ICV interval. If both gender groups have at least one subject in the interval, then a pair is created using each gender's $\bar{v}$. This method will henceforth be referred to as the Match method. Note that these methods are characterized by a suboptimal use of data. Subjects that are not included in the matching are not considered in the gender difference calculations. There is also a possible concern with comparing data in a limited ICV range. In the ICV range containing measurements for both females and males, only females having the largest ICV and males having the smallest ICV will be included in the matching.
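The Match method described above can be sketched in a few lines. This is a minimal illustration only: the function name `match_pairs`, the use of the interval mean as the per-gender value, and the binning by `floor(ICV / width)` are assumptions made for the example, and the data are the noise-free linear volumes of simulated Test 3 (Table 2).

```python
import numpy as np

def match_pairs(icv_f, v_f, icv_m, v_m, width=1.0):
    """Pair interval-wise mean volumes of females and males (hypothetical helper).

    Returns two arrays (female means, male means) with one entry per ICV
    interval that contains at least one subject of each gender.
    """
    icv_f, v_f = np.asarray(icv_f, float), np.asarray(v_f, float)
    icv_m, v_m = np.asarray(icv_m, float), np.asarray(v_m, float)
    # Assign every subject to an ICV interval of the given width (in ml).
    bin_f = np.floor(icv_f / width).astype(int)
    bin_m = np.floor(icv_m / width).astype(int)
    # Only intervals populated by both genders produce a pair.
    shared = np.intersect1d(bin_f, bin_m)
    pairs_f = np.array([v_f[bin_f == b].mean() for b in shared])
    pairs_m = np.array([v_m[bin_m == b].mean() for b in shared])
    return pairs_f, pairs_m

# Simulated Test 3: same slope for both genders, larger intercept in females.
icv_m = np.arange(1400, 2001)
icv_f = np.arange(1000, 1601)
v_m = 0.10 * icv_m + 1
v_f = 0.10 * icv_f + 7

pairs_f, pairs_m = match_pairs(icv_f, v_f, icv_m, v_m, width=1.0)
diff = pairs_f - pairs_m  # female minus male, one value per matched interval
print(len(diff), diff.mean())
```

Only the ICV overlap (1400–1600 ml) contributes pairs, which makes the suboptimal use of data noted above directly visible: subjects outside the overlap never enter `diff`.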

2.6. Gaussian weighted ICV pair matching

To address the issue of selecting individual measurements as basis for statistical analyses as well as to avoid excluding a large part of the cohort, the idea of Gaussian weighted pairing is introduced in this paper. Even gold standard reference segmentations contain random errors, i.e. intra-rater evaluations do not have correlation coefficient r = 1.0 (Pengas et al., 2009; Nordenskjöld et al., 2013). Random errors commonly follow a Gaussian distribution. To become less sensitive to both natural variations in v and measurement errors in ICV and v (Fig. 2b), the idea is to create a weighted average for a certain ICV by incorporating surrounding measurements in the pairing. For each subject (s), independent of gender, a new weighted v ($\hat{v}_1$) is calculated. For the other gender, a matched $\hat{v}_2$ is calculated for the ICV of s. $\hat{v}_1$ is calculated by considering all other subjects having the same gender as s. Further details concerning this method are given in Appendix A. In contrast to some ICV pair matching methods, this method includes all subjects in the inter-gender ICV overlap with the value of each original v substituted with a Gaussian weighted average, and an interpolated match for the other gender.
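The idea of Gaussian weighted pairing can be illustrated with a kernel-weighted mean. The exact weighting is specified in Appendix A, which is not reproduced here, so the sketch below rests on stated assumptions: weights of the form exp(-(ΔICV)² / (2σ²)), the subject itself included in its own gender's average, and pairs formed only inside the inter-gender ICV overlap. The function names are hypothetical.

```python
import numpy as np

def gaussian_weighted_v(icv_query, icv_ref, v_ref, sigma=25.0):
    """Gaussian-weighted mean of v_ref, evaluated at each ICV in icv_query."""
    d = np.asarray(icv_query, float)[:, None] - np.asarray(icv_ref, float)[None, :]
    w = np.exp(-0.5 * (d / sigma) ** 2)  # assumed kernel; see Appendix A
    return (w * np.asarray(v_ref, float)).sum(axis=1) / w.sum(axis=1)

def gaussian_pairs(icv_f, v_f, icv_m, v_m, sigma=25.0):
    """Weighted female and male volumes at the ICV of every subject in the overlap."""
    lo = max(icv_f.min(), icv_m.min())
    hi = min(icv_f.max(), icv_m.max())
    icv_all = np.concatenate([icv_f, icv_m])
    query = icv_all[(icv_all >= lo) & (icv_all <= hi)]
    v_hat_f = gaussian_weighted_v(query, icv_f, v_f, sigma)
    v_hat_m = gaussian_weighted_v(query, icv_m, v_m, sigma)
    return v_hat_f, v_hat_m

# Simulated Test 3 (Table 2), sigma = 25 ml as in the experiments.
icv_m = np.arange(1400, 2001, dtype=float)
icv_f = np.arange(1000, 1601, dtype=float)
v_m = 0.10 * icv_m + 1
v_f = 0.10 * icv_f + 7

v_hat_f, v_hat_m = gaussian_pairs(icv_f, v_f, icv_m, v_m, sigma=25.0)
diff = v_hat_f - v_hat_m  # female minus male for each paired subject
print(len(diff), diff.min(), diff.max())
```

With this simple kernel the weighted averages are pulled inward near the ends of each gender's ICV range, so the paired differences shrink somewhat towards the edges of the overlap. Whether and how Appendix A handles this edge effect is not known from this section.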

2.7. Summary Table 1 provides a summary of each of the methods described in this section.
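As one concrete illustration of the methods summarized here, the covariate model of Eq. (6) can be fitted by ordinary least squares. The sketch below is not the analysis code of the study: it assumes the simulated Test 3 volumes of Table 2, adds normally distributed noise with standard deviation 0.1 as described for the simulated covariate experiments, and uses an arbitrary random seed.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen for reproducibility

# Simulated Test 3: v_male = 0.10*ICV + 1, v_female = 0.10*ICV + 7.
icv_m = np.arange(1400, 2001, dtype=float)
icv_f = np.arange(1000, 1601, dtype=float)
v_m = 0.10 * icv_m + 1
v_f = 0.10 * icv_f + 7

icv = np.concatenate([icv_f, icv_m])
gender = np.concatenate([np.ones_like(icv_f), np.zeros_like(icv_m)])  # female: 1, male: 0
v = np.concatenate([v_f, v_m]) + rng.normal(0.0, 0.1, icv.size)

# Design matrix for v = beta0 + beta1*Gender + beta2*ICV + error.
X = np.column_stack([np.ones_like(icv), gender, icv])
beta, *_ = np.linalg.lstsq(X, v, rcond=None)
beta0, beta1, beta2 = beta
print(beta0, beta1, beta2)
```

A positive `beta1` indicates a larger regional volume in females at equal ICV, and `beta2` is the single slope shared by both genders.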

2.8. Experiments using simulated data

To examine the results of ICV normalization under different scenarios, experiments with simulated data were carried out. Each experiment was performed by normalizing a regional brain volume having different associations with ICV depending on gender. The regional brain volume was normalized using the proportion method, gender separated residual method, cohort based residual method, covariate method, match method, and Gaussian weighted pairing. An interval of 1 ml was used for the match method, and when calculating the Gaussian weighted pairing σ = 25 ml (Appendix A) was used. The simulated data used in the performed tests are given in Table 2. Measurements were simulated at each integer value of ICV. Additional experiments were performed to demonstrate pitfalls with the proportion and cohort based residual methods. To investigate the proportion method's sensitivity to different m as indicated by Eq. (3), a test was created with the following parameters:

   

- $B_{male}$, $B_{female}$ = 0.8.
- $m_{male}$, $m_{female}$ = [−1, 0, 1].
- $\mathrm{ICV}_{male}$ = 1400–2000.
- $\mathrm{ICV}_{female}$ = 1000–1600.
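The effect of the intercept on the proportion method follows directly from Eq. (3) and can be checked numerically. The sketch below uses the parameters listed above with one noise-free measurement at every integer ICV; the variable names are illustrative only.

```python
import numpy as np

B = 0.8  # slope as listed for this test
icv_m = np.arange(1400, 2001, dtype=float)
icv_f = np.arange(1000, 1601, dtype=float)

def proportion(v, icv):
    """Proportion method, Eq. (2): normalized volume = v / ICV."""
    return v / icv

mean_diff = {}
for m in (-1.0, 0.0, 1.0):
    ratio_m = proportion(B * icv_m + m, icv_m)
    ratio_f = proportion(B * icv_f + m, icv_f)
    # Female minus male mean normalized volume for this intercept.
    mean_diff[m] = ratio_f.mean() - ratio_m.mean()

for m, d in mean_diff.items():
    print(f"m = {m:+.0f}: female - male = {d:+.2e}")
```

Because the normalized volume equals B + m/ICV, the group with the smaller ICV receives the larger share of m/ICV in absolute terms, so the sign of m decides which gender appears larger even though both follow the same line.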

Since the residual method is based on a linear fit using the entire cohort, a test with varying measurement density was created using the parameters

     

- $B_{male}$ = 0.08.
- $B_{female}$ = 0.12.
- $m_{male}$ = 70.
- $m_{female}$ = 10.
- $\mathrm{ICV}_{male}$ = 1400–2000.
- $\mathrm{ICV}_{female}$ = 1000–1600.
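The dependence of the cohort based residual method on measurement density can be reproduced with the parameters above. In this sketch the three female densities (one measurement every 10th ICV, one every ICV, ten every ICV) follow the description of the simulated experiments, males have one measurement at every integer ICV, and the helper name `residual_normalize` is hypothetical.

```python
import numpy as np

def residual_normalize(v, icv, icv_fit, v_fit):
    """Eq. (4): v - B*(ICV - mean ICV), with B and mean ICV taken from the fit set."""
    x = icv_fit - icv_fit.mean()
    B = np.dot(x, v_fit - v_fit.mean()) / np.dot(x, x)  # least-squares slope
    return v - B * (icv - icv_fit.mean())

icv_m = np.arange(1400, 2001, dtype=float)
v_m = 0.08 * icv_m + 70

female_grids = {
    "one every 10th ICV": np.arange(1000, 1601, 10, dtype=float),
    "one every ICV": np.arange(1000, 1601, dtype=float),
    "10 every ICV": np.repeat(np.arange(1000, 1601, dtype=float), 10),
}

mean_diff = {}
for label, icv_f in female_grids.items():
    v_f = 0.12 * icv_f + 10
    # Normalization parameters are estimated from the entire cohort.
    icv_all = np.concatenate([icv_f, icv_m])
    v_all = np.concatenate([v_f, v_m])
    norm_f = residual_normalize(v_f, icv_f, icv_all, v_all)
    norm_m = residual_normalize(v_m, icv_m, icv_all, v_all)
    mean_diff[label] = norm_f.mean() - norm_m.mean()  # female minus male

for label, d in mean_diff.items():
    print(f"{label}: female - male = {d:+.3f}")
```

The cohort slope is pulled towards the slope of the more densely sampled gender, so the same underlying volumes yield a male advantage, no difference, or a female advantage depending only on the gender ratio.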

Fig. 2. Theoretical intracranial volume matched pairing example. (a) Example of the possible random difference when selecting individual points as basis for statistical analysis. Both pairs are valid selections but produce very different results. (b) Motivation using a weighted average to form pairs. Measurement errors exist in ICV, and v contains variations caused by both random measurement errors and natural variation in v.


Table 1. Intracranial volume normalized values ($\tilde{v}$) in relation to raw regional brain volumes (v), subject requirements, and whether the relation between v and ICV needs to be linear.

| Method | $\tilde{v}$ (a) | Subjects | Linear |
|---|---|---|---|
| Proportion | $B + m/\mathrm{ICV}$ | Individual | No |
| Residual (male) | $\bar{v}_{male}$ | Cohort | Yes |
| Residual (female) | $\bar{v}_{female}$ | Cohort | Yes |
| Residual (cohort) | $\bar{v}_{cohort}$ | Cohort | Yes |
| Covariate | Depends (b) | Cohort | Yes |
| Matching | $\bar{v} \in \mathrm{ICV}_{interval}$ | Pairs | No |
| Gaussian | $v_{weighted}$ | Cohort | Yes |

(a) Not considering the residual variation from the linear fit.
(b) Depends on all variables included in the regression model.

Table 3. Results of simulated experiments for different normalization methods showing which gender has the larger regional brain volume in the simulated tests.

| Method | Test 1 | Test 2 | Test 3 |
|---|---|---|---|
| Raw | Male | Male | Male |
| Expected (a) | No diff | Female | Female |
| Proportion | Female | Female | Female |
| Residual (b) | Male | Male | Male |
| Residual (c) | No diff | Female | Female |
| Covariate (d) | No diff | Female | Female |
| Match | No diff | Female | Female |
| Gaussian | No diff | Female | Female |

(a) If the effect of ICV on regional brain volume is removed.
(b) B and $\overline{\mathrm{ICV}}$ determined for each gender separately.
(c) B and $\overline{\mathrm{ICV}}$ determined using entire cohort.
(d) Normally distributed noise (mean: 0, standard deviation: 0.1) added to make regression residual normally distributed.

Table 2. Simulated experiment setup where regional brain volume (v) is calculated for every integer ICV according to the linear equation $v = B \times \mathrm{ICV} + m$, with the parameters given.

| Test | $v_{male}$ | $v_{female}$ | $\mathrm{ICV}_{male}$ | $\mathrm{ICV}_{female}$ |
|---|---|---|---|---|
| 1 | 0.10 × ICV + 1 | 0.10 × ICV + 1 | 1400–2000 | 1000–1600 |
| 2 | 0.10 × ICV + 1 | 0.11 × ICV + 1 | 1400–2000 | 1000–1600 |
| 3 | 0.10 × ICV + 1 | 0.10 × ICV + 7 | 1400–2000 | 1000–1600 |

The ICV span for females was constant, but measurement density was altered over the span (one every 10th ICV, one every ICV, 10 every ICV). Measurement density for males was consistently one every ICV. Differences in regional brain volume between genders were determined with Wilcoxon rank sum tests for raw values, the proportion and residual methods; linear regression (dependent: v, independent: Gender [male: 0, female: 1], covariate: ICV) for the covariate method; and Wilcoxon signed rank test for the match and Gaussian weighted pairing methods. When evaluating the covariate method, normally distributed noise (mean: 0, standard deviation: 0.1) was added to v in order to make the regression residual normally distributed. The noise level was selected so as to be too small to have a significant effect on the results.

2.9. Experiments using real MRI data

In this study, a subsample of 409 subjects from the PIVUS cohort (Lind et al., 2005) was investigated. The subjects included in the study were recruited based on their residency and represent an elderly epidemiological population all of age 75. All images were acquired with a 1.5 T clinical MRI scanner (Philips Healthcare, Best, The Netherlands). Axial PD-weighted images from a dual echo sequence (echo time: 20.7, repetition time: 3000 ms, flip angle: 90°, resolution: 0.94 × 0.94 × 3.0 mm and matrix: 256 × 256 × 50) were used for ICV measurements, and sagittal T1-weighted 3D gradient echo images (echo time: 4.0 ms, repetition time: 8.6 ms, flip angle: 8°, resolution: 0.94 × 0.94 × 1.2 mm and matrix: 256 × 256 × 170) were used to measure regional brain volumes. The images were collected at the same session and both were oriented along the hypophysis-fastigium (HYFA) plane. This study was approved by the local ethics committee.

To measure ICV, an interactive segmentation software called SmartPaint (Litjens et al., 2013; Malmberg et al., 2012) was used. In brief, using a circular brush tool, the user can create and refine a segmentation by sweeping with the mouse cursor in the object or background. Each voxel within the brush radius is labeled according to its distance from the brush center and the difference in intensity between the voxel and the brush center. The segmentation is updated at each mouse movement with immediate feedback. The segmentation was performed on PD-weighted images as these show good contrast between cortical bone and CSF. The segmentation protocol as well as the validation of the segmentations has been described previously (Nordenskjöld et al., 2013).

To measure regional brain volumes, the fully automated Freesurfer pipeline (Dale et al., 1999; Fischl et al., 1999) version 5.1.0 was used. Volumes were extracted for left and right hippocampi, CC, GM, and WM. All default settings were used in the pipeline processing and all segmented volumes visually determined to contain a significant amount of error were excluded from further analysis. The regions were selected since they have displayed sexual dimorphism previously (Ardekani et al., 2013; Gur et al., 1999; Murphy et al., 1996; Sullivan et al., 2001). After excluding subjects missing either T1- or PD-weighted images as well as subjects with images that failed the Freesurfer pipeline, 400 subjects (209 males) were considered for further processing.

All regional brain volumes were normalized using the same methods as in the experimental evaluation. Gender volumetric differences were examined using unpaired t-tests for raw volumes and volumes normalized with the proportion and residual methods; linear regression (dependent: v, independent: Gender [male: 0, female: 1], covariate: ICV) for the covariate method; and paired t-tests for the match and Gaussian weighted pairing methods. As in the experiments using simulated data, an interval of 1 ml was used for the match method and the calculation of the Gaussian weighted matching used σ = 25 ml (Appendix A).

3. Results

3.1. Experiments using simulated data

The results of the three tests performed using the simulated data are presented in Table 3. If the effect of ICV on regional brain volume is completely removed, results matching the "Expected" row should be obtained. There were however only 4 methods that obtained these expected results. Fig. 3 illustrates the proportion method's sensitivity to different m. From this figure it is clear that the results from the proportion method are affected by m. Fig. 4 shows results from the experiment where the residual method's sensitivity to different ratios between subjects in each gender group was tested. It can be seen that the distribution is likely to affect the outcome.

3.2. Experiments using real data

The regions used in the experiments as well as their linear association with ICV are presented in Table 4. As can be seen, none have m = 0, meaning that the proportion method is likely to produce gender biased results. Results of gender related differences in regional brain volume when normalization is performed using different methods are presented in Table 5. Even in real data, the methods produce different results. Results from the Gaussian matching method can be seen in Fig. 5. This visualization can be of assistance in volume comparison studies. In this case females have larger GM volume than males in the ICV range containing measurements for both genders. This is the same data as in Table 5, but illustrates the additional visualization possibilities with this method.

4. Discussion Different procedures for normalizing regional brain volumes using ICV have been evaluated with regard to gender related volumetric differences. Both simulated and real data were used in the evaluation, which revealed clear differences between results obtained using the different methods for normalization.


Fig. 3. Normalization using the proportion method. (a) Raw values of three associations with slight variation in m. (b) Normalized values displaying differences in the results of a gender comparison. m < 0 results in males having larger normalized volumes, m > 0 results in females having larger normalized volumes, and m = 0 results in no gender differences in normalized volume.

Fig. 4. Residual method with different gender ratios. (a) Raw data. (b) More males than females making the normalized mean volume larger in males than in females. (c) Equal gender ratio making the normalized mean volume the same between genders. (d) More females than males making the normalized mean volume larger in females than in males.

Table 4. Linear association between a regional brain volume and ICV ($v = B \times \mathrm{ICV} + m$) divided by gender.

| Region | $B_{male}$ | $m_{male}$ | $B_{female}$ | $m_{female}$ |
|---|---|---|---|---|
| LH | 1.355 × 10^-3 | 1.396 | 1.450 × 10^-3 | 1.397 |
| RH | 1.367 × 10^-3 | 1.414 | 1.466 × 10^-3 | 1.427 |
| CC | 1.330 × 10^-3 | 0.720 | 1.006 × 10^-3 | 1.266 |
| GM | 0.297 | 127.2 | 0.321 | 103.6 |
| WM | 0.366 | −72.78 | 0.340 | −28.68 |

[L/R]H, [Left/Right] hippocampus; CC, Corpus callosum; GM, Gray matter; WM, White matter. All linear associations were significant having p < 0.001.

The simulated experiments revealed differences in outcome when using the different methods. Perhaps most alarming is that many of them found a false difference in regional brain volume between the genders in Test 1 (Table 3), when none existed. Worth mentioning however is that there is a proportional difference in Test 1, which was found by the proportion method. The first issue to consider when selecting an ICV normalization procedure is what one wishes to examine. If the $v/\mathrm{ICV}$ ratio is of interest, then the proportion method is a good choice. This method is however not suitable when variations in regional brain volume associated with ICV need to be removed, except for one special case where the linear association has zero offset (Fig. 3). In this case the resulting normalized volume (Eq. (3)) can be simplified to B which is constant and uncorrelated with ICV. As supported by the theoretical derivation of the normalized values for the proportion method, smaller ICV is undercompensated resulting in females having a falsely larger volume in Test 1. This method is however appropriate if the percentage of cranial cavity occupation is to be compared. In this case, females truly have a larger percentage of the cranium occupied by the tested region.

When using the residual method, the normalized volume will be equal to $\bar{v}$ of the group used for normalization. When each gender is normalized separately, gender differences will likely be found in the normalized volume if ICV differs between groups. As males commonly have larger ICV than females, it comes as no surprise that males had larger normalized volumes in all simulated tests when using this method. Using the cohort based residual method, the normalized volume is completely uncorrelated with ICV when considering the same subjects used to calculate the linear slope B between regional brain volume and ICV. If however B differs between genders, using the cohort based residual method leaves a correlation between v and ICV when considering each gender separately (Fig. 4c). Misleading results were obtained for different gender distributions. The tests performed with the residual method with different ratios of measurements between genders give a clear indication of the dangers with this method (Fig. 4). As the distribution varies, all three possible outcomes can be obtained. After normalization, the gender that contains more subjects will have a B that is more corrected towards zero than the other group, affecting the results for all cases where the genders have different B. There is yet another way to apply the residual method. This is to use one of the groups to normalize both groups, as performed in for


Table 5. Results for different brain regions using different ICV normalization procedures given as mean ± standard deviation. Results from the covariate method are given as the regression coefficient β, where a positive value indicates a larger regional brain volume in females.

| Region | Raw (ml) | Proportion (%) | Residual (a) (ml) | Residual (b) (ml) | Covariate (c) (β) | Match (ml) | Gaussian (ml) |
|---|---|---|---|---|---|---|---|
| LH, F | 3.37 ± 0.42 | 0.25 ± 0.03 | 3.37 ± 0.39 | 3.48 ± 0.39 | 0.164 | 3.62 ± 0.30 | 3.50 ± 0.15 |
| LH, M | 3.49 ± 0.41 | 0.23 ± 0.03 | 3.49 ± 0.38 | 3.40 ± 0.39 | | 3.33 ± 0.41 | 3.38 ± 0.18 |
| LH, p | < 0.05 | < 0.0001 | < 0.05 | < 0.05 | < 0.05 | < 0.05 | < 0.0001 |
| RH, F | 3.43 ± 0.39 | 0.25 ± 0.03 | 3.43 ± 0.36 | 3.53 ± 0.36 | 0.192 | 3.60 ± 0.34 | 3.55 ± 0.16 |
| RH, M | 3.53 ± 0.41 | 0.23 ± 0.03 | 3.53 ± 0.38 | 3.44 ± 0.38 | | 3.37 ± 0.36 | 3.41 ± 0.17 |
| RH, p | < 0.05 | < 0.0001 | < 0.05 | < 0.05 | < 0.05 | < 0.05 | < 0.0001 |
| CC, F | 2.64 ± 0.36 | 0.19 ± 0.03 | 2.64 ± 0.35 | 2.73 ± 0.35 | 0.097 | 2.74 ± 0.33 | 2.72 ± 0.11 |
| CC, M | 2.78 ± 0.41 | 0.18 ± 0.02 | 2.78 ± 0.38 | 2.69 ± 0.38 | | 2.59 ± 0.35 | 2.67 ± 0.17 |
| CC, p | < 0.001 | < 0.0001 | < 0.001 | NS | NS | NS | < 0.0001 |
| GM, F | 541 ± 43.5 | 39.7 ± 1.96 | 541 ± 25.9 | 568 ± 26.2 | 0.109 | 569 ± 34.8 | 572 ± 35.6 |
| GM, M | 587 ± 44.5 | 37.9 ± 1.96 | 587 ± 28.8 | 562 ± 28.9 | | 550 ± 35.2 | 561 ± 36.2 |
| GM, p | < 0.0001 | < 0.0001 | < 0.0001 | < 0.05 | < 0.05 | < 0.05 | < 0.0001 |
| WM, F | 435 ± 47.1 | 31.9 ± 2.14 | 435 ± 29.0 | 468 ± 29.0 | 0.059 | 461 ± 37.7 | 466 ± 34.4 |
| WM, M | 494 ± 54.4 | 31.8 ± 2.24 | 494 ± 34.7 | 464 ± 34.9 | | 454 ± 42.0 | 462 ± 42.3 |
| WM, p | < 0.0001 | NS | < 0.0001 | NS | NS | NS | < 0.0001 |

M, Male; F, Female; [L/R]H, [Left/Right] hippocampus; CC, Corpus callosum; GM, Gray matter; WM, White matter; NS, Non-significant (p > 0.05). Values in bold indicate the largest mean value where a significant gender difference exists.

(a) B and ICVmean determined for each gender separately.
(b) B and ICVmean determined using entire cohort.
(c) Linear regression model: regional brain volume is the dependent variable, gender (male = 0, female = 1) is an independent variable, and ICV is a covariate.

Fig. 5. Gaussian weighted pairing result. (a) Gray matter (GM) volume plotted against ICV for all subjects. (b) Results of gender pair creation where females had larger GM volume than males in the gender overlapping ICV interval. Axes in both panels: GM (ml) against ICV (ml); legend: Male, Female.

example Jack et al. (1989). There, a group of healthy subjects was used to normalize both healthy and diseased groups. Using one group to normalize the cohort is, however, not appropriate when gender is the grouping variable (as in Fig. 4, but even more extreme). In this case, the gender used to calculate the normalization values will have v completely uncorrelated with ICV, while v in the other gender remains correlated with ICV. The sign of this remaining correlation then determines the outcome when examining gender differences.

The covariate method is similar to the residual method. Instead of calculating the best fit line between v and ICV and then performing a statistical test to investigate gender differences, the covariate method finds the solution of one linear equation containing all variables to be tested. The regression model used included gender and ICV, which were tested for association with v. Solving the linear equation reveals how much of the variance in v each of these variables explains. If the variables included in the model explain the same variance, multiple solutions to the linear equation can be found, and the results become unreliable. As with the residual method, the ratio between the numbers of subjects in each gender group is important, as a linear fit to the data is calculated.

ICV matching allows comparisons between genders while avoiding the need for ICV normalization. A major downside remains, however: females with larger than average ICVs are compared with males having smaller than average ICVs, which may not be the desired outcome. Another downside is that the method uses only a subsample of the cohort in the analysis. There is also no consensus as to how the pairing should be performed. As multiple approaches are used, results are hard to compare between studies.
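The residual and covariate methods discussed above can be illustrated with a minimal Python sketch. It assumes the usual form of the residual adjustment, v_adj = v − B·(ICV − ICVmean), and an ordinary least squares fit for the covariate model with gender coded male = 0, female = 1. The function names and the noise-free toy data are hypothetical and are not taken from the study.

```python
import numpy as np


def residual_normalize(v, icv, fit_mask=None):
    """Residual method: v_adj = v - B * (ICV - mean ICV).

    B and the mean ICV are estimated from the subjects selected by
    fit_mask (all subjects when fit_mask is None) and applied to everyone.
    """
    v = np.asarray(v, dtype=float)
    icv = np.asarray(icv, dtype=float)
    if fit_mask is None:
        fit_mask = np.ones(v.shape, dtype=bool)
    slope, _intercept = np.polyfit(icv[fit_mask], v[fit_mask], 1)
    return v - slope * (icv - icv[fit_mask].mean())


def covariate_gender_beta(v, icv, female):
    """Covariate method: least squares fit of v = b0 + beta*gender + b_icv*ICV.

    Returns beta, the gender coefficient (male = 0, female = 1).
    """
    v = np.asarray(v, dtype=float)
    icv = np.asarray(icv, dtype=float)
    female = np.asarray(female, dtype=float)
    design = np.column_stack([np.ones_like(icv), female, icv])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return coef[1]


# Hypothetical noise-free toy data; all numbers are made up for illustration.
icv = np.array([1300.0, 1400.0, 1500.0, 1600.0, 1400.0, 1500.0, 1600.0, 1700.0])
female = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
v = np.where(female, 0.0025 * icv + 0.1, 0.0020 * icv + 0.5)

adj_cohort = residual_normalize(v, icv)                      # whole-cohort fit
adj_male_fit = residual_normalize(v, icv, fit_mask=~female)  # one-group fit
beta = covariate_gender_beta(v, icv, female)
```

With the one-group fit, the adjusted male volumes no longer vary with ICV, while the adjusted female volumes keep a residual slope. This is the situation the text warns about when one gender is used to normalize the cohort.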

A benefit of Gaussian weighted pair matching is that it includes more subjects in the analysis, since all measurements within the ICV gender overlap are included. In our study using real data, this resulted in measurements from 355 of the 400 subjects being used. Another benefit is that the relation between genders can be inspected visually, for example using Fig. 5. This can be useful to see how the regional volume difference between genders varies for different ICVs.

The cohort based residual, covariate, and both pair based methods all gave the same results in the simulated experiments (Table 3). As the gender groups had the same number of subjects and the data contained no noise (or only a small amount for the covariate method), this was to be expected. Systematic errors are characterized by consistent over- or underestimation of measurements, and random errors by an increased variation that does not affect the mean value. Therefore, when relating the simulated experiments to real data, one can expect systematic errors to affect m and/or B, and random errors to decrease the linear correlation between regional brain volume and ICV. Sanfilipo et al. (2004) evaluated how the proportion and residual methods handle these types of errors in v and ICV, respectively.

When considering real data, the results in Table 5 show a pattern similar to the results when using simulated data (Table 3). There is also a similar pattern for all regions. By examining the slopes and intercepts in Table 4, it is apparent that the proportion method will produce bias since m ≠ 0: dividing v = B·ICV + m by ICV gives v/ICV = B + m/ICV, so the ratio still depends on ICV. For all regions except WM, females will have volumes that are more undercompensated for ICV than males, resulting in larger normalized volumes. This is


supported by the results in Table 5. For WM, m was less than 0, which had the opposite effect on the results. As expected, the gender based residual method resulted in males having larger normalized volumes for all tested regions. This is because the mean volumes of all regions were larger in males than in females.

Examining the linear parameters in Table 4, females have larger B and m than males for both hippocampal volumes. This implies that females cannot have smaller volumes than males once the effect of ICV has been compensated for. The results using real data support this, except when using the gender based residual method, because the raw mean values are larger in males.

CC, GM and WM all have one of the linear parameters B and m larger in females and the other larger in males (Table 4). This makes an equal number of subjects in each gender group important when considering the cohort based residual method (Fig. 4). In the cohort used, the group sizes are close to equal, making this bias negligible. It is, however, possible that both matching methods will give misleading results. This scenario of linear parameters guarantees that the ICV–v lines of the gender groups will intersect at some ICV. This implies that females have larger regional brain volume for one ICV span, and males for another. Since the match based methods only consider a limited span of the cohort, the results depend on where the lines intersect in relation to this span. In the case of CC and WM, females have larger m, implying that these regional brain volumes are larger than in males for low ICV values. For GM, females have larger regional brain volume than males for high ICV values. This might not always be relevant, as the point of intersection may lie well outside any possible ICV span. It does, however, need to be verified in order to assure the relevance of any study results.
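One way to carry out such a check is to compute where the two gender-specific lines cross. Setting B_female·ICV + m_female = B_male·ICV + m_male gives ICV = (m_male − m_female) / (B_female − B_male). The short Python sketch below applies this; the function names and the numerical values are hypothetical and are not the Table 4 estimates.

```python
def line_intersection_icv(b_f, m_f, b_m, m_m):
    """ICV at which the lines v = B*ICV + m of the two genders cross.

    Returns None when the slopes are equal (parallel lines, no crossing).
    """
    if b_f == b_m:
        return None
    return (m_m - m_f) / (b_f - b_m)


def crossing_within_span(b_f, m_f, b_m, m_m, icv_low, icv_high):
    """True when the crossing lies inside the ICV span [icv_low, icv_high]."""
    x = line_intersection_icv(b_f, m_f, b_m, m_m)
    return x is not None and icv_low <= x <= icv_high


# Hypothetical parameters: females have the larger intercept m and the
# smaller slope B, so the female line lies above the male line at low ICV.
crossing = line_intersection_icv(b_f=0.30, m_f=100.0, b_m=0.35, m_m=30.0)
inside = crossing_within_span(0.30, 100.0, 0.35, 30.0, 1300.0, 1600.0)
outside = crossing_within_span(0.30, 100.0, 0.35, 30.0, 1500.0, 1800.0)
```

If the crossing lies inside the ICV span used for matching, the sign of the gender difference changes within that span, and a single matched comparison can be misleading.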
Since the current study does not investigate if sexual dimorphism exists, but rather examines differences in results, no verification was performed. Returning to Table 4, no regions have the same B for both gender groups. This can be a warning sign when using the covariate method as it is assumed in the standard regression model that both genders have the same B. The more the B differs,


the more the uncertainty will be incorporated in the model's estimated parameters and the more unreliable the results become.

Contradicting results have been presented previously. In a study by Sullivan et al. (2001), males had larger CC area than females when using the residual and covariate methods, but there was no difference between genders with the proportion method. An important note, however, is that they used a crude approximation of ICV based on a spheroid, which failed to have a significant correlation with female CC area. Ardekani et al. (2013), however, found the opposite result. They used the covariate method and an ICV matching method, both showing females to have larger CC area than males. Their regression model included ICV^(2/3) as covariate in order to use the same unit as the CC area. Their male/female ratio was 0.60 which, as previously mentioned, could affect the results.

Creating interpolated matched points using Gaussian weighted pair matching doubles the number of measurements used in a statistical test. This will affect the p-values and should be corrected for. In our results presented in Table 5, the p-values were uncorrected. The significance was, however, so strong that a correction would not have raised any p-value above the significance level of 0.05.

A strength of this study is that the entire cohort has the same age, ruling out any age related variations affecting the ICV normalization. As discussed in Barnes et al. (2010), it is important to compensate for age as well as ICV in regional brain volume studies. Other strengths are that the ICV was measured with a gold standard method and that the study cohort was relatively large, with a similar number of female and male subjects.

All findings and conclusions in this study can be applied in studies investigating regional brain volume differences between groups other than gender. Note, however, that other studies may have different prerequisites.
For instance, selecting the residual method using one group to normalize the cohort is motivated when studying differences between healthy and diagnosed subjects, but not when studying gender differences. The results of this study suggest that the linear associations between ICV and the regional brain volume to be normalized help

[Fig. 6 flowchart. Required estimations: Estimate regional brain volume (v) and intracranial volume (ICV); Estimate linear relation v = B*ICV + m. Questions: Is the aim to compare ratios?; B_female ≈ B_male?*; n_female/n_male ≈ 1?*; Gender overlapping ICV range representative of whole ICV range?; m_female ≈ m_male ≈ 0?*; Are any methods recommended? Method bubbles: Covariate method; Residual method; Matching methods; Proportion method. Decisions: Use Proportion method; Use any recommended method; Visually assess Gaussian weighted matching.]

Fig. 6. Suggested work-flow for ICV normalization method selection when investigating gender differences. Blue boxes state a required estimation and yellow boxes present a Yes/No question. For a study, if the answer leads the flow through a red bubble, the method presented in that bubble is not recommended for that particular study. When reaching a green bubble a decision can be made according to the text in the bubble. The text either states a method to be used or that any method can be used if the flow has not passed the red bubble presenting the method. Here, the residual method is the version using the entire cohort for normalization and n is the number of subjects. The approximate equality is used as the results will be affected if not equal, but a small effect might be tolerated. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)
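The screening logic of Fig. 6 might be sketched in Python as follows. This is only one reading of the flowchart, covering the branch where the aim is not to compare ratios: each check that fails flags one method as not recommended. The function name, the method labels, and the tolerances `rel_tol` and `m_tol` used to decide what counts as approximately equal are assumptions for illustration; the paper does not specify thresholds.

```python
import math


def screen_methods(b_f, b_m, m_f, m_m, n_f, n_m,
                   overlap_representative, m_tol, rel_tol=0.1):
    """Return the normalization methods not flagged by any check.

    m_tol (in ml) and rel_tol are hypothetical tolerances for
    'approximately equal'; they are not values from the paper.
    """
    flagged = set()
    if not math.isclose(b_f, b_m, rel_tol=rel_tol):
        flagged.add("covariate")          # slopes B differ between genders
    if not math.isclose(n_f / n_m, 1.0, rel_tol=rel_tol):
        flagged.add("residual (cohort)")  # unequal group sizes
    if not overlap_representative:
        flagged.add("matching")           # overlap not representative
    if abs(m_f) > m_tol or abs(m_m) > m_tol:
        flagged.add("proportion")         # intercepts m not close to zero
    all_methods = ["proportion", "residual (cohort)", "covariate", "matching"]
    return [name for name in all_methods if name not in flagged]


# Hypothetical inputs for two imagined studies.
balanced = screen_methods(0.30, 0.30, 0.0, 0.0, 200, 200, True, m_tol=1.0)
unbalanced = screen_methods(0.30, 0.40, 50.0, 10.0, 100, 200, False, m_tol=1.0)
```

An empty result corresponds to the case where no method is recommended, in which the flowchart points to visually assessing the Gaussian weighted matching.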


Fig. A1. Gaussian weighted pair calculation. (a) Example data with highlighted area and current measurement (p) being processed encircled. Black lines are the linear relation between v and ICV for each respective group. (b) All measurements are weighted (size of dot) using a Gaussian distribution and the difference in ICV between a weighted measurement and p's ICV. (c) Each measurement is mirrored to the opposite side of p's ICV along the best fit line for each group to avoid unbalanced calculations at the end of a gender's ICV span. These mirrored measurements are weighted the same as their original counterpart as they are the same distance from p's ICV. (d) The mean regional brain volume (v) of all weighted measurements is the value used for pair matching.
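The weighting in panels (b) and (d), given as Eqs. (A.1) and (A.2) in Appendix A, can be sketched in Python as below. This is a simplified illustration rather than the authors' implementation: the mirroring step of panel (c) is omitted, and the function names and toy values are hypothetical. Only σ = 25 ml is taken from the text.

```python
import numpy as np


def gaussian_weights(icv, icv_p, sigma=25.0):
    """Eq. (A.1): weight of each measurement given its ICV distance to p."""
    icv = np.asarray(icv, dtype=float)
    return np.exp(-((icv - icv_p) ** 2) / (2.0 * sigma ** 2))


def weighted_volume(v, icv, icv_p, sigma=25.0):
    """Eq. (A.2): weighted mean regional volume of one gender at icv_p."""
    w = gaussian_weights(icv, icv_p, sigma)
    return float(np.sum(np.asarray(v, dtype=float) * w) / np.sum(w))


# Hypothetical toy measurements (ml) for each gender.
icv_f = np.array([1450.0, 1475.0, 1525.0, 1550.0])
v_f = np.array([560.0, 565.0, 575.0, 580.0])
icv_m = np.array([1475.0, 1525.0])
v_m = np.array([550.0, 560.0])

# One matched pair at ICV = 1500 ml: one weighted volume per gender.
pair = (weighted_volume(v_f, icv_f, 1500.0), weighted_volume(v_m, icv_m, 1500.0))
```

Repeating the calculation for every ICV at which either gender has a measurement yields the set of pairs used in the statistical test. A larger `sigma` smooths more strongly across neighbouring ICVs.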

determine the performance of the methods. It is therefore important to assess these properties before selecting a method for normalization.

4.1. Conclusions

In this paper we evaluated, using both simulated and real data, how different methods for ICV normalization affect study outcome when comparing regional brain volumes between genders. The conclusions drawn from the study result in a work-flow recommendation for the selection of ICV normalization method, presented in Fig. 6. All normalization methods included in this study can bias the results under different scenarios. It is therefore of great importance to analyze the relationship between regional brain volume and ICV of each respective gender group to better select the normalization method, as well as to understand results obtained.

Acknowledgments

This study was financially supported by the Swedish Research Council (VR 2012-2330).

Appendix A. Gaussian weighted ICV pair matching

A detailed illustration of the procedure used for Gaussian weighted pair extraction is given in Fig. A1. The weights are calculated using Eq. (A.1), where p is a measurement containing regional brain volume v and ICV and s is a measurement being weighted. σ is the standard deviation of the Gaussian distribution, and was 25 ml in all experiments in this study:

$$\mathrm{weight}_s = e^{-(ICV_s - ICV_p)^2 / 2\sigma^2} \qquad (A.1)$$

The regional brain volume used in a pair ($\hat{v}$) was then determined according to

$$\hat{v}_s = \frac{\sum_{i \in \mathrm{gender}} v_i \cdot \mathrm{weight}_i}{\sum_{i \in \mathrm{gender}} \mathrm{weight}_i} \qquad (A.2)$$

This was performed for each gender separately, leaving a pair at ICV_s. A pair is created for each ICV that either gender has a measurement in, resulting in a pair for all measurements in the considered ICV range. σ can be adjusted according to the precision in the data used and the smoothing effect desired.

References

Ardekani, B.A., Figarsky, K., Sidtis, J.J., 2013. Sexual dimorphism in the human corpus callosum: an MRI study using the OASIS brain database. Cerebral Cortex 23 (10), 2514–2520.
Barnes, J., Ridgway, G.R., Bartlett, J., Henley, S.M., Lehmann, M., Hobbs, N., Clarkson, M.J., MacManus, D.G., Ourselin, S., Fox, N.C., 2010. Head size, age and gender adjustment in MRI studies: a necessary nuisance? NeuroImage 53 (4), 1244–1255.
Dale, A.M., Fischl, B., Sereno, M.I., 1999. Cortical surface-based analysis. I: segmentation and surface reconstruction. NeuroImage 9 (2), 179–194.


Davis, P.J.M., Wright, E.A., 1977. A new method for measuring cranial cavity volume and its application to the assessment of cerebral atrophy at autopsy. Neuropathology and Applied Neurobiology 3 (5), 341–358.
Fischl, B., Sereno, M.I., Dale, A.M., 1999. Cortical surface-based analysis. II: inflation, flattening, and a surface-based coordinate system. NeuroImage 9 (2), 195–207.
Free, S.L., Bergin, P.S., Fish, D.R., Cook, M.J., Shorvon, S.D., Stevens, J.M., 1995. Methods for normalization of hippocampal volumes measured with MR. American Journal of Neuroradiology 16 (4), 637–643.
Greenberg, D.L., Messer, D.F., Payne, M.E., MacFall, J.R., Provenzale, J.M., Steffens, D.C., Krishnan, R.R., 2008. Aging, gender, and the elderly adult brain: an examination of analytical strategies. Neurobiology of Aging 29 (2), 290–302.
Gur, R.C., Turetsky, B.I., Matsui, M., Yan, M., Bilker, W., Hughett, P., Gur, R.E., 1999. Sex differences in brain gray and white matter in healthy young adults: correlations with cognitive performance. The Journal of Neuroscience 19 (10), 4065–4072.
Jack, C., Twomey, C., Zinsmeister, A., Sharbrough, F., Petersen, R., Cascino, G., 1989. Anterior temporal lobes and hippocampal formations: normative volumetric measurements from MR images in young adults. Radiology 172 (2), 549–554.
Lind, L., Fors, N., Hall, J., Marttala, K., Stenborg, A., 2005. A comparison of three different methods to evaluate endothelium-dependent vasodilation in the elderly: the prospective investigation of the vasculature in Uppsala seniors (PIVUS) study. Arteriosclerosis, Thrombosis, and Vascular Biology 25 (11), 2368–2375.
Litjens, G., Toth, R., van-de-Ven, W., Hoeks, C., Kerkstra, S., van-Ginneken, B., Vincent, G., Guillard, G., Birkeck, Neil, Zhang, J., Strand, R., Malmberg, F., Ou, Y., Davatzikos, C., Kirschner, M., Jung, F., Yuan, J., Qiu, W., Gao, Q., Edwards, P., Maan, B., van-der-Heijden, F., Ghose, S., Mitra, J., Dowling, J., Barratt, D., Huisman, H., Madabhushi, A., 2013. Evaluation of prostate segmentation algorithms for MRI: the PROMISE12 challenge. Medical Image Analysis 18 (2), 359–373.
Malmberg, F., Strand, R., Kullberg, J., Nordenskjöld, R., Bengtsson, E., 2012. Smart Paint—A new interactive segmentation method applied to MR prostate segmentation. In: Prostate MR Image Segmentation Grand Challenge (PROMISE'12), A MICCAI 2012 Workshop.


Mathalon, D.H., Sullivan, E.V., Rawles, J.M., Pfefferbaum, A., 1993. Correction for head size in brain-imaging measurements. Psychiatry Research: Neuroimaging 50 (2), 121–139.
Murphy, D.G., DeCarli, C., McIntosh, A.R., Daly, E., Mentis, M.J., Pietrini, P., Szczepanik, J., Schapiro, M.B., Grady, C.L., Horwitz, B., Rapoport, S.I., 1996. Sex differences in human brain morphometry and metabolism: an in vivo quantitative magnetic resonance imaging and positron emission tomography study on the effect of aging. Archives of General Psychiatry 53 (7), 585.
Nordenskjöld, R., Malmberg, F., Larsson, E.-M., Simmons, A., Brooks, S.J., Lind, L., Ahlström, H., Johansson, L., Kullberg, J., 2013. Intracranial volume estimated with commonly used methods could introduce bias in studies including brain volume measurements. NeuroImage 83, 355–360.
O'Brien, L.M., Ziegler, D.A., Deutsch, C.K., Frazier, J.A., Herbert, M.R., Locascio, J.J., 2011. Statistical adjustments for brain size in volumetric neuroimaging studies: some practical implications in methods. Psychiatry Research: Neuroimaging 193 (2), 113–122.
Pengas, G., Pereira, J.M., Williams, G.B., Nestor, P.J., 2009. Comparative reliability of total intracranial volume estimation methods and the influence of atrophy in a longitudinal semantic dementia cohort. Journal of Neuroimaging 19 (1), 37–46.
Sanfilipo, M.P., Benedict, R.H., Zivadinov, R., Bakshi, R., 2004. Correction for intracranial volume in analysis of whole brain atrophy in multiple sclerosis: the proportion vs. residual method. NeuroImage 22 (4), 1732–1743.
Scahill, F.C., 2003. A longitudinal study of brain volume changes in normal aging using serial registered magnetic resonance imaging. Archives of Neurology 60 (7), 989–994.
Sullivan, E.V., Rosenbloom, M.J., Desmond, J.E., Pfefferbaum, A., 2001. Sex differences in corpus callosum size: relationship to age and intracranial size. Neurobiology of Aging 22 (4), 603–611.
Whitwell, J.L., Crum, W.R., Watt, H.C., Fox, N.C., 2001. Normalization of cerebral volumes by use of intracranial volume: implications for longitudinal quantitative MR imaging. American Journal of Neuroradiology 22 (8), 1483–1489.