Multicentre variability of MRI-based medial temporal lobe volumetry in Alzheimer's disease


Psychiatry Research: Neuroimaging 182 (2010) 244–250



Stefan J. Teipel a,b,⁎, Michael Ewers c,d, Stefanie Wolf e, Frank Jessen f, Heike Kölsch f, Sönke Arlt g, Christian Luckhaus h, Peter Schönknecht i,j, Klaus Schmidtke k, Isabella Heuser l, Lutz Frölich m, Gabriele Ende m, Johannes Pantel n, Jens Wiltfang o, Fabian Rakebrandt p, Oliver Peters l, Christine Born q, Johannes Kornhuber r, Harald Hampel c,d

a Department of Psychiatry, University Rostock, Germany
b Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE), Germany
c Department of Psychiatry, Ludwig-Maximilian University Munich, Germany
d Discipline of Psychiatry, School of Medicine, Trinity College Dublin, Ireland
e Department of Psychiatry, University Göttingen, Germany
f Department of Psychiatry, University Bonn, Germany
g Department of Psychiatry, University of Hamburg, Germany
h Department of Psychiatry, University of Düsseldorf, Germany
i Department of Psychiatry, University of Heidelberg, Germany
j Department of Psychiatry, University of Leipzig, Germany
k Department of Psychiatry, University of Freiburg, Germany
l Department of Psychiatry, Charité-University Medicine Berlin, Germany
m Central Institute of Mental Health, Mannheim, Germany
n Department of Psychiatry, University of Frankfurt, Germany
o Department of Psychiatry, University of Essen, Germany
p Department of Medical Informatics, Georg August University, Göttingen, Germany
q Department of Clinical Radiology, Ludwig-Maximilian University, Munich, Germany
r Department of Psychiatry, University of Erlangen, Erlangen, Germany

Article info

Article history: Received 12 January 2009; received in revised form 16 February 2010; accepted 11 March 2010.

Keywords: Alzheimer's disease; Mild cognitive impairment; Cognition; ApoE 4 genotype; Multicentre trial; Mixed effects regression

Abstract

Magnetic resonance imaging (MRI)-based volumetry of medial temporal lobe regions is among the best established biomarker candidates of Alzheimer's disease (AD) to date. This study assessed the effect of multicentre variability of MRI-based hippocampus and amygdala volumetry on the discrimination between patients with Alzheimer's disease (AD) and mild cognitive impairment (MCI) and on the association of morphological changes with ApoE4 genotype and cognition. We studied 113 patients with clinically probable AD and 150 patients with amnestic MCI using high-resolution MRI scans obtained at 12 clinical sites. We determined effect sizes of group discrimination and random effects linear models, considering multicentre variability. Hippocampus and amygdala volumes were significantly reduced in AD compared with MCI patients using data pooled across centres. Multicentre variability did not significantly affect the power to detect a volume difference between AD and MCI patients. Among cognitive measures, delayed recall of verbal and non-verbal material was significantly correlated with hippocampus and amygdala volumes. Amygdala and hippocampus volumes were not associated with ApoE4 genotype in AD or MCI. Our data indicate that multicentre acquisition of MRI data using manual volumetry is reliable and feasible for cross-sectional diagnostic studies, and they replicate essential findings from smaller scale monocentre studies.

© 2010 Published by Elsevier Ireland Ltd.

⁎ Corresponding author. Department of Psychiatry, University of Rostock, Gehlsheimer Str. 20, 18147 Rostock, Germany. Tel.: +49 381 494 9610; fax: +49 381 494 9682. E-mail address: [email protected] (S.J. Teipel).
0925-4927/$ – see front matter © 2010 Published by Elsevier Ireland Ltd. doi:10.1016/j.pscychresns.2010.03.003

1. Introduction

MRI-based volumetry of medial temporal lobe regions is among the best established biomarker candidates of Alzheimer's disease (AD) to date (Chetelat and Baron, 2003; Devanand et al., 2007; Golebiowski et al., 1999; Hampel et al., 2002; Killiany et al., 2000; Lopez et al., 2007; Teipel et al., 2006). It remains to be shown if acquisition of magnetic resonance imaging (MRI) data derived from multiple scanners increases variability of volumetric measures and how this would affect power calculations for a diagnostic trial. Multicentre acquisitions of MRI data provide large samples to study the association between regional atrophy and specific cognitive impairment or genotypes. These data require the use of random effect models for the statistical analysis to control for the hierarchical covariance structure of multicentre data associated with higher variance between than within centres (Brown and Prescott, 1999). However, these models have only rarely been employed in MRI multicentre designs so far.

Our aim was to determine multicentre variability and its effect on effect size estimates using bootstrapping to determine the confidence interval for a multicentre effect size based on high-resolution MRI data obtained at 12 clinical sites. Additionally, we determined effects of cognitive impairment and ApoE genotype on hippocampus and amygdala volumes using random effects analysis to explicitly model variation in effects across centres.

2. Methods

2.1. Subjects

The study included a total of 975 patients with mild cognitive impairment (MCI) or different types of dementia associated with a large range of etiologies that had prospectively been included in the first wave of the diagnostic study of the German Dementia Network between August 2003 and December 2005. From these patients we selected the first series of 400 patients whose scans had been measured by four raters (100 scans per rater) where diagnoses and centres were balanced across raters. Out of these 400 subjects, we selected only those patients who met either National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer's Disease and Related Disorders Association criteria for the clinical diagnosis of probable AD (McKhann et al., 1984) (n = 113) or Mayo Clinic criteria for the clinical diagnosis of single- or multiple-domain amnestic mild cognitive impairment (Petersen et al., 2001) (n = 150). All patient clinical records were available through an electronic database to a steering committee to ensure by unanimous approval that they met all inclusion criteria and no exclusion criteria.
The AD and MCI groups differed significantly in age and gender distribution, but not in years of education (Table 1). Severity of cognitive impairment was assessed using the Mini-Mental State Examination (MMSE) score (Folstein et al., 1975). As expected, groups differed significantly in MMSE scores (Table 1). ApoE genotype was available in a subgroup of 88 AD patients and 122 MCI patients (Table 1).

Patients were recruited from 12 centres across Germany; the number of subjects per centre and diagnostic group is listed in Supplementary Table 2. The distribution of sample size and diagnoses across centres was not homogeneous, with χ2 = 24.5, 11 df, P = 0.01. The study was approved by the Central IRB Board of the Network located at the University of Mannheim, Germany, and by each of the local IRBs of the participating centres.

Table 1
Subjects' characteristics.

| | AD | MCI |
|---|---|---|
| No. of patients (women) (a) | 113 (68) | 150 (71) |
| Age (S.D.) in years (b) | 71.9 (7.7) | 67.7 (7.8) |
| MMSE (S.D.) (c) | 23.5 (4.1) | 27.1 (1.9) |
| Years of education (S.D.) (d) | 12.0 (3.5) | 11.8 (2.7) |
| No. of patients (women) with ApoE genotype (e) | 88 (51) | 122 (56) |
| Age (S.D.) in years (f) | 71.7 (7.3) | 67.6 (7.4) |
| MMSE (S.D.) (g) | 23.9 (3.3) | 27.1 (1.9) |
| Years of education (S.D.) (h) | 12.2 (3.5) | 11.7 (2.7) |

a Significantly different between groups, χ2 = 4.3, 1 df, P = 0.039.
b Significantly different between groups, T = −4.35, 261 df, P < 0.001.
c Significantly different between groups, Mann–Whitney U = 3180.5, P < 0.001.
d Not significantly different between groups, T = −0.49, 261 df, P = 0.63.
e Not different between subgroup with ApoE genotype and remaining subjects: AD: gender: χ2 = 0.82, 1 df, P = 0.37; MCI: gender: χ2 = 0.54, 1 df, P = 0.46.
f Not different between subgroup and remaining subjects: AD: age: T = 0.45, 111 df, P = 0.66; MCI: age: T = 0.39, 148 df, P = 0.70.
g Not different between subgroup and remaining subjects: AD: Mann–Whitney U = 1014, P = 0.55; MCI: Mann–Whitney U = 1592.5, P = 0.57.
h Not different between subgroup and remaining subjects: AD: T = −1.31, 111 df, P = 0.19; MCI: T = 1.34, 148 df, P = 0.18.

2.2. Psychometric testing

Psychometric testing included the MMSE score (Folstein et al., 1975), immediate and delayed recall of the Wechsler Memory revised version logical memory test (Wechsler, 1945), word list learning and free recall of word lists as well as drawing of increasingly complex figures and free recall of drawings from the Consortium to Establish a Registry for Alzheimer's Disease (CERAD) cognitive battery (Morris et al., 1989), the Alzheimer Disease Assessment Scale (ADAS) (Rosen et al., 1984) and the clinical dementia rating (CDR) (Morris, 1993). Due to low variability in global CDR scores (0.5 in MCI and 0.5 to 1.0 in AD patients), we only used the CDR sum of boxes for correlation analyses. All tests were applied by psychometricians who had successfully completed an annually repeated central rater training.

2.3. MRI acquisitions

MRI scans were obtained on 1.5 Tesla scanners in all centres: Siemens Sonata or Siemens Magnetom Vision scanners at nine centres and Philips scanners (Philips Gyroscan and Philips Intera) at the remaining three centres. All centres used a sagittal magnetization prepared rapid gradient echo (MPRAGE) sequence on Siemens scanners and a 3D fast T1-weighted gradient echo sequence on Philips scanners with isotropic spatial resolution of 1 mm³. Among all centres the repetition time (TR) varied between 9.3 and 20 ms and the echo time (TE) between 3.93 and 4.38 ms (Ewers et al., 2006). All scanners had passed the phantom test of the American College of Radiology MRI Accreditation Program (ACR, 2000) both at initiation (Ewers et al., 2006) and at the end of the data acquisition period.

2.4. Volumetry

Four raters blinded to subjects' identity, clinical diagnosis and left/right orientation of scans conducted the volumetry. The number of scans per rater was balanced for centre and diagnosis. Prior to volumetric measurements, all MRI volumes were corrected for image intensity non-uniformities (Sled et al., 1998), mapped by linear stereotaxic transformation (Collins et al., 1994) into coordinates based on the Talairach atlas (Talairach and Tournoux, 1988), and resampled onto a 1-mm voxel grid. The correction for image intensity has been shown to remove most of the artifacts present in MR images (Sled et al., 1998). These preprocessing steps reduce inter-scan variability due to scan artifacts and correct for effects of whole brain atrophy. Volumetric analyses were performed with the interactive software package DISPLAY developed at the McConnell Brain Imaging Center at the Montreal Neurological Institute. This program allows simultaneous visualization and segmentation of volumes in coronal, sagittal and horizontal orientations.

The anatomical boundaries used for segmentation of hippocampus and amygdala have been described in detail in a previous study (Pruessner et al., 2000). In short, the hippocampus comprised the hippocampus head (HH), the hippocampus body (HB) and the hippocampus tail (HT).

The HT was defined as encompassing the dentate gyrus, the cornu ammonis (CA), the part of the fasciolar gyrus adjacent to the CA regions, the alveus and the fimbria. The most posterior part of the HT was defined as the first appearance (coming from posterior in coronal sections) of an ovoid mass of grey matter inferiomedially to the trigone of the lateral ventricle (TVL). The lateral border of the HT was the TVL; the medial border was defined by the transition from grey to white matter. The HT was separated from the Andreas–Retzius gyrus by a vertical line drawn from the medial tip of the TVL to the parahippocampal gyrus; the separation from the crus of the fornix was defined by a horizontal line drawn from the quadrigeminal cistern to the TVL (in the coronal section). The inferior border of the HT was defined by the transition from grey to white matter.

The HB was defined as encompassing subiculum, CA regions, dentate gyrus and fimbria. The superior border of the HB was defined by the fimbria white matter, most easily identified on sagittal sections. Separating the subiculum of the HB from the entorhinal cortex at the inferior-medial border required a geometrically defined line drawn from the most inferior point of the HB medially to the cistern. The lateral border was defined by the inferior horn of the lateral ventricle or the caudally adjacent white matter.

Identification of the HH in its superior part (where it blends with the amygdalon and basal ganglia grey matter) was based on the uncal recess of the inferior horn of the lateral ventricle. The posterior end of the amygdalon was defined in the coronal plane at the point where grey matter first started to appear superior to the alveus and laterally to the HH. The amygdalon was separated at its superior border from the putamen and claustrum grey matter by a horizontal line (in the coronal view) drawn between the superolateral part of the optic tract and the fundus of the inferior portion of the circular sulcus of the insula. Medially, in its anterior and inferior parts the amygdalon has to be separated from the entorhinal cortex. If the entorhinal cortex could not consistently be identified, then an arbitrary border was used based on a semicircle between the lateral end of the lateral ventricle and the alveus. The lateral aspect of this semicircle served as the lateral border of the amygdalon. The anterior border of the amygdalon was defined at the level of the closure of the lateral sulcus.
In summary, this approach uses anatomical landmarks wherever they are consistently visible and geometrically defined borders where the grey matter intensity does not allow for separating different structures, in order to achieve high consistency between scans and raters.

2.5. Statistics

We conducted the following three blocks of analysis:

1. Random effects analysis to determine effects of diagnosis, age, gender, ApoE4 genotype and cognition on volumes using centre and raters as random covariates;
2. Effect size analysis to determine the effect of multicentre acquisition on effect size estimates;
3. Analysis of diagnostic accuracy using areas under receiver operating characteristic curves to assess effects of multicentre acquisition on a standard test of diagnostic accuracy.

2.5.1. Random effects model

As the data were clustered according to centres and raters, we employed a random effects analysis (Teipel et al., 2007). This analysis assumes that the centres and raters in this study represent a random sample so that the findings of our study can be generalized to the set of all possible raters and centres (Goldstein, 1995). All models used the volumetric measures (in native space and after correcting for brain volume) as dependent variables. We determined the effects of diagnosis, age and gender as independent fixed effects on the volumetric measures after controlling for the random effects of rater and centre as well as the two-way interactions of rater and centre with diagnosis, gender and age as independent random effects. In a second set of analyses, we determined the effects of ApoE4 genotype (at least one ApoE4 allele vs. no ApoE4 allele) and of cognitive measures on volumes. Similar to the previous analysis, these effects were controlled for the effects of centre and rater and the interaction effects between centres and raters and ApoE4 genotype and cognitive measures as random effects, as well as for age, gender, diagnosis and the interaction of diagnosis with cognitive scores and ApoE4 genotype as fixed effects. Models were calculated using Procedure MIXED in SAS 9.1 (Littell et al., 1996).

To determine the relative contribution of centre to the variability of the volumetric measurements, we used variance component analysis with volumetric measurements as dependent variables, centre as random factor, diagnosis and raters as fixed factors, and age and gender as covariates using restricted maximum likelihood estimation with procedure VARCOMP in SAS 9.1.

2.5.2. Effect size analysis

The effect size d of the difference in the volumes of hippocampus and amygdala between AD and MCI patients was estimated according to the following formula (Cohen, 1977):

$$d = \frac{\bar{x} - \bar{y}}{\delta_{xy}},$$

where $(\bar{x} - \bar{y})$ is the mean difference in the hippocampus or amygdala volumes between AD and MCI patients, and $\delta_{xy}$ is the pooled standard deviation of volumes of the hippocampus and amygdala, respectively, obtained from AD and MCI patients. The corresponding effect sizes were determined in the following three different ways.

1. Effect sizes were calculated from the volumetric data of each centre separately (henceforth, referred to as “monocentre effect size”).
2. The effect size was determined after the hippocampus and amygdala volumes had been pooled across centres so that centre effects were no longer considered (henceforth, referred to as “pooled effect size”).
3. We defined a multicentre effect size that considered the effect of multicentre acquisition. For the multicentre effect size, a random sample of n = 10 data points (volumes of the left hippocampus) was drawn each from the MCI group (n = 150) and the AD group (n = 113) after the data had been pooled across centres. Then the effect size of the random sample (n = 10 AD and n = 10 MCI) was calculated. This procedure of drawing a random sample of 10 volumes from the AD group and 10 volumes from the MCI group and the calculation of the respective effect size from this random sample was repeated 10,000 times. The mean effect size of the resulting sampling distribution along with the 95% confidence interval was calculated using a customized program written in MATLAB 7.01.

The procedure to calculate the pooled and the multicentre effect sizes is illustrated in Fig. 1. A one-sample t-test was used to test whether the monocentre effect sizes deviated significantly from the mean value of the multicentre effect size. The sample size required to detect a pooled effect size with a statistical power of 0.9, at P = 0.05 for a two-tailed t-test, was estimated using the computer program G*POWER 3 (Erdfelder et al., 1996).

Fig. 1. Calculation of the pooled and the multicentre effect sizes. From the entire set of volume measurements that have been acquired within each centre, the effect size for the volume difference between groups has been calculated in two different ways. First, the data have been pooled across centres and the effect size between groups has been calculated (pooled effect size, lower left of the figure). Second, a random sample (10 volumetric values each for the AD and MCI group, respectively) has been drawn from the entire set of data. The effect size for this random sample has been calculated. This process has been repeated 10,000 times, the resulting 10,000 effect sizes have been averaged to obtain a multicentre effect size (lower right of the figure).

2.5.3. ROC analysis

To determine diagnostic accuracy of volumes between AD patients and MCI patients, we calculated the area under the receiver operating characteristic (ROC) curve (AUC), where an AUC of 0.5 indicated random discrimination, and an AUC of 1 indicated perfect discrimination (Hanley and McNeil, 1982). We compared the AUC calculated for volumes pooled across centres (henceforth, referred to as “pooled AUC”) with the AUC from 10 data points randomly sampled each from the MCI group and AD group, where the AUC calculations were iterated 10,000 times (henceforth, referred to as “multicentre AUC”), using a customized program written in MATLAB 7.01.

3. Results

The relative error for each pairwise comparison between the four raters was below 5% for all volumetric measures obtained from 10 randomly selected MRI scans that were measured independently by the four raters. The intra-class correlation coefficients across the four raters were 0.92 for the right and 0.93 for the left hippocampus, and 0.69 for the right and 0.80 for the left amygdalon.

In the mixed effects model, we observed significant effects of diagnosis on the bilateral hippocampus and the left amygdalon volumes (smaller volumes in AD patients). In addition, age was associated with the bilateral hippocampus (smaller volumes in older subjects), and gender with the right hippocampus and the bilateral amygdalon volumes (smaller volumes in women) (Table 2). When controlling for overall brain volume, effects of age and diagnosis remained unchanged, but effects of gender were no longer significant. Variance component analysis showed that centre accounted for 12.6% of variance in the left and 16.2% in the right hippocampus, and for 0.7% of variance in the left and 3.2% of variance in the right amygdalon measurements.

Table 2
Analysis of variance on the effects of diagnosis, age and gender on volumetric measures controlling for centre and rater effects.

| Effect | l. hippocampus F(1,259) | P | l. amygdala F(1,259) | P | r. hippocampus F(1,259) | P | r. amygdala F(1,259) | P |
|---|---|---|---|---|---|---|---|---|
| Diagnosis | 27.62 | 0.0001 | 6.26 | 0.02 | 23.63 | 0.001 | 2.67 | n.s. |
| Age | 20.26 | 0.0001 | 0.58 | n.s. | 9.58 | 0.003 | 0.01 | n.s. |
| Gender | 1.66 | n.s. | 14.26 | 0.001 | 4.65 | 0.04 | 16.56 | 0.001 |

F-tests (1 numerator and 259 denominator degrees of freedom) of fixed effects from a mixed effects model with age, gender and diagnosis as fixed effects and rater and centre and the two-way interactions between rater and centre on the one hand and age, gender and diagnosis on the other hand as random effects. l.: left; r.: right; n.s.: not significant.

Using mixed effects regression, we determined the effect of cognitive performance and ApoE genotype on volumes. There was a significant effect of free recall of verbal material, and immediate and delayed logical memory on all volumes (Table 3). Recall of previously drawn figures was significantly correlated with the bilateral hippocampus and the right amygdalon volume, whereas drawing itself was not correlated with any volumetric measure. Among global cognitive measures, MMSE score was significantly correlated with all volumetric measures, ADAScog sum score was associated with the left hippocampus and the bilateral amygdalon volumes, and CDR sum of boxes score was related to the bilateral hippocampus and the right amygdalon volumes (Table 3). There was no significant effect of ApoE4 carrier status on any volumetric measure (Table 3). When we controlled the models for change in global brain volume, the significance of effects remained unchanged.

The means of the sampling distribution of the multicentre effect sizes and the pooled effect sizes were nearly identical for all volumes (Supplementary Figs. 1 to 4). Effect sizes ranged between −0.87 for the left hippocampus and −0.52 for the right amygdalon. The total sample sizes required to detect a significant difference between groups at P < 0.05 with 90% power ranged between 58 for the left hippocampus and 158 for the right amygdala (Supplementary Table 1).

Mean volumes of hippocampus and amygdala in AD and MCI patients and the monocentre effect sizes of group differences for each centre are displayed in Supplementary Table 2. For the left hippocampus, two monocentre effect sizes (centres I and XII) and for the left amygdalon three monocentre effect sizes (centres V, VII, and XII) were outside the 95% confidence interval of the multicentre sampling distribution (Fig. 1 and Supplementary Table 2). All centres for which outliers of measurements were observed had sample sizes below six subjects in at least one diagnostic group. A one-sample t-test confirmed that the monocentre effect sizes did not differ in either direction from the mean multicentre effect size (P = 0.38 or larger for all comparisons).
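The effect size and sample size figures reported above can be outlined in a few lines of Python. This is only a minimal sketch and not the authors' MATLAB or G*POWER code: the function names are hypothetical, the subsamples are assumed to be drawn without replacement within each draw, the 95% interval is taken as a simple percentile interval, and the sample size uses a normal approximation with a small-sample correction term rather than the exact noncentral t calculation.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def cohen_d(x, y):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


def resampled_effect_size(x, y, n=10, iterations=10_000, seed=0):
    """Mean and 95% percentile interval of d over repeated random subsamples.

    Assumption: each draw takes n values per group without replacement.
    A draw with zero variance in both groups would raise ZeroDivisionError.
    """
    rng = random.Random(seed)
    ds = sorted(cohen_d(rng.sample(x, n), rng.sample(y, n)) for _ in range(iterations))
    lower = ds[int(0.025 * iterations)]
    upper = ds[int(0.975 * iterations) - 1]
    return mean(ds), (lower, upper)


def required_total_n(d, alpha=0.05, power=0.9):
    """Approximate total N (two equal groups) for a two-tailed two-sample t-test.

    Normal approximation plus the z_alpha**2 / 4 correction term; an
    approximation only, not the exact computation performed by G*POWER.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    per_group = 2 * (z_alpha + z_power) ** 2 / d ** 2 + z_alpha ** 2 / 4
    return 2 * math.ceil(per_group)
```

Under these assumptions, `required_total_n(-0.87)` and `required_total_n(-0.52)` give 58 and 158, in line with the totals quoted for the left hippocampus and the right amygdala. Calling `resampled_effect_size(ad_volumes, mci_volumes)` on two lists of volumes (hypothetical variable names) returns the mean of the sampling distribution and its percentile interval, which can then be compared with `cohen_d(ad_volumes, mci_volumes)` for the pooled data.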
The areas under the ROC curves for the discrimination between AD and MCI patients were nearly identical between the pooled data and the iterated random sampling. The mean of the AUC ranged between 0.78 for the right hippocampus and 0.65 for the right amygdalon (Supplementary Table 1).
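The comparison of pooled and resampled AUC values can be sketched in the same way. Again this is an illustrative outline rather than the authors' program: the function names are hypothetical, the AUC is computed through its rank interpretation (the probability that a randomly chosen member of one group has a larger value than a randomly chosen member of the other, with ties counted as one half), and subsamples are assumed to be drawn without replacement.

```python
import random
from statistics import mean


def auc(higher_group, lower_group):
    """AUC as the share of cross-group pairs ranked in the expected order.

    Equivalent to the Mann-Whitney U statistic divided by n1 * n2;
    tied pairs contribute one half.
    """
    wins = 0.0
    for a in higher_group:
        for b in lower_group:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(higher_group) * len(lower_group))


def resampled_auc(higher_group, lower_group, n=10, iterations=10_000, seed=0):
    """Mean AUC over repeated random subsamples of n values per group."""
    rng = random.Random(seed)
    return mean(
        auc(rng.sample(higher_group, n), rng.sample(lower_group, n))
        for _ in range(iterations)
    )
```

Because volumes are smaller in AD, the MCI volumes would be passed as `higher_group` and the AD volumes as `lower_group`. Since this AUC is an average over all cross-group pairs, the mean over random subsamples is expected to stay close to the pooled value, which is consistent with the near identity reported above.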


Table 3
Analysis of variance on the effects of cognitive measures and ApoE genotype on volumetric measures controlling for age, diagnosis, gender, centre and rater effects.

| Effect | l. hippocampus F | P | l. amygdalon F | P | r. hippocampus F | P | r. amygdalon F | P |
|---|---|---|---|---|---|---|---|---|
| Cognitive measures, F(1,257) | | | | | | | | |
| Word learning | 1.83 | n.s. | 3.68 | 0.06 | 1.96 | n.s. | 4.82 | 0.03 |
| Free recall words | 17.45 | 0.0001 | 4.49 | 0.035 | 9.74 | 0.002 | 7.64 | 0.006 |
| WMS-RLM-I | 11.67 | 0.0007 | 3.91 | 0.05 | 8.24 | 0.005 | 5.24 | 0.023 |
| WMS-RLM-D | 21.53 | 0.0001 | 1.42 | n.s. | 9.42 | 0.0024 | 9.21 | 0.003 |
| Drawing | 0.01 | n.s. | 0.09 | n.s. | 0.03 | n.s. | 0.36 | n.s. |
| Free recall drawings | 10.39 | 0.002 | 2.41 | n.s. | 11.86 | 0.0007 | 5.70 | 0.02 |
| MMSE | 7.79 | 0.006 | 13.94 | 0.0002 | 7.16 | 0.008 | 13.90 | 0.0002 |
| ADAScog | 6.36 | 0.02 | 9.32 | 0.0025 | 2.95 | n.s. | 14.06 | 0.0002 |
| CDR-SB | 4.74 | 0.03 | 2.72 | n.s. | 4.06 | 0.05 | 5.52 | 0.02 |
| Genotype, F(1,204) | | | | | | | | |
| ApoE4 | 0.05 | n.s. | 0.10 | n.s. | 0.21 | n.s. | 0.70 | n.s. |

F-tests of fixed effects from mixed effects models with cognitive measures/ApoE4 genotype, age, gender, diagnosis and interaction of diagnosis and cognitive measures/ApoE4 genotype as fixed effects and rater and centre and the two-way interactions between rater and centre on the one hand and cognitive measures/ApoE4 genotype on the other hand as random effects. WMS-RLM-I: Wechsler Memory revised logical memory test, immediate recall. WMS-RLM-D: Wechsler Memory revised logical memory test, delayed recall. CDR-SB: clinical dementia rating, sum of boxes. MMSE: Mini-Mental State Examination. ADAScog: Alzheimer Disease Assessment Scale cognitive battery. l.: left; r.: right; n.s.: not significant.

4. Discussion

In a multicentre setting, we found significant reductions of the volumes of hippocampus and amygdala in AD compared with MCI patients. This finding agrees with previous monocentre studies on hippocampus and amygdala volume reductions in AD compared with MCI patients (Bottino et al., 2002; Pennanen et al., 2004). Similar to previous studies (Apostolova et al., 2006; Jack et al., 1997; Pruessner et al., 2001), we found smaller regional volumes of the hippocampus and amygdala in women than in men. This effect, however, was no longer significant after correcting for overall brain size. Therefore, gender seems not to have a region-specific effect on medial temporal lobe volumes in MCI or AD.

Centre accounted for 13% to 16% of variance in hippocampus, but only 1% to 3% of variance in amygdalon measurements. This difference was only partially explained by a higher proportion of variance attributable to rater effects for the amygdalon (about 10%) than for the hippocampus (about 3% to 5%). We can only speculate on the possible basis of this difference. One could assume that susceptibility artifacts are less pronounced for the amygdala than for the hippocampus, owing to the different distances of the two structures from the skull base. The centre effects prevailed even though we had used a strict selection process of scanners based on the phantom test of the American College of Radiology MRI Accreditation Program (ACR, 2000), whose criteria had to be met both at initiation (Ewers et al., 2006) and at the end of the study. Additionally, all centres obtained similar anatomical sequences with an identical spatial resolution; however, scanners differed with respect to manufacturer and to hardware and software specifications, so that parameters such as TR and TE varied between scanners. Despite these effects, the pooled effect size and the average of the sampling distribution of the multicentre effect size were nearly identical.
Moreover, the large majority of monocentre effect sizes were within the 95% confidence limits of the multicentre effect sizes. This suggests that the between-centre variability was not large enough to mask biological effects. Our findings are consistent with previous findings of a lack of difference between the effect sizes derived from monocentre vs. multicentre MRI scans when comparing total grey matter volume between MCI and AD (Ewers et al., 2006). In agreement with these results, the areas under the ROC curves were nearly identical for the multicentre and the pooled data analysis.

For the interpretation of the findings, one has to consider that the estimation of the monocentre effect sizes was based on a much smaller sample size per centre. The few effect sizes that lay outside the 95% confidence interval of the sampling distribution of the multicentre effect size were associated with those centres that had the smallest sample sizes. Thus, it is possible that the rather large variability within centres due to smaller sample sizes may have masked variability between centres. This problem is partially offset by the use of mixed effects regression with restricted maximum likelihood estimation (Laird and Ware, 1982) in our study, which weights centre effects by sample size such that centres with fewer data points contribute less to the overall estimate of centre effects than centres with larger sample sizes.

We have not found an effect of ApoE4 genotype on hippocampus and amygdala volumes in AD or MCI. Findings from earlier studies are ambiguous (Hashimoto et al., 2001; Jack et al., 1998; Lehtovirta et al., 1996). It has been argued that subject numbers had been too low in the negative studies. Cohen's effect size index d was 0.55 in one relatively large positive study comprising 138 patients (Hashimoto et al., 2001). Our study provided the statistical power sufficient to detect an effect size of 0.55 at 80% power and a P value of 0.05 in the AD group, but with Cohen's d between 0.11 and 0.35, the ApoE effect size in our sample was much smaller than this value. One may assume that sample characteristics, such as age, gender or cognitive status, may contribute to discrepancies between studies.
However, age, gender and MMSE scores were matched between the AD patients with and without ApoE4 genotype both in our sample and in the previous study showing a significant effect of ApoE4 (Hashimoto et al., 2001). Similar to the current study, one earlier study found no effect of ApoE4 genotype on amygdala volume (Lehtovirta et al., 1996) in AD patients. However, two studies found significantly larger atrophy of the amygdala in ApoE4 carriers compared to non-carriers (Basso et al., 2006a,b; Hashimoto et al., 2001).

Controversial findings have been reported on the effect of ApoE4 on hippocampus volumes in MCI as well. One large multicentre trial that included 267 MCI patients from 65 sites in 14 countries found highly significant effects of ApoE4 genotype on hippocampus volumes. However, the authors did not consider centre effects at all (Farlow et al., 2004), rendering the results difficult to interpret. This study did not report data to calculate effect sizes. Another multicentre study that did not consider centre effects found significantly smaller hippocampus volumes in women with heterozygous ApoE4 genotype, but not in men with this genotype (Fleisher et al., 2005). In our group there was no gender by ApoE4 genotype interaction and no gene dose effect of ApoE4 genotype. Within a group of MCI patients who remained stable during clinical follow-up, the ApoE4 carriers had atrophy in the amygdala and hippocampus compared to the non-carriers, whereas in 13 MCI patients who had converted to dementia at clinical follow-up, the ApoE4 carriers showed more atrophy in frontal and parietal association cortex but not the hippocampus compared to the non-carriers (Hamalainen et al., 2008). Our data cannot resolve the question of an ApoE4 effect on hippocampus volumes in AD and MCI patients; however, they advocate the use of random effect models for future multicentre trials, even if scanning parameters have been homogenized across centres.
We found significant correlations of free recall of verbal material and drawings, as well as of immediate and delayed recall of logical memory, with bilateral hippocampus and amygdala volumes. In contrast, verbal learning was not related to hippocampus volume, and drawing performance was not related to hippocampus and amygdala volumes. The global cognitive scales (MMSE, ADAScog and CDR sum of boxes) were correlated with the volumetric measures. These results were independent of the determinant of the affine transformation matrix as a global measure of brain size.
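
The determinant mentioned above is the factor by which an affine transformation scales volumes, which is why it can serve as a global size measure. The Python sketch below computes it for the 3×3 linear part of a 4×4 affine matrix. The function name and the example matrix are hypothetical, and whether a larger value indicates a larger or a smaller brain depends on the direction of the mapping, which is not stated in this section.

```python
def linear_determinant(affine):
    """Determinant of the upper-left 3x3 block of a 4x4 affine matrix.

    The translation column does not change volumes, so only the
    linear part (rotation, scaling, shear) enters the determinant.
    """
    (a, b, c), (d, e, f), (g, h, i) = (row[:3] for row in affine[:3])
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Hypothetical affine: scales x, y, z by 1.1, 1.2 and 0.9 and adds a shift.
example_affine = [
    [1.1, 0.0, 0.0, 5.0],
    [0.0, 1.2, 0.0, -3.0],
    [0.0, 0.0, 0.9, 2.0],
    [0.0, 0.0, 0.0, 1.0],
]

# Volume scaling factor of the example: 1.1 * 1.2 * 0.9
volume_scale = linear_determinant(example_affine)
```

A pure rotation or translation leaves the determinant at 1, so the value reflects only overall size change and not head position in the scanner.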

Our findings suggest a specific correlation of free recall of verbal and non-verbal material with hippocampus and amygdala volumes. They agree with several studies showing correlations of memory-related cognitive measures with hippocampus (Petersen et al., 2000) and amygdala (Basso et al., 2006a,b; Mizuno et al., 2000; Mori et al., 1997) volumes. In agreement with earlier studies (Kohler et al., 1998; Kramer et al., 2004), we found significant correlations of hippocampus volumes with delayed free recall, but not with immediate memory of a word list, supporting the specific role of the hippocampus in the consolidation rather than the encoding of new material (Squire, 1992). Interestingly, however, not only delayed but also immediate logical memory was correlated with hippocampus and amygdala volumes. This agrees with a study in patients with intractable temporal lobe epilepsy, in which patients with a focus in the left medial temporal lobe showed impairment of both immediate and delayed logical memory (Moore and Baker, 1996). However, when we assessed a linear model including both immediate and delayed logical memory, only delayed, but not immediate, memory was significantly correlated with hippocampus volumes. Therefore, the effects of logical memory seem to be dominated by delayed recall. The significant correlation of the recall of drawings with hippocampus volume, independent of immediate drawing performance, agrees with the prominent role of the hippocampus in memory consolidation.

The main focus of research on diagnostic markers is on the separation of AD patients from healthy subjects and on the discrimination between MCI patients who convert to AD and MCI patients who remain stable. For a study aiming to differentiate between MCI and AD patients at the interface between functionally independent and dependent living in the community, it will be important to determine whether multicentre MRI captures essential features of between-group differences in brain morphology.
Our data indicate that multicentre acquisition of MRI does not strongly affect volumetric measures and their effect sizes, supporting the approach of ongoing national and international networks to collect multicentre MRI data to establish MRI as a reliable biomarker of AD.

Acknowledgements

The study was funded by a grant from the Bundesministerium für Bildung und Forschung (BMBF 01 GI 0102) awarded to the dementia network “Kompetenznetz Demenzen”. The study was approved by the institutional review board (IRB) of the leading centre of the German Dementia Network located at the University of Mannheim and by each of the local IRBs of the participating centres.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi: 10.1016/j.pscychresns.2010.03.003.

References

ACR, 2000. In: Reston, V. (Ed.), Phantom Test Guidance for the ACR MRI Accreditation Program. American College of Radiology.
Apostolova, L.G., Dinov, I.D., Dutton, R.A., Hayashi, K.M., Toga, A.W., Cummings, J.L., Thompson, P.M., 2006. 3D comparison of hippocampal atrophy in amnestic mild cognitive impairment and Alzheimer's disease. Brain 129 (Pt 11), 2867–2873.
Basso, M., Gelernter, J., Yang, J., MacAvoy, M.G., Varma, P., Bronen, R.A., van Dyck, C.H., 2006a. Apolipoprotein E epsilon4 is associated with atrophy of the amygdala in Alzheimer's disease. Neurobiology of Aging 27 (10), 1416–1424.
Basso, M., Yang, J., Warren, L., MacAvoy, M.G., Varma, P., Bronen, R.A., van Dyck, C.H., 2006b. Volumetry of amygdala and hippocampus and memory performance in Alzheimer's disease. Psychiatry Research: Neuroimaging 146 (3), 251–261.
Bottino, C.M., Castro, C.C., Gomes, R.L., Buchpiguel, C.A., Marchetti, R.L., Neto, M.R., 2002. Volumetric MRI measurements can differentiate Alzheimer's disease, mild cognitive impairment, and normal aging. International Psychogeriatrics 14 (1), 59–72.
Brown, H., Prescott, R., 1999. Applied Mixed Models in Medicine. John Wiley & Sons Ltd, Chichester, UK.
Chetelat, G., Baron, J.C., 2003. Early diagnosis of Alzheimer's disease: contribution of structural neuroimaging. Neuroimage 18 (2), 525–541.
Cohen, J., 1977. Statistical Power Analysis for the Behavioural Sciences. Academic Press, New York.
Collins, D.L., Neelin, P., Peters, T.M., Evans, A.C., 1994. Automatic 3D intersubject registration of MR volumetric data in standardized Talairach space. Journal of Computer Assisted Tomography 18 (2), 192–205.
Devanand, D.P., Pradhaban, G., Liu, X., Khandji, A., De Santi, S., Segal, S., Rusinek, H., Pelton, G.H., Honig, L.S., Mayeux, R., Stern, Y., Tabert, M.H., de Leon, M.J., 2007. Hippocampal and entorhinal atrophy in mild cognitive impairment: prediction of Alzheimer disease. Neurology 68 (11), 828–836.
Erdfelder, E., Faul, F., Buchner, A., 1996. GPOWER: a general power analysis program. Behavior Research Methods, Instruments & Computers 28, 1–11.
Ewers, M., Teipel, S.J., Dietrich, O., Schonberg, S.O., Jessen, F., Heun, R., Scheltens, P., van de Pol, L., Freymann, N.R., Moeller, H.J., Hampel, H., 2006. Multicenter assessment of reliability of cranial MRI. Neurobiology of Aging 27 (8), 1051–1059.
Farlow, M.R., He, Y., Tekin, S., Xu, J., Lane, R., Charles, H.C., 2004. Impact of APOE in mild cognitive impairment. Neurology 63 (10), 1898–1901.
Fleisher, A., Grundman, M., Jack Jr., C.R., Petersen, R.C., Taylor, C., Kim, H.T., Schiller, D.H., Bagwell, V., Sencakova, D., Weiner, M.F., DeCarli, C., DeKosky, S.T., van Dyck, C.H., Thal, L.J., 2005. Sex, apolipoprotein E epsilon 4 status, and hippocampal volume in mild cognitive impairment. Archives of Neurology 62 (6), 953–957.
Folstein, M.F., Folstein, S.E., McHugh, P.R., 1975. Mini-mental-state: a practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research 12, 189–198.
Goldstein, H., 1995. Multilevel Statistical Models, 2nd ed. Edward Arnold, London.
Golebiowski, M., Barcikowska, M., Pfeffer, A., 1999. Magnetic resonance imaging-based hippocampal volumetry in patients with dementia of the Alzheimer type. Dementia and Geriatric Cognitive Disorders 10 (4), 284–288.
Hamalainen, A., Grau-Olivares, M., Tervo, S., Niskanen, E., Pennanen, C., Huuskonen, J., Kivipelto, M., Hanninen, T., Tapiola, M., Vanhanen, M., Hallikainen, M., Helkala, E.L., Nissinen, A., Vanninen, R.L., Soininen, H., 2008. Apolipoprotein E epsilon 4 allele is associated with increased atrophy in progressive mild cognitive impairment: a voxel-based morphometric study. Neurodegenerative Diseases 5 (3–4), 186–189.
Hampel, H., Teipel, S.J., Bayer, W., Alexander, G.E., Schwarz, R., Schapiro, M.B., Rapoport, S.I., Moller, H.J., 2002. Age transformation of combined hippocampus and amygdala volume improves diagnostic accuracy in Alzheimer's disease. Journal of the Neurological Sciences 194 (1), 15–19.
Hanley, J.A., McNeil, B.J., 1982. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143, 29–36.
Hashimoto, M., Yasuda, M., Tanimukai, S., Matsui, M., Hirono, N., Kazui, H., Mori, E., 2001. Apolipoprotein E epsilon 4 and the pattern of regional brain atrophy in Alzheimer's disease. Neurology 57 (8), 1461–1466.
Jack Jr., C.R., Petersen, R.C., Xu, Y.C., Waring, S.C., O'Brien, P.C., Tangalos, E.G., Smith, G.E., Ivnik, R.J., Kokmen, E., 1997. Medial temporal atrophy on MRI in normal aging and very mild Alzheimer's disease. Neurology 49, 786–794.
Jack Jr., C.R., Petersen, R.C., Xu, Y.C., O'Brien, P.C., Waring, S.C., Tangalos, E.G., Smith, G.E., Ivnik, R.J., Thibodeau, S.N., Kokmen, E., 1998. Hippocampal atrophy and apolipoprotein E genotype are independently associated with Alzheimer's disease. Annals of Neurology 43 (3), 303–310.
Killiany, R.J., Gomez-Isla, T., Moss, M., Kikinis, R., Sandor, T., Jolesz, F., Tanzi, R., Jones, K., Hyman, B.T., Albert, M.S., 2000. Use of structural magnetic resonance imaging to predict who will get Alzheimer's disease. Annals of Neurology 47 (4), 430–439.
Kohler, S., Black, S.E., Sinden, M., Szekely, C., Kidron, D., Parker, J.L., Foster, J.K., Moscovitch, M., Winocour, G., Szalai, J.P., Bronskill, M.J., 1998. Memory impairments associated with hippocampal versus parahippocampal-gyrus atrophy: an MR volumetry study in Alzheimer's disease. Neuropsychologia 36 (9), 901–914.
Kramer, J.H., Schuff, N., Reed, B.R., Mungas, D., Du, A.T., Rosen, H.J., Jagust, W.J., Miller, B.L., Weiner, M.W., Chui, H.C., 2004. Hippocampal volume and retention in Alzheimer's disease. Journal of the International Neuropsychological Society 10 (4), 639–643.
Laird, N.M., Ware, J.H., 1982. Random-effects models for longitudinal data. Biometrics 38 (4), 963–974.
Lehtovirta, M., Soininen, H., Laakso, M.P., Partanen, K., Helisalmi, S., Mannermaa, A., Ryynanen, M., Kuikka, J., Hartikainen, P., Riekkinen Sr., P.J., 1996. SPECT and MRI analysis in Alzheimer's disease: relation to apolipoprotein E epsilon 4 allele. Journal of Neurology, Neurosurgery, and Psychiatry 60 (6), 644–649.
Littell, R.C., Milliken, G.A., Stroup, W.W., Wolfinger, R.D., 1996. SAS System for Mixed Models. SAS Institute Inc, Cary, NC, USA.
Lopez, O.L., Kuller, L.H., Becker, J.T., Dulberg, C., Sweet, R.A., Gach, H.M., Dekosky, S.T., 2007. Incidence of dementia in mild cognitive impairment in the cardiovascular health study cognition study. Archives of Neurology 64 (3), 416–420.
McKhann, G., Drachman, D., Folstein, M., Katzman, R., Price, D., Stadlan, E.M., 1984. Clinical diagnosis of Alzheimer's disease: report of the NINCDS-ADRDA Work Group under the auspices of the Department of Health and Human Services Task Force on Alzheimer's disease. Neurology 34, 939–944.
Mizuno, K., Wakai, M., Takeda, A., Sobue, G., 2000. Medial temporal atrophy and memory impairment in early stage of Alzheimer's disease: an MRI volumetric and memory assessment study. Journal of the Neurological Sciences 173 (1), 18–24.
Moore, P.M., Baker, G.A., 1996. Validation of the Wechsler Memory Scale-Revised in a sample of people with intractable temporal lobe epilepsy. Epilepsia 37 (12), 1215–1220.
Mori, E., Yoneda, Y., Yamashita, H., Hirono, N., Ikeda, M., Yamadori, A., 1997. Medial temporal structures relate to memory impairment in Alzheimer's disease: an MRI volumetric study. Journal of Neurology, Neurosurgery, and Psychiatry 63 (2), 214–221.
Morris, J.C., 1993. The Clinical Dementia Rating (CDR): current version and scoring rules. Neurology 43 (11), 2412–2414.
Morris, J.C., Heyman, A., Mohs, R.C., Hughes, J.P., van Belle, G., Fillenbaum, G., Mellits, E.D., Clark, C., 1989. The Consortium to Establish a Registry for Alzheimer's Disease (CERAD). Part I. Clinical and neuropsychological assessment of Alzheimer's disease. Neurology 39 (9), 1159–1165.
Pennanen, C., Kivipelto, M., Tuomainen, S., Hartikainen, P., Hanninen, T., Laakso, M.P., Hallikainen, M., Vanhanen, M., Nissinen, A., Helkala, E.L., Vainio, P., Vanninen, R., Partanen, K., Soininen, H., 2004. Hippocampus and entorhinal cortex in mild cognitive impairment and early AD. Neurobiology of Aging 25 (3), 303–310.
Petersen, R.C., Jack Jr., C.R., Xu, Y.C., Waring, S.C., O'Brien, P.C., Smith, G.E., Ivnik, R.J., Tangalos, E.G., Boeve, B.F., Kokmen, E., 2000. Memory and MRI-based hippocampal volumes in aging and AD. Neurology 54 (3), 581–587.
Petersen, R.C., Doody, R., Kurz, A., Mohs, R.C., Morris, J.C., Rabins, P.V., Ritchie, K., Rossor, M., Thal, L., Winblad, B., 2001. Current concepts in mild cognitive impairment. Archives of Neurology 58 (12), 1985–1992.
Pruessner, J.C., Li, L.M., Serles, W., Pruessner, M., Collins, D.L., Kabani, N., Lupien, S., Evans, A.C., 2000. Volumetry of hippocampus and amygdala with high-resolution MRI and three-dimensional analysis software: minimizing the discrepancies between laboratories. Cerebral Cortex 10 (4), 433–442.
Pruessner, J.C., Collins, D.L., Pruessner, M., Evans, A.C., 2001. Age and gender predict volume decline in the anterior and posterior hippocampus in early adulthood. Journal of Neuroscience 21 (1), 194–200.
Rosen, W.G., Mohs, R.C., Davis, K.L., 1984. A new rating scale for Alzheimer's disease. American Journal of Psychiatry 141, 1356–1364.
Sled, J.G., Zijdenbos, A.P., Evans, A.C., 1998. A nonparametric method for automatic correction of intensity nonuniformity in MRI data. IEEE Transactions on Medical Imaging 17 (1), 87–97.
Squire, L.R., 1992. Memory and the hippocampus: a synthesis from findings with rats, monkeys, and humans. Psychological Review 99, 195–231.
Talairach, J., Tournoux, P., 1988. Co-Planar Stereotaxic Atlas of the Human Brain. Thieme, New York.
Teipel, S.J., Pruessner, J.C., Faltraco, F., Born, C., Rocha-Unold, M., Evans, A., Moller, H.J., Hampel, H., 2006. Comprehensive dissection of the medial temporal lobe in AD: measurement of hippocampus, amygdala, entorhinal, perirhinal and parahippocampal cortices using MRI. Journal of Neurology 253 (6), 794–800.
Teipel, S.J., Mitchell, A., Möller, H.-J., Hampel, H., 2007. Improving linear modeling of cognitive decline in patients with mild cognitive impairment: comparison of two methods. Journal of Neural Transmission Supplement 72, 241–247.
Wechsler, D., 1945. A standardized memory scale for clinical use. Journal of Psychology 19, 87–95.