Psychiatry Research: Neuroimaging Section 91 (1999) 31–44
Methodological issues in volumetric magnetic resonance imaging of the brain in the Edinburgh High Risk Project

Heather C. Whalley*, Julia N. Kestelman, J. Ewen Rimmington, Andrew Kelso, Suheib S. Abukmeil, Jonathan J.K. Best, Eve C. Johnstone, Stephen M. Lawrie

Department of Psychiatry, Royal Edinburgh Hospital, Morningside Park, Edinburgh EH10 5HF, UK

Received 18 December 1998; received in revised form 17 March 1999; accepted 29 March 1999
Abstract

The Edinburgh High Risk Project is a longitudinal study of brain structure (and function) in subjects at high risk of developing schizophrenia in the next 5–10 years for genetic reasons. In this article we describe the methods of volumetric analysis of structural magnetic resonance images used in the study. We also consider potential sources of error in these methods: the validity of our image analysis techniques; inter- and intra-rater reliability; possible positional variation; and thresholding criteria used in separating brain from cerebro-spinal fluid (CSF). Investigation with a phantom test object (of similar imaging characteristics to the brain) provided evidence for the validity of our image acquisition and analysis techniques. Both inter- and intra-rater reliability were found to be good in whole brain measures but less so for smaller regions. There were no statistically significant differences in positioning across the three study groups (patients with schizophrenia, high risk subjects and normal volunteers). A new technique for thresholding MRI scans longitudinally is described (the ‘rescale’ method) and compared with our established method (thresholding by eye). Few differences between the two techniques were seen at 3- and 6-month follow-up. These findings demonstrate the validity and reliability of the structural MRI analysis techniques used in the Edinburgh High Risk Project, and highlight methodological issues of general concern in cross-sectional and longitudinal studies of brain structure in healthy control subjects and neuropsychiatric populations. © 1999 Elsevier Science Ireland Ltd. All rights reserved.

Keywords: MRI; Structural imaging; Thresholding; Follow-up; Reliability; Validity; Schizophrenia
* Corresponding author. Tel.: +44-131-537-6767. E-mail address: [email protected] (H.C. Whalley).
0925-4927/99/$ - see front matter © 1999 Elsevier Science Ireland Ltd. All rights reserved. PII: S0925-4927(99)00012-8
H.C. Whalley et al. / Psychiatry Research: Neuroimaging 91 (1999) 31–44
1. Introduction

There is now little doubt that patients with schizophrenia, as a group, have structurally abnormal brains. Over 50 controlled computerised tomography (CT) studies have demonstrated enlarged lateral ventricles and a loss of brain substance (Lewis, 1990; Raz and Raz, 1990; Daniel et al., 1991; van Horn and McManus, 1992), while a similar number of controlled magnetic resonance imaging (MRI) studies have demonstrated volume reduction in the whole brain and of mesial temporal lobe structures in particular (Ward et al., 1996; Lawrie and Abukmeil, 1998; Nelson et al., 1998). The remaining issues are how such findings are related to aetiological variables, such as genetic vulnerability, and to what extent the changes are degenerative (progressive) rather than developmental (static).

The Edinburgh High Risk Project has been designed to address such issues. The main aim is to repeatedly assess a large number of young adults aged 16–25 who are at high risk of developing schizophrenia for genetic reasons and to observe the presence or absence of changes in brain structure as the illness develops and in the early stages of psychosis. The same assessments are being performed in appropriately matched normal control subjects and in cases of first episode schizophrenia. The first article to be published described the brain structure in the first 100 high risk subjects as compared to 30 healthy control subjects and 20 first episode schizophrenic patients who were matched for age, sex and paternal social class (Lawrie et al., 1999).

Previous considerations of the causes of brain changes have reported fundamentally different results depending on the population studied. Relatives of patients with schizophrenia tend to have subtle structural changes, suggesting a genetic basis (Weinberger et al., 1981; Cannon et al., 1993; Seidman et al., 1997). On the other hand, some studies suggest that ventriculomegaly is associated with birth trauma rather than a family history, thus suggesting an environmental basis (Lewis, 1990; Vita et al., 1994). These CT studies have suffered from technical limitations in resolution, difficulties in imaging parts of the brain and the use of unreliable area measures, but very few relevant MRI studies have as yet been published.

Studies of the time-scale of brain changes have delivered a more consistent picture. Several CT and MRI reports in first episode cases have demonstrated that similar changes are found at the onset of the disease as in cases who have been ill for many years (Bogerts et al., 1990; Lewis, 1990; Bilder et al., 1994; Nopoulos et al., 1995; Lim et al., 1996). Follow-up CT studies have mostly indicated that the enlargement of the lateral ventricles does not progress with increasing illness duration (Nasrallah et al., 1986; Illowsky et al., 1988; Vita et al., 1988; Jaskiw et al., 1994), but two studies have shown a significant increase over time (Kemali et al., 1989; Woods et al., 1990). These CT studies used a ventricular/brain ratio measurement calculated from one slice showing the lateral ventricles at their largest. Inevitable positional variation between scans gave rise to difficulty in reliably comparing the second scan to the first, and other methodological issues may account for some of the inconsistencies in the results. A more recent CT study has used volumetric measures of ventricular size to overcome some of these difficulties and has indicated that there are subgroups of subjects who display ventricular enlargement over time (Davis et al., 1998).

Long-term follow-up volumetric MRI studies do not suffer from such marked methodological problems. Several of these studies have indicated progressive changes in the brain after the onset of the illness, occurring either in subjects in general or in subgroups of subjects (DeLisi et al., 1997; Nair et al., 1997; Rapoport et al., 1997; Gur et al., 1998).

This article describes the methodology employed in the Edinburgh High Risk Project and highlights specific areas of concern. We do this by considering four of the main issues in image analysis relevant to both cross-sectional case-control comparisons and longitudinal within-subjects comparisons. These are the validity and reliability of the method, positional variation at scan acquisition, and thresholding. These are potential sources of measurement error that could obscure relatively subtle case-control differences and even smaller possible changes over time.
2. Method

2.1. Image acquisition

All MRI brain images in the Edinburgh High Risk Project were acquired on a 42 SPE Siemens (Erlangen, Germany) Magnetom operating at 1.0 T. Midline sagittal localisation was followed by two sequences to image the whole brain. The first scan was a double spin echo sequence which gave simultaneous proton density and T2-weighted images (TR = 3565 ms, TE = 20 and 90 ms, 31 contiguous 5-mm slices in the Talairach plane, field of view 250 × 250 mm) to identify any gross brain lesions. The second scan was a three-dimensional Magnetisation Prepared Rapid Acquisition Gradient Echo (MPRAGE) sequence consisting of a 180° inversion pulse followed by a Fast Low Angle Shot (FLASH) collection (flip angle 12°, TR = 10 ms, TE = 4 ms, TI = 200 ms and relaxation delay time 500 ms, field of view 250 × 250 mm) to give 128 contiguous ‘slices’ of 1.88-mm thickness for volumetric analysis. The images were taken in the coronal plane orthogonal to the Talairach plane in order to observe each structure perpendicular to its long axis.

Immediately after each subject’s scan a large plain test object filled with light oil was scanned in exactly the same place in the coil and in the same orientation as the subject. The test object data were used to correct for inhomogeneity in the radiofrequency coil (signal fall-off at the extremes of the volumes scanned in the vertical axis). The mean intensity of a 5 × 5 pixel square in the centre slice of the test object data was taken as the optimum intensity. The whole test object data set was normalised to this value. Dividing the subject data by the normalised test object data permitted any deviation from the optimum coil response to be corrected.

2.2. Image analysis

Image processing was performed by three raters (JNK, SSA, HCW) on Sun Microsystems workstations using the software package ‘Analyze’ (Mayo Foundation, Rochester, MN, USA). The Analyze software has the main advantage that it is an ‘off the shelf’ integrated software package that performed all the functions that we required for our semi-automated quantitative analysis. There was therefore no need to write specific software ‘in house’, which can make comparisons between studies difficult. The main disadvantage is the limitation of a generic approach to our specific study. A review of image analysis terms and techniques used for the study of brain structure is provided by Andreasen et al. (1992).

Firstly, the brain was ‘thresholded’. The term here refers to the process of separating CSF from the brain, and is performed by the selection of image intensity values (or thresholds) which will distinguish brain matter and CSF. This was performed using the automated edge detection device (the ‘autotrace’ function in ‘image edit’) within Analyze. The autotrace function performs segmentation by connecting all pixels within a specified threshold range about a selected seed point and then automatically draws a trace around the region (here the autotrace was selected ‘on edge’). In this case a seed is placed within brain matter and the threshold range is adjusted to include all brain matter and exclude CSF. As the threshold values can range from 0 to 255, the maximum threshold value was set to 255 in order to include all high intensity pixels. The minimum threshold value was selected to exclude CSF by scrutinising by eye the interface between CSF and brain at the edge of the walls of the lateral ventricles, and at the sulci. The optimal screen intensity was selected to highlight this interface. Any questionable pixels were included inside the boundary of the autotrace.

With the use of an automated process (‘image algebra’ within Analyze), all pixels below the threshold minimum were deleted. This function allows the copying of input to output files. The intensity window of the output file is set to the selected threshold range determined above in order that the file only contains those pixels of interest.
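The seeded thresholding described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Analyze implementation: the function names `grow_region` and `apply_threshold`, the choice of 4-connectivity and the toy intensity values are all assumptions made for the example.

```python
from collections import deque


def grow_region(image, seed, t_min, t_max=255):
    """Return the set of (row, col) pixels that are 4-connected to `seed`
    and whose intensity lies within [t_min, t_max]."""
    rows, cols = len(image), len(image[0])
    r0, c0 = seed
    if not (t_min <= image[r0][c0] <= t_max):
        return set()
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and t_min <= image[nr][nc] <= t_max):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


def apply_threshold(image, t_min):
    """Set every pixel below t_min to zero, keeping the rest unchanged."""
    return [[v if v >= t_min else 0 for v in row] for row in image]


# Hypothetical 8-bit slice: bright 'brain' pixels around a dark 'CSF' pocket,
# plus one bright pixel (bottom right) that is not attached to the brain.
slice_ = [
    [10, 10, 10, 10, 10],
    [10, 150, 160, 150, 10],
    [10, 140, 30, 155, 10],
    [10, 150, 160, 150, 10],
    [10, 10, 10, 10, 120],
]
brain = grow_region(slice_, seed=(1, 1), t_min=100)
thresholded = apply_threshold(slice_, t_min=100)
```

In this toy slice the dark central pixel is excluded by the minimum threshold, while the detached bright pixel passes the threshold but is not connected to the seed, which mirrors why manual editing is still needed after thresholding.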
The selected threshold value was kept constant throughout analysis. The brain was separated from the skull and meningeal coverings by the combined use of the autotrace and manual tracing. All areas not firmly attached to the cerebrum, such as the pituitary gland, the olfactory bulbs, nerves and blood vessels, were excluded. The optic chiasm was only excluded when it did not form the base of the third ventricle. To define its inferior boundary, the brain was separated from the spinal cord at the most inferior position of the cerebellum. With respect to the ventricular system, the choroid plexus was only removed when it was clearly detached from the ventricular wall.

Individual neuroanatomical structures were identified and outlined using a combination of manual tracing and automated edge detection techniques. Volumes were calculated by summing all voxels on all brain slices included. With the exception of the pre-frontal lobes, each region of interest was defined independently of its contralateral pair. With the assistance of an MRI atlas (Duvernoy, 1991), strict anatomical definition criteria incorporating anatomical landmarks were used; these are summarised in Table 1. Illustrations of the outlining of the subcortical nuclei and amygdalo-hippocampal complex are depicted in Fig. 1a,b. Where relevant, additional details are outlined below.

The posterior boundary of the temporal lobe was defined as the last slice in which the fibres of the crus fornicis, travelling dorsally from the hippocampus along the medial border of the lateral ventricles, were distinct from the grey matter of the hippocampus and white matter of the splenium of the corpus callosum (after Shenton et al., 1992). If the crus fornicis was continuous with, or indistinct from, the splenium or hippocampus, the slice was not included. It is also important to note that at the junction between the hippocampus and the pulvinar nucleus the fibre bundles gyrus fasciolaris and fasciolaris cinerea of the posterior hippocampus alter the position of the uppermost point of the grey matter of the temporal lobe. When the pulvinar nucleus is present, the line tracing the superior boundary of the temporal lobe (the line following the sylvian point to the uppermost point of the grey matter of the temporal lobe) is inferior to the grey matter of this nucleus; as the pulvinar nucleus disappears, this line moves more superior in order to incorporate the gyrus fasciolaris.

The body of the caudate nucleus borders the lateral wall of the lateral ventricle and the tail borders the temporal horn to fuse with the amygdala. The tail of the caudate is indistinguishable from the amygdalo-hippocampal complex. For this reason, the tail of the caudate was excluded from measures of the caudate nuclei.

For the purposes of this study, all thalamic nuclei were grouped together, as MRI resolution makes distinction between the individual thalamic nuclei difficult. The posterior boundary of the thalamic nuclei was defined as the last slice in which the grey matter of the pulvinar nucleus was distinct from the emerging gyrus fasciolaris of the posterior hippocampus with the crus fornicis obscured behind it.

The distinction between the amygdala and hippocampus is also difficult due to resolution. For the purposes of study definition, both structures were defined together as a complex. Anteriorly, as the temporal stem appears, the gyri that enclose the amygdaloid nucleus become more evident. The lateral boundary was defined by the medial aspect of the temporal horn of the lateral ventricle, and the white matter of the temporal stem. The medial boundary was defined by the edges of the subiculum bordering the subarachnoid space. Moving posteriorly, as the pulvinar nucleus disappears, the full extent of the crus fornicis is revealed and the fibre bundles gyrus fasciolaris and fasciolaris cinerea of the posterior hippocampus emerge. The posterior hippocampus was defined as distinct from the pulvinar nucleus when the full extent of the crus fornicis was present.

2.3. Validity of image analysis

In order to examine the validity of our techniques for thresholding and manual editing, a test object of known volume was scanned, corrected for coil inhomogeneity, and then volumetrically analysed. The test object (Fig. 2) was constructed with the same proton density and relaxation times as white matter, grey matter and CSF. The ‘tissue’ compartments were made from a mixture of agarose and terbium chloride, TbCl2·6H2O. Agarose and terbium chloride act almost independently, agarose reducing T2 and terbium chloride reducing T1, to give values that are relatively independent of temperature (Roberts et al., 1992).
Table 1
Summary of definition criteria for regions of interest (boundaries: anterior, posterior, medial)

Pre-frontal lobes
  Anterior:  Frontal pole, when distinct from meninges
  Posterior: Slice anterior to the genu of the corpus callosum (Suddath et al., 1990)
  Medial:    Inter-hemispheric fissure

Temporal lobes
  Anterior:  Temporal pole, when distinct from meninges
  Posterior: Last slice clearly showing the crus fornicis
  Medial:    Line following the lateral fissure to the sylvian point, then the uppermost point of the grey matter of the temporal lobe (Shenton et al., 1992)

Caudate nuclei
  Anterior:  When clearly distinct from surrounding white matter
  Posterior: Last slice of the temporal lobe
  Medial:    Walls of lateral ventricles

Lentiform nuclei
  Anterior:  When putamen clearly distinct from surrounding white matter
  Posterior: Last slice when putamen clearly distinct from surrounding white matter
  Medial:    Naturalistic

Thalamic nuclei
  Anterior:  When clearly distinct from surrounding white matter and walls of the third ventricle
  Posterior: Slice anterior to the crus fornicis
  Medial:    Walls of the third ventricle or the cistern beyond

Amygdalo-hippocampal complex
  Anterior:  When clearly distinguishable from surrounding white matter. Not before the temporal stem
  Posterior: Last slice of temporal lobe (Shenton et al., 1992)
  Medial:    Naturalistic

Lateral ventricles
  Anterior:  As defined by the autotrace, frontal horns included
  Posterior: As defined by the autotrace, occipital horns included
  Medial:    As defined by the autotrace, temporal horns included

Third ventricle
  Anterior:  When chamber is enclosed inferiorly by optic chiasm, as defined by the autotrace
  Posterior: Last slice of conical shape as distinct from transverse cerebral fissure beyond
  Medial:    As defined by the autotrace

Fourth ventricle
  Anterior:  First slice of rhomboid shape as distinct from the cerebral aqueduct, as defined by the autotrace
  Posterior: As defined by the autotrace
  Medial:    As defined by the autotrace, lateral recesses included
Fig. 1. (a) Illustration of manual outlining of regions of interest on an anterior slice of an edited whole brain. (b) Illustration of regional outlining of a more posterior slice.
The proton density was controlled by adding deuterium oxide. Deuterium oxide, however, also reduces the relaxation time by decreasing intermolecular H–H interaction. The ‘white matter’ compartment was made by mixing 1.8% w/v agarose with a solution comprising 40% v/v D2O and 8% v/v 40 mM TbCl2. The gel was hydrated by heating to 80°C for 20 min and then transferred to five tubular polythene bags (to increase surface area). Since the gel is over 98% water/D2O, its volume was found by weighing and correcting for the D2O density. The ‘grey matter’ compartment was made in the same way with 1.5% agarose, 20% D2O and 3% 40 mM TbCl2. It was prepared in a single bag. The ‘white matter’ tubes were embedded in this gel as it cooled. CSF at 38°C has a longer T1 than saline at room temperature and is hence more T1 weighted. It was mimicked by dissolving 0.9% w/v sodium chloride in 27:73 v/v D2O/H2O. This was poured into a bag only slightly larger than the grey matter object and sealed closely round this object to enclose it as much as possible in a thin layer of ‘CSF’. The object was used at room temperature (18°C).

The test object (Fig. 2) was analysed twice by each of two raters (JNK and HCW). The ‘CSF/grey matter’ interface was thresholded, and the ‘grey matter’ boundary was manually traced where necessary. This was to simulate the thresholding and brain editing performed during the analysis of the subject data. The ‘white matter’-like objects within the ‘grey matter’ of the test object were then manually traced to simulate manual editing of regions of interest. Volumes were calculated by summing voxels on all slices included.
Fig. 2. Diagram of test object.
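Volume estimation by voxel counting, as applied to both the test object and the subject data, can be illustrated with a short sketch. The field of view (250 × 250 mm) and slice thickness (1.88 mm) are taken from Section 2.1; the 256 × 256 in-plane matrix is an assumption, since the acquisition matrix is not stated in the text, and the function names are hypothetical.

```python
def voxel_volume_mm3(fov_mm=250.0, matrix=256, slice_mm=1.88):
    """Volume of one voxel in mm^3 for a square field of view.
    The default matrix size of 256 is an assumption, not a reported value."""
    pixel_mm = fov_mm / matrix
    return pixel_mm * pixel_mm * slice_mm


def region_volume_cm3(slice_masks, **geometry):
    """Sum the traced voxels over all slices and convert to cm^3.
    `slice_masks` is a list of 2-D lists in which non-zero marks a traced pixel."""
    n_voxels = sum(
        sum(1 for value in row if value)
        for mask in slice_masks
        for row in mask
    )
    return n_voxels * voxel_volume_mm3(**geometry) / 1000.0


# Hypothetical region traced as a 10 x 10 pixel square on two adjacent slices.
masks = [[[1] * 10 for _ in range(10)] for _ in range(2)]
volume = region_volume_cm3(masks)
```

Under the assumed geometry each voxel is a little under 2 mm³, so a one-pixel shift in a traced boundary matters far more for a small structure than for the whole brain.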
2.4. Reliability of image analysis

Inter-rater reliability was assessed on five randomly selected brains taken from the Edinburgh High Risk Project, by three raters (HCW, JNK, SSA), for volumetric measurements of all regions described in Table 1. Intra-rater reliability was assessed by HCW at intervals of 6 months on the same five brains for all regions of interest.

Inter- and intra-rater reliability of our methods was assessed using Pearson correlation coefficients, the intraclass correlation coefficient (Shrout and Fleiss, 1979), and a method based on that described by Bland and Altman (1986). Bland and Altman analysis examines the mean difference between a pair of raters (equivalent to bias between the raters) compared to the mean volume of the two raters.

2.5. Positional variation

Tilt can occur in three planes: transverse (yaw), coronal (roll), and sagittal (pitch). Our images were acquired perpendicular to the plane of the anterior to posterior commissure, which means that the images are unlikely to be significantly affected by pitch. As the slices are coronal, roll will have no effect on the volumetric measurements made. Our main concern is therefore yaw. This is of most relevance when posterior boundaries of bilateral regions are defined together, for example, the pre-frontal lobes, or when medial boundaries are used to define lateral structures, for example, the crus fornicis used to define the posterior boundary of the temporal lobes. Under the effects of yaw, the lateral structures will move through a greater distance than the medial structures. Application of tilt correction to the images was found to lead to image degradation and for this reason images were not corrected for tilt.

To determine the extent of yaw in the Edinburgh High Risk Project, and whether there were any significant differences between the groups, the angles of correction were measured.
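The statement that lateral structures move further than medial structures under yaw follows from simple rotation geometry, sketched below. The sketch assumes a rigid rotation about a vertical axis through the midline; the lateral distances (5 mm and 40 mm) and the 2° angle are hypothetical values chosen for illustration, while the 1.88-mm slice thickness comes from Section 2.1.

```python
import math

SLICE_THICKNESS_MM = 1.88  # coronal slice thickness from Section 2.1


def ap_shift_mm(lateral_distance_mm, yaw_degrees):
    """Anterior-posterior displacement of a point lying `lateral_distance_mm`
    from the rotation axis when the head is yawed by `yaw_degrees`."""
    return lateral_distance_mm * math.sin(math.radians(yaw_degrees))


def ap_shift_slices(lateral_distance_mm, yaw_degrees):
    """The same displacement expressed in units of coronal slices."""
    return ap_shift_mm(lateral_distance_mm, yaw_degrees) / SLICE_THICKNESS_MM


# Hypothetical medial landmark (5 mm from midline) versus lateral structure (40 mm).
medial = ap_shift_slices(5.0, 2.0)
lateral = ap_shift_slices(40.0, 2.0)
```

Because the displacement scales linearly with distance from the axis, the lateral point here shifts eight times as far as the medial one for the same yaw angle.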
The coronal images were reformatted into the transverse plane using Analyze. The slice showing the most superior part of the anterior commissure was selected, and the angle of correction needed was assessed by tilting this slice through increments of one degree. The angle of correction needed between the three study groups for the Edinburgh High Risk Project was compared using a one-way ANOVA.

2.6. Thresholding longitudinally: the ‘rescale’ method

Daily checks were made on the MRI scanner to ensure that the signal-to-noise ratio and geometry complied with the manufacturer’s tolerance. Technical adjustments were made during monthly servicing checks to correct any drifts in the electronics. These checks are, however, not accurate to the sub-pixel level required for volumetric analysis. This variation in performance of the scanner produces a variation in image intensity between scans that is especially pertinent in long-term follow-up studies. Thus, the original threshold chosen for the analysis of the first scan cannot automatically be re-applied to the second scan, raising the issue of how best to select the threshold for the second scan.

We evaluated two techniques of thresholding the follow-up scan to determine whether a more automated method of thresholding would be an advantage over judging the threshold by eye. The threshold was judged, firstly by eye according to our original thresholding criteria, and secondly by using a ‘re-scale’ method. The re-scale method was devised in order to compensate for the differences in intensity between the two scans. It is based on the principle of correcting the intensity of the second scan to that of the first, in order that the original threshold chosen for the first scan can be reapplied to the second scan.

The first slice showing the genu of the corpus callosum was selected for both the first and second scans. On this slice of both scans, an edge-preserving filter (‘VSF mean’ in Analyze) was used.
This filter averages pixels within a circle of the target pixel, ignoring all pixels that differ in intensity from the target pixel by more than a step value. The radius of the circle was set to 4 and the step value to 10. The mean signal intensity of this region was then determined in both scans. The mean signal intensity of the first scan was divided by the mean signal intensity of the second scan to produce the ‘re-scaling factor’. A new image file was constructed by multiplying the second scan by this re-scaling factor using the ‘image algebra’ function within Analyze. This function allows algebraic manipulations of scans; in this case the second scan was simply multiplied by the re-scaling factor to produce the re-scaled image. The original threshold chosen for the first scan was then reapplied to this re-scaled image file.

Six volunteer normal control subjects (aged 25–35) were scanned after each monthly servicing of the scanner over a 6-month period. The volumes of whole brain were thresholded using both methods and edited by HCW for scans at zero, 3 and 6 months. Using these whole brain edits, the volumes of the lateral, third and fourth ventricles, amygdalo-hippocampal complex, and thalamic nuclei were then analysed by rater AK. These measurements were performed over a 2-month period. Inter-rater reliability for AK was assessed in the manner previously described. To compare the two methods, the agreement between the baseline results and each of the methods at 3 and 6 months was assessed using intraclass correlation coefficients.

3. Results

3.1. Validation of image analysis techniques

The actual volumes of the test object’s tissue compartments and the volumes measured by the two raters are shown in Table 2. The measured volumes are very close to the known volumes: the mean absolute differences were 3–4 cm³, or 0.5–1.4% of the actual volume.

Table 2
Volumes of the test object, actual and measured (GM = grey matter, WM = white matter); one object measured twice by each of two raters

Region    Test object     Measured volumes (cm³),    Mean absolute       Percentage
          volume (cm³)    two for each of two raters difference (cm³)    error (%)a
GM + WM   594             591, 595, 593, 600         3                   0.5
WM        287             287, 289, 280, 294         4                   1.4
GM        307             304, 306, 313, 306         4                   1.3

a Mean absolute difference as a percentage of the known volume.

3.2. Reliability of image analysis techniques

The results derived from the three methods for assessing reliability are shown in Tables 3 and 4. Table 3 shows the inter-rater reliability for 10 regions (whole brain, third ventricle, fourth ventricle and seven bilateral structures) on the five brains measured. The mean overall Pearson correlation coefficient for all brain regions is 0.94 (range 0.78–0.99), and the mean overall intraclass correlation is 0.92 (range 0.75–0.99). The overall mean of the percentage differences between three raters for all regions indicates that smaller regions have generally poorer reliability than larger regions.

Table 3
Inter-rater reliability: overall mean Pearson’s and intraclass correlation coefficients, and mean differences expressed as percentages of the mean volume for three raters for all brain regions in five brains (correlation coefficients for bilateral regions are averaged)

Region                         r (n = 5)   ICC (n = 5)   Overall mean difference between raters
                                                         as % of mean volume (n = 5)
Whole brain                    0.99        0.99          0.7
Pre-frontal lobes              0.99        0.98          0.8
Temporal lobes                 0.99        0.98          1.3
Lateral ventricles             0.99        0.99          2.2
Third ventricle                0.95        0.93          4.4
Fourth ventricle               0.93        0.92          5.4
Caudate nuclei                 0.97        0.96          2.9
Lentiform nuclei               0.78        0.75          6.9
Thalamic nuclei                0.93        0.84          5.3
Amygdalo-hippocampal complex   0.90        0.82          9.4

Table 4 shows that the intra-rater Pearson correlation coefficients for rater HCW were generally high, but lowest for the amygdalo-hippocampal complex. The mean was 0.94 (range 0.80–1.00). The mean overall intraclass correlation coefficient was 0.92 (range 0.76–0.99). The overall mean of the percentage differences for rater HCW for the main brain regions measured this way indicates that intra-rater reliability is lowest for the ventricular system.

Table 4
Intra-rater reliability: overall mean Pearson’s and intraclass correlation coefficients, and mean differences expressed as percentages of the mean volume for rater HCW for all brain regions in five brains (correlation coefficients for bilateral regions are averaged)a

Region                         r (n = 5)   ICC (n = 5)   Overall mean difference between ratings
                                                         as % of mean volume (n = 5)
Whole brain                    0.99        0.99          0.6
Pre-frontal lobes              0.88        0.83          4.1
Temporal lobes                 0.93        0.93          0.9
Lateral ventricles             0.99        0.99          5.5
Third ventricle                0.99        0.97          23.3
Fourth ventricle               0.97        0.98          4.6
Caudate nuclei                 0.99        0.99          1.7
Lentiform nuclei               0.78        0.76          5.1
Thalamic nuclei                0.87        0.88          4.5
Amygdalo-hippocampal complex   0.78        0.81          3.2

a Measurements performed three times at 6-month intervals.

3.3. Positional variation

Table 5 shows the mean angle of yaw correction for the three subject groups of the Edinburgh High Risk Project. No significant differences in mean angle of yaw correction were found between the three groups (F = 1.1, P = 0.3), although the patients with schizophrenia had a tendency to turn to the left, whereas the subjects without illness tended to face marginally to the right.

3.4. Thresholding longitudinally

The average Pearson correlation coefficient for rater AK as compared to the other three raters was 0.87 (range 0.59–0.99). The measurement error for AK was 3.4% for the ventricular system, 26.6% for the amygdalo-hippocampal complex and 12.6% for the thalamus.

Table 6 presents the mean differences in volume for regions of interest for the two longitudinal thresholding methods, and the agreement between baseline measures and each method at 3 and 6 months as measured by the intraclass correlation coefficient (ICC). The mean overall ICC for judging the threshold by eye was 0.92 (0.77–0.99) at 3 months and 0.93 (0.82–0.99) at 6 months. For the rescale method, the mean overall ICC was 0.90 (0.74–0.99) at 3 months and 0.85 (0.51–0.99) at 6 months. In general, for both methods the actual difference over time between the scans, as expected in normal individuals, is small. These results indicate that there is little difference between the two methods of thresholding follow-up scans.
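The arithmetic of the rescale method described in Section 2.6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the Analyze ‘image algebra’ procedure itself: the edge-preserving ‘VSF mean’ filtering step is omitted, the reference region is passed in as an explicit list of pixel coordinates, and all function names and intensity values are hypothetical.

```python
def mean_intensity(image, region):
    """Mean intensity of `image` over a list of (row, col) coordinates."""
    values = [image[r][c] for r, c in region]
    return sum(values) / len(values)


def rescale_factor(first_scan, second_scan, region):
    """Ratio of the reference-region means: first scan divided by second scan."""
    return mean_intensity(first_scan, region) / mean_intensity(second_scan, region)


def rescale(image, factor, max_value=255):
    """Multiply every pixel by `factor`, rounding and clipping to the 8-bit range."""
    return [[min(max_value, round(v * factor)) for v in row] for row in image]


# Hypothetical slices: the follow-up scan is uniformly dimmer than baseline.
baseline = [[120, 120], [60, 102]]
follow_up = [[100, 100], [50, 85]]
reference = [(0, 0), (0, 1)]          # pixels standing in for the reference region

factor = rescale_factor(baseline, follow_up, reference)
rescaled = rescale(follow_up, factor)

# With a baseline threshold of 100, pixel (1, 1) is lost in the raw follow-up
# scan (85 < 100) but retained after rescaling.
threshold = 100
kept_raw = follow_up[1][1] >= threshold
kept_rescaled = rescaled[1][1] >= threshold
```

The example shows why a fixed threshold cannot simply be carried over between scans of differing overall intensity, and how a single multiplicative factor restores comparability when the intensity change is uniform.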
Table 5
Mean angle of yaw correction for three groups in Edinburgh High Risk Project

Group           Mean angle of correction (degrees)   S.D.   Range        n
Controls        0.4                                  1.7    (−2 to +4)   30
High risk       0.2                                  1.7    (−3 to +4)   100
First episode   −0.4                                 1.8    (−3 to +3)   20

Table 6
Comparison of the mean differences (S.D.) in volume and agreement (ICC) for regions of interest for the two follow-up thresholding methodsa. Each cell gives mean difference (S.D.); ICC between baseline and the follow-up method.

                                      3 months (n = 6)                              6 months (n = 4)
Region                                Eye                    Rescale                Eye                     Rescale
Whole brain                           −4.4 (11.5); 0.99      2.2 (5.8); 0.99        2.1 (9.1); 0.99         −8.8 (10.4); 0.99
Lateral ventricle, left               −26.5 (469.7); 0.98    −144.8 (353.2); 0.92   −81.8 (258.6); 0.99     251.7 (434.0); 0.79
Lateral ventricle, right              −69.7 (418.6); 0.99    −192.5 (311.1); 0.82   −867.1 (1522.2); 0.99   179.3 (340.6); 0.51
Third ventricle                       −31.6 (115.7); 0.90    −59.6 (92.8); 0.99     −59.5 (25.5); 0.94      0.9 (76.2); 0.92
Fourth ventricle                      16.4 (79.7); 0.98      −11.6 (46.5); 0.98     −28.2 (46.0); 0.99      14.3 (6.7); 0.99
Thalamic nucleus, left                −38.5 (550.2); 0.77    219.6 (363.5); 0.74    −167.6 (1008.5); 0.90   −227.5 (818.5); 0.84
Thalamic nucleus, right               −121.0 (463.4); 0.86   −46.8 (376.0); 0.92    81.4 (493.3); 0.90      368.8 (429.5); 0.92
Amygdalo-hippocampal complex, left    −39.3 (285.9); 0.89    −167.2 (369.6); 0.88   71.1 (436.5); 0.86      118.5 (374.1); 0.92
Amygdalo-hippocampal complex, right   −288.8 (371.6); 0.89   −74.2 (515.4); 0.87    275.8 (457.0); 0.82     461.3 (604.9); 0.79
Overall mean ICC                      0.92                   0.90                   0.93                    0.85

a Volumes in mm³, except whole brain (cm³).

4. Discussion

Measurement error in volumetric MRI studies, as elsewhere, consists of the error inherent in the technique itself (the limitations of the technology) and additional errors introduced by the particular methods adopted by experimenters. We have addressed several of the latter in describing our approach to a particular study, seeking to optimise our methods and to generate data of general relevance. We have demonstrated that our techniques are valid and reliable for larger brain structures. Our methods are also acceptably reliable for smaller structures and we show that different ways of measuring reliability can materially affect the figures obtained. We have also established that our preferred method of thresholding is stable over time, comparing favourably with an automated technique. Overall, our results suggest that with experienced, reliable raters using standardised techniques, error can be kept within reasonable limits.

Using a test object of known volume, we have demonstrated that volumes measured using our techniques of thresholding and manual tracing correspond well to actual values. This demonstrates good validity for our techniques in this relatively large test object. Previous studies using segmentation procedures to address the validation of volumetric measurements have found that differences between measured and actual volumes are dependent on the size and shape of the object concerned (Kohn et al., 1991; Arndt et al., 1994). Future test object studies should examine objects of more realistic shape and, in particular, of a similar size to some of the regions implicated in neuropsychiatric disorders.

Our assessments of both inter- and intra-rater reliability indicate good reproducibility and repeatability for the combined procedures of thresholding, selection of anatomical landmarks and semi-automated manual editing which constitute our volumetric method. Due to the time-consuming nature of this method, however, this assessment has been performed on a small number of brains. This is a limitation of these reliability estimates, since such a small sample size would produce relatively wide confidence intervals. Analysis of the individual subcortical regions
and the amygdalo-hippocampal complex demonstrates that smaller brain regions are inherently more prone to rater measurement error than larger regions, as other studies have found
(Zipursky et al., 1994; Flaum et al., 1995; Becker et al., 1996). The region most prone to rater variability over time, as assessed by the percentage differences, appears to be the ventricular system, particularly the third ventricle. Repeated assessments of reliability are necessary to monitor the stability of the methods used over time in any follow-up study, and in any study such as ours that takes a long time to acquire case-control data. Our main rater (HCW) re-checks her reliability every 6 months, which facilitates consistency of technique.

Correlation coefficients are a common way of measuring reliability, but they can be misleading as they assess association between measurements rather than agreement (Bland and Altman, 1986). Intra-class correlation coefficients are a better estimation of agreement, but a variety of different coefficient values can be produced depending on the precise test used (Muller and Buttner, 1994). We regard a Bland and Altman approach as a more valid and informative assessment of reliability than the sole use of correlation coefficients. For example, the correlation coefficients produced for intra-rater reliability suggest good agreement over time for the region of the third ventricle, but the percentage differences suggest otherwise. The inherent difficulty with these measures, however, is that an arbitrary decision must be made about what amount of measurement error is acceptable.

The differences between the three groups in the Edinburgh High Risk study in the angle of yaw correction required were minimal and not statistically significant. Improved methods for tilt correction which do not lead to image degradation have been shown to improve scan-rescan reliability over time in normal volunteers (Bartzokis et al., 1998). These workers, however, reported reduced variance over time which is desirable but not necessarily valid given the influence of several factors (e.g. age or hydration) on brain structure.
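As a rough check on the group comparison of yaw correction, the summary statistics in Table 5 are enough to compute a one-way analysis of variance by hand. The Python sketch below is an illustration only: this section does not state which test was used, so the choice of a one-way ANOVA is an assumption, as are the names `GroupSummary`, `anova_from_summary` and `p_value_two_numerator_df`. The closed-form p-value used here is valid only when there are exactly three groups (two numerator degrees of freedom).

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class GroupSummary:
    """Published summary of one group (hypothetical container, not from the paper)."""
    name: str
    mean: float  # mean angle of yaw correction
    sd: float    # standard deviation
    n: int       # number of subjects


def anova_from_summary(groups: Sequence[GroupSummary]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA from summary data."""
    total_n = sum(g.n for g in groups)
    grand_mean = sum(g.n * g.mean for g in groups) / total_n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(g.n * (g.mean - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares, recovered from each group's S.D.
    ss_within = sum((g.n - 1) * g.sd ** 2 for g in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def p_value_two_numerator_df(f_stat: float, df_within: int) -> float:
    """Upper-tail p for F(2, df_within); closed form valid only for df_between == 2."""
    return (df_within / (df_within + 2.0 * f_stat)) ** (df_within / 2.0)


# Rounded values as printed in Table 5.
TABLE_5 = [
    GroupSummary("Controls", 0.4, 1.7, 30),
    GroupSummary("High risk", 0.2, 1.7, 100),
    GroupSummary("First episode", -0.4, 1.8, 20),
]

f_stat, df_b, df_w = anova_from_summary(TABLE_5)
p_value = p_value_two_numerator_df(f_stat, df_w)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}, p = {p_value:.2f}")
```

With the rounded figures in Table 5 this gives an F ratio of about 1.4 on 2 and 147 degrees of freedom, far from conventional significance thresholds, which is in line with the statement that the group differences were not statistically significant. Because the inputs are rounded summary values, the result is approximate.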
Possible positioning errors are of particular importance in the assessment of the putative abnormal brain asymmetry in schizophrenia (Zipursky et al., 1990). For example, Bilder et al. (1994 and personal communication) reported that their first episode patients with schizophrenia were more likely than control subjects to turn to the door (on the right) through which the experimenter had just left the MRI scanner. We report a similar tendency in our patients, but to a door on the left, which cannot therefore be attributed to any potential medication-related turning bias.

We have described an alternative technique for thresholding scans taken in the same individual over time. In a longitudinal study it is important to be aware of the non-specific factors which may cause variation in cerebral morphology between scans and sections. These could include factors such as normal ageing, hydration, hormonal status, and nutrition. Assuming that the brains of our relatively young and healthy subjects in this part of our investigation did not undergo any real change in volume during the 6 months of this study, we have shown that the results from our established method compare well with results from a more automated technique. Either method appears to be suitable for thresholding in our follow-up investigations. It should be noted, however, that this comparison of methods is based on a relatively small sample size.

In a long-term study like the Edinburgh High Risk study, it is important to preserve a consistent methodological approach. While impressive automated techniques for cross-sectional (Andreasen et al., 1994; Wright et al., 1995) and longitudinal studies (Fox et al., 1996) are being developed, our methods were developed for the start of the study in 1994. They represent a compromise between the limitations of subjectivity (which demands extensive training in the use of a semi-automated method) and the advantages of proven validity and reliability.
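The distinction drawn in this discussion between association and agreement (Bland and Altman, 1986) can be illustrated with a minimal Python sketch. The volumes below are invented purely for illustration and are not study data. The function names are hypothetical, and the one-way random-effects form of the intra-class correlation is an assumption, since this section does not specify which form was used. In the example the second measurement is always 20% larger than the first, so the two sets are perfectly correlated yet clearly disagree.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: measures association, not agreement."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (mean difference, S.D. of differences, lower and upper 95% limits of agreement)."""
    diffs = [b - a for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, sd, bias - 1.96 * sd, bias + 1.96 * sd


def icc_one_way(x: Sequence[float], y: Sequence[float]) -> float:
    """One-way random-effects ICC for two measurements per subject (assumed form)."""
    n, k = len(x), 2
    subject_means = [(a + b) / 2 for a, b in zip(x, y)]
    grand = mean(subject_means)
    ms_between = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for a, b, m in zip(x, y, subject_means)) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical volumes (arbitrary units): follow-up is always 20% above baseline.
baseline = [100.0, 120.0, 140.0, 160.0, 180.0]
follow_up = [120.0, 144.0, 168.0, 192.0, 216.0]

r = pearson_r(baseline, follow_up)
bias, sd, lower, upper = bland_altman(baseline, follow_up)
icc = icc_one_way(baseline, follow_up)
print(f"r = {r:.2f}, ICC = {icc:.2f}, mean difference = {bias:.1f} (S.D. {sd:.1f})")
print(f"95% limits of agreement: {lower:.1f} to {upper:.1f}")
```

In this invented example the correlation coefficient is 1.00, whereas the ICC is about 0.71 and the mean difference is 28 units, so a correlation alone would hide a systematic disagreement that the mean difference and limits of agreement make obvious.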
Acknowledgements

This study was supported by an MRC programme grant. The authors would like to thank other members of the Edinburgh High Risk Project, Elizabeth Grant, Ann Hodges, and Majella Byrne, who acquired the sample, and the radiographers, who performed the MRI scanning.
References

Andreasen, N.C., Cohen, G., Harris, G., Cizadlo, T., Parkkinen, J., Rezai, K., Swayze, V.W., 1992. Image processing for the study of brain structure and function: problems and programs. Journal of Neuropsychiatry 4, 125–133.
Andreasen, N.C., Arndt, S., Swayze, V., Cizadlo, T., Flaum, M., O’Leary, D., Ehrhardt, J.C., Yuh, W.T.C., 1994. Thalamic abnormalities in schizophrenia visualised through magnetic resonance image averaging. Science 266, 294–298.
Arndt, S., Swayze, V., Cizadlo, T., O’Leary, D., Cohen, G., Yuh, W.T.C., Ehrhardt, J.C., Andreasen, N.C., 1994. Evaluating and validating two methods for estimating brain structure volumes: tessellation and simple pixel counting. Neuroimage 1, 191–198.
Bartzokis, G., Altshuler, L.L., Greider, T., Curran, J., Keen, B., Dickson, W.J., 1998. Reliability of medial temporal lobe measurements using reformatted 3D images. Psychiatry Research: Neuroimaging 82, 11–24.
Becker, T., Elmer, K., Schneider, F., Schneider, M., Grodd, W., Bartels, M., Heckers, S., Beckmann, H., 1996. Confirmation of reduced temporal limbic structure volume on magnetic resonance images in male patients with schizophrenia. Psychiatry Research: Neuroimaging 67, 135–143.
Bilder, R.M., Wu, H., Bogerts, B., Degreef, G., Ashtari, M., Jose, M., Alvir, J., Snyder, P.J., Lieberman, J.A., 1994. Absence of regional hemispheric volume asymmetries in first-episode schizophrenia. American Journal of Psychiatry 151, 1437–1447.
Bland, J.M., Altman, D.G., 1986. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1 (8476), 307–310.
Bogerts, B., Ashtari, M., Degreef, G., Jose, M., Alvir, J., Bilder, R.M., Lieberman, J.A., 1990. Reduced temporal limbic structure volumes on magnetic resonance images in first episode schizophrenia. Psychiatry Research: Neuroimaging 35, 1–13.
Cannon, T.D., Mednick, S.A., Parnas, J., Schulsinger, F., Praestholm, J., Vestergaard, A., 1993. Developmental brain abnormalities in the offspring of schizophrenic mothers. I: contributions of genetic and perinatal factors. Archives of General Psychiatry 50, 551–564.
Daniel, D.G., Goldberg, T.E., Gibbons, R.D., Weinberger, D.R., 1991. Lack of a bimodal distribution of ventricular size in schizophrenia: a Gaussian mixture analysis of 1056 cases and controls. Biological Psychiatry 30, 887–903.
Davis, K.L., Buchsbaum, M.S., Shihabuddin, L., Spiegel-Cohen, J., Metzger, M., Frecska, E., Keefe, R.S., Powchik, P., 1998. Ventricular enlargement in poor-outcome schizophrenia. Biological Psychiatry 43, 783–793.
DeLisi, L.E., Sakuma, M., Tew, W., Kushner, M., Hoff, A.L., Grimson, R., 1997. Schizophrenia as a chronic active brain process: a study of progressive brain structural change subsequent to the onset of schizophrenia. Psychiatry Research 74 (3), 129–140.
Duvernoy, H., 1991. The Human Brain: Surface, Three-dimensional Sectional Anatomy and MRI. Springer-Verlag, Vienna.
Flaum, M., Swayze, V.W., O’Leary, D.S., Yuh, W.T.C., Ehrhardt, J.C., Arndt, S.V., Andreasen, N.C., 1995. Effects of diagnosis, laterality and gender on brain morphology in schizophrenia. American Journal of Psychiatry 152, 704–714.
Fox, N.C., Freeborough, P.A., Rossor, M.N., 1996. Visualisation and quantification of rates of atrophy in Alzheimer’s disease. Lancet 348, 94–97.
Gur, R.E., Cowell, P., Turetsky, B.I., Gallacher, F., Cannon, T., Bilker, W., Gur, R.C., 1998. A follow-up magnetic resonance imaging study of schizophrenia. Archives of General Psychiatry 55, 145–152.
Illowsky, B.P., Juliano, D.M., Bigelow, L.B., Weinberger, D.R., 1988. Stability of CT scan findings in schizophrenia: results of an 8 year follow-up study. Journal of Neurology, Neurosurgery and Psychiatry 51, 209–213.
Jaskiw, G.E., Juliano, D.M., Goldberg, T.E., Hertzman, M., Urow-Hamell, E., Weinberger, D.R., 1994. Cerebral ventricular enlargement in schizophreniform disorder does not progress. A seven year follow-up study. Schizophrenia Research 14, 23–28.
Kemali, D., Maj, M., Galderisi, S., Milici, N., Salvati, A., 1989. Ventricle-to-brain ratio in schizophrenia: a controlled follow-up study. Biological Psychiatry 26, 753–756.
Kohn, M.I., Tanna, N.K., Herman, G.T., Resnick, S.M., Mozley, P.D., Gur, R.E., Alavi, A., Zimmerman, R.A., Gur, R.C., 1991. Analysis of brain and cerebrospinal fluid volumes with MR imaging. Radiology 178, 115–122.
Lawrie, S.M., Abukmeil, S.S., 1998. Brain abnormality in schizophrenia. A systematic and quantitative review of volumetric magnetic resonance imaging studies. British Journal of Psychiatry 172, 110–120.
Lawrie, S.M., Whalley, H., Kestelman, J.N., Abukmeil, S.S., Byrne, M., Hodges, A., Rimmington, J.E., Best, J.J.K., Owens, D.C.G., Johnstone, E.C., 1999. Magnetic resonance imaging of the brain in subjects at high risk of developing schizophrenia. Lancet 353 (9146), 30–33.
Lewis, S.W., 1990. Computerised tomography in schizophrenia: 15 years on. British Journal of Psychiatry 157 (Suppl. 9), 16–24.
Lim, K.O., Tew, W., Kushner, M., Chow, K., Matsumoto, B., DeLisi, L.E., 1996. Cortical grey matter volume deficit in patients with first-episode schizophrenia. American Journal of Psychiatry 153, 1548–1553.
Muller, R., Buttner, P., 1994. A critical discussion of intraclass correlation coefficients. Statistics in Medicine 13, 2465–2476.
Nair, T.R., Christensen, J.D., Kingsbury, S.J., Kumar, N.G., Terry, W.M., Garver, D.L., 1997. Progression of cerebroventricular enlargement and the subtyping of schizophrenia. Psychiatry Research: Neuroimaging 74, 141–150.
Nasrallah, H.A., Olson, S.C., McCalley-Whitters, M., Chapman, S., Jacobi, C.G., 1986. Cerebral ventricular enlargement in schizophrenia: a preliminary follow-up study. Archives of General Psychiatry 43, 157–159.
Nelson, M.D., Saykin, A.J., Flashman, L.A., Riordan, H.J., 1998. Hippocampal volume reduction in schizophrenia as assessed by magnetic resonance imaging. Archives of General Psychiatry 55, 433–440.
Nopoulos, P., Torres, I., Flaum, M., Andreasen, N.C., Ehrhardt, J.C., Yuh, W.T.C., 1995. Brain morphology in first-episode schizophrenia. American Journal of Psychiatry 152, 1721–1723.
Rapoport, J.L., Giedd, J., Kumra, S., Jacobsen, L., Smith, A., Lee, P., Nelson, J., Hamburger, S., 1997. Childhood-onset schizophrenia: progressive ventricular change during adolescence. Archives of General Psychiatry 54, 897–903.
Raz, S., Raz, N., 1990. Structural brain abnormalities in the major psychoses: a quantitative review of the evidence from computerised imaging. Psychological Bulletin 108 (1), 93–108.
Roberts, N., Rimmington, J.E., Foster, M.A., 1992. A new role for tripositive lanthanide ions in test objects designed for quality control checks on magnetic resonance imaging systems. Physics in Medicine and Biology 37 (10), 1977–1984.
Seidman, L.J., Faraone, S.V., Goldstein, J.M., Goodman, J.M., Kremen, W.S., Matsuda, G., Hoge, E.A., Kennedy, D., Makris, N., Caviness, V.S., Tsuang, M.T., 1997. Reduced subcortical brain volumes in nonpsychotic siblings of schizophrenic patients: a pilot magnetic resonance imaging study. American Journal of Medical Genetics 74 (5), 507–514.
Shenton, M.E., Kikinis, R., Jolesz, F.A., Pollack, S.T., LeMay, M., Wible, C.G., Hokama, H., Martin, J., Metcalf, D., Coleman, M., McCarley, R.W., 1992. Abnormalities of the left temporal lobe and thought disorder in schizophrenia: a quantitative magnetic resonance imaging study. New England Journal of Medicine 327, 604–612.
Shrout, P.E., Fleiss, J.L., 1979. Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin 86 (2), 420–428.
Suddath, R.L., Christison, G.W., Torrey, E.F., Casanova, M.F., Weinberger, D.R., 1990. Anatomical abnormalities in the brains of monozygotic twins discordant for schizophrenia. New England Journal of Medicine 322 (12), 789–794.
Van Horn, J.D., McManus, I.C., 1992. Ventricular enlargement in schizophrenia. A meta-analysis of studies of the ventricle:brain ratio (VBR). British Journal of Psychiatry 160, 687–697.
Vita, A., Sacchetti, E., Valvassori, G., Cazullo, C.L., 1988. Brain morphology in schizophrenia: a 2- to 5-year CT scan follow-up study. Acta Psychiatrica Scandinavica 78, 618–621.
Vita, A., Dieci, M., Giobbio, G.M., Garbarini, M., Morganti, C., Braga, M., Invernizzi, G., 1994. A reconsideration of the relationship between cerebral structural abnormalities and family history of schizophrenia. Psychiatry Research 53 (1), 41–55.
Ward, K.E., Friedman, L., Wise, A., Schulz, S.C., 1996. Meta-analysis of brain and cranial size in schizophrenia. Schizophrenia Research 22, 197–213.
Weinberger, D.R., DeLisi, L.E., Neophytides, A.N., Wyatt, R.J., 1981. Familial aspects of CT scan abnormalities in chronic schizophrenic patients. Psychiatry Research 4, 65–71.
Wright, I.C., McGuire, P.K., Poline, J.B., Travere, J.M., Murray, R.M., Frith, C.D., Frackowiak, R.S.J., Friston, K.J., 1995. A voxel-based method for the statistical analysis of grey and white matter density applied to schizophrenia. Neuroimage 2 (4), 244–252.
Woods, B.T., Yurgelun-Todd, D., Benes, F.M., Frankenberg, F.R., Hope, H.G., McSparren, J., 1990. Progressive ventricular enlargement in schizophrenia: comparison to bipolar affective disorder and correlation with clinical course. Biological Psychiatry 27, 341–352.
Zipursky, R.B., Lim, K.O., Pfefferbaum, A., 1990. Volumetric assessment of cerebral asymmetry from CT scans. Psychiatry Research: Neuroimaging 35, 71–89.
Zipursky, R.B., Marsh, L., Lim, K.O., DeMont, S., Shear, P.K., Sullivan, E.V., Murphy, G.M., Csernansky, J.G., Pfefferbaum, A., 1994. Volumetric MRI assessment of temporal lobe structures in schizophrenia. Biological Psychiatry 35, 501–506.