Whole-brain atrophy in multiple sclerosis measured by two segmentation processes from various MRI sequences


Journal of the Neurological Sciences 216 (2003) 169 – 177 www.elsevier.com/locate/jns

M.A. Horsfield a,*, M. Rovaris b, M.A. Rocca b, P. Rossi b, R.H.B. Benedict c, M. Filippi b, R. Bakshi c,d

a Division of Medical Physics, University of Leicester, Leicester Royal Infirmary, Leicester LE1 5WW, UK
b Neuroimaging Research Unit, University Hospital San Raffaele, via Olgettina 60, Milan 20132, Italy
c Department of Neurology, University at Buffalo, State University of New York, 100 High Street, Buffalo, NY 14203, USA
d Buffalo Neuroimaging Analysis Center, The Jacobs Neurological Institute, 100 High Street, Buffalo, NY 14203, USA

Received 8 April 2003; received in revised form 27 June 2003; accepted 31 July 2003

Abstract

Recent MRI and pathologic studies have drawn attention to the destructive nature of the multiple sclerosis (MS) disease process, including the early occurrence of axonal and neuronal loss, leading to macroscopic brain and spinal cord atrophy. Measurement of brain atrophy from MRI has emerged as a potential outcome measure and marker of disease severity in MS and in neurodegenerative diseases such as Alzheimer's disease. However, the optimal method for quantifying atrophy, including the choice of pulse sequence and segmentation algorithm, has not been established. Using two different MRI scanners to ensure generalizability of results, we compared the reproducibility of four pulse sequences and two analysis methods (fully automated [FA] and semi-automated [SA]) for obtaining the brain parenchymal fraction (BPF), a normalized measure of whole-brain atrophy, in patients with MS (n = 13) and normal controls (n = 2). To ensure the validity of our fully automated analysis technique, we also used it to evaluate the atrophy rate over 9 months in 57 MS patients from the placebo arm of a clinical trial. All pulse sequences were capable of yielding a reproducibility of around 1% coefficient of variation (CoV) or better. The best reproducibility was obtained using 2D multi-slice sequences (conventional spin echo [SE] and fluid-attenuated inversion recovery [FLAIR]) with fully automated analysis. Fully automated analysis of the longitudinal data (conventional spin echo) showed an atrophy rate of −0.5% change in BPF per year, in line with previous findings from a similar cohort of patients. In conclusion, BPF measurement is affected by both pulse sequence and segmentation method. Automated measurement has high reproducibility, especially when 2D sequences are used. Semi-automated measurement may have increased accuracy, but with decreased efficiency and reliability.

© 2003 Elsevier B.V. All rights reserved.
Keywords: Cerebral; Atrophy; MRI; Pulse sequence; Multiple sclerosis; Dementia; Reproducibility; Automated

* Corresponding author. Tel.: +44-116-2585080; fax: +44-1162585979.
0022-510X/$ - see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/j.jns.2003.07.003

1. Introduction

Cerebral atrophy (CA) occurs at a rate of between 0.1% and 0.3% per year as part of the normal ageing process beyond the fourth decade, but is accelerated in several neurological conditions such as multiple sclerosis (MS) and Alzheimer's disease [1]. The loss of tissue within the brain and spinal cord is thought to result from myelin damage and axonal loss, followed by Wallerian degeneration and the loss of extracellular space and vascular compartments. Since CA can be measured serially on MRI scans of the brain, it has been proposed as a means of monitoring the progression of MS [2] and Alzheimer's disease [3]. Numerous MRI studies [4–15] have investigated the correlation between brain volume decrease and clinical findings in MS, both cross-sectionally and longitudinally. CA can develop in the early, relapsing–remitting (RR) phase of MS [4,5,8,13,15]. The degree of tissue loss is, however, greater in patients with more disabling, chronic progressive disease courses [6,7,8,13]. Where putative disease-modifying treatments for MS are available, conventional MRI measures of lesion load have shown limited sensitivity and specificity for monitoring treatment effects. CA assessed from serial brain MRI scans is, therefore, being studied as an additional outcome measure in clinical trials of these treatments.

Atrophy is usually measured from T1-weighted MRI scans, since these give good contrast between the cerebrospinal fluid (CSF) and the brain parenchyma. However, several MRI pulse sequence types have been proposed or used to measure atrophy, including conventional T1-weighted spin echo (SE) [7,14,15], fluid-attenuated inversion recovery (FLAIR) [16], 3D gradient echo (3D) [6,11] and a method that uses double-echo proton-density/T2-weighted images in a subtraction technique that produces an image with T1-like contrast [17]. The FLAIR sequence has potential benefits, since the CSF signal is greatly suppressed, resulting in a very clear definition of the brain parenchyma. On the other hand, the amount of intracranial CSF is an important indicator of the degree of atrophy, so it may be beneficial to retain some signal from CSF. Whatever pulse sequence is employed, the resulting image must have good contrast between the parenchyma and CSF.

CA can be assessed as a change in the absolute volume of brain parenchyma, or as a change in a normalized index of brain volume such as the brain parenchymal fraction (BPF) [18]. The advantages of the BPF as a measure of atrophy are twofold. First, the degree of brain atrophy can be assessed in cross-sectional studies from a single measurement, since the normalization takes account of absolute brain size. Second, any variation in the calibration of the MRI scanner gradient strengths should have little effect on the BPF; variations in magnetic field gradient strength have confounded previous atrophy studies, and absolute volumes must otherwise be corrected by normalizing to some other constant dimension, such as those of the skull [19].
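The second advantage can be made concrete with a small Python sketch. It assumes, purely for illustration, a fixed fractional scaling error on each gradient axis and hypothetical tissue volumes (not data from this study): such an error multiplies every measured volume by the same factor, so absolute volume shifts while the ratio does not. Real gradient non-linearity varies in space and is not modeled here.

```python
def bpf(parenchyma_ml: float, csf_ml: float) -> float:
    """Brain parenchymal fraction: parenchyma / (parenchyma + CSF)."""
    return parenchyma_ml / (parenchyma_ml + csf_ml)


def miscalibrated(volume_ml: float, sx: float, sy: float, sz: float) -> float:
    """Volume reported when each gradient axis has a fixed fractional scale error."""
    return volume_ml * sx * sy * sz


# Hypothetical true volumes (ml) and hypothetical per-axis gradient scale errors.
parenchyma, csf = 1200.0, 200.0
scales = (1.01, 0.99, 1.02)

measured_parenchyma = miscalibrated(parenchyma, *scales)
measured_csf = miscalibrated(csf, *scales)

volume_error_pct = 100.0 * (measured_parenchyma / parenchyma - 1.0)
print(f"parenchymal volume error: {volume_error_pct:+.2f}%")
print(f"BPF true: {bpf(parenchyma, csf):.4f}, "
      f"measured: {bpf(measured_parenchyma, measured_csf):.4f}")
```

With these made-up numbers the absolute volume is off by about 2%, which is large compared with the 0.1% to 0.3% per year of normal ageing quoted above, while the BPF is unchanged.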
The measurement of atrophy can be time consuming when the technique is applied to studies involving many patients. Ideally, any technique should be as automated as possible, so as to minimize the possibility of introducing operator-dependent errors into the measure. A fully automated (FA) technique would remove this source of error altogether and allow scans to be evaluated quickly and inexpensively. However, it is also important to assess the reproducibility of the whole data collection and image analysis method, since the pulse sequence and the analysis procedure are often strongly coupled. This study examines the scan–rescan reproducibility achievable using different T1-weighted pulse sequences with both a fully automated and a semi-automated (SA) method of analysis. Since this work was commenced, a similar study has recently been published [20]; however, the current work extends it by comparing scan–rescan reproducibility, fully automated vs. semi-automated processing, and a wide range of pulse sequences acquired on two MRI scanners. This last aspect is especially important because, in the context of multi-center clinical trials, it must be shown that findings on reproducibility generalize to different types of scanner. To test the validity and general applicability of the fully automated method, we also applied it to a larger cohort of patients whose MRI scans were collected longitudinally as part of a placebo-controlled clinical trial of a disease-modifying agent for MS. The data were collected at baseline and at 9 months at 18 centers, using eight different models of MRI scanner and a conventional T1-weighted spin-echo sequence.

2. Methods

2.1. MRI scanning

A total of 15 subjects participated in the main part of the study, which was carried out with the approval of the local ethics committees; written informed consent was obtained from all subjects. To ensure generalizability of the results, scanning was carried out at two centers: either the Neuroimaging Research Unit, University Hospital S. Raffaele (Milan, Italy) (nine subjects, Siemens Vision 1.5 T) or the Department of Neurology, University at Buffalo, State University of New York (USA) (six subjects, General Electric Signa 4x/Lx 1.5 T). The study population consisted of 2 normal controls and 13 MS patients; the MS patients had a range of disease courses (one primary progressive, three relapsing–remitting and nine secondary progressive), and their EDSS scores ranged from 2.0 to 8.0. Each subject was scanned with four pulse sequences during a single session: 2DSE (TR 600 ms, TE 10–12 ms, two signal averages), FLAIR (TR 7000–8000 ms, TE 110–127 ms, TI 2000 ms, echo train length 11, one signal average), 3D FLASH (TR 20–22 ms, TE 5–6 ms, α = 18°, one signal average) and a dual-echo fast spin echo (TR 2800 ms, TE 12–14 and 85–91 ms, echo train length 5, two signal averages). All 2D sequences were acquired with a 256 × 192 raw data matrix, an in-plane resolution of 0.98 mm and forty-four 3-mm slices giving whole-brain coverage. The 3D sequence was acquired with isotropic resolution and cubic voxels (raw data matrix 256 × 160 × 128, superior–inferior × anterior–posterior × left–right), with the same voxel volume as the 2D sequences (1.4 mm3). The acquisition parameters were chosen so that the voxel volumes and scan times (7 min 43 s) were the same for all sequences, so that no sequence would gain an unfair advantage from a longer acquisition time or a reduced partial volume effect.
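The equal-voxel-volume condition can be checked with simple arithmetic. The 2D voxel is about 2.9 mm³, so the quoted "1.4 mm3" matches it only if read as the edge length of a 1.4 mm isotropic voxel; the Python sketch below makes that reading an explicit assumption. It also uses the reported 0.98 mm in-plane figure as a nominal (reconstructed) resolution.

```python
# Nominal voxel volumes from the acquisition parameters listed above.
in_plane_mm = 0.98      # reported 2D in-plane resolution
slice_mm = 3.0          # reported 2D slice thickness
voxel_2d_mm3 = in_plane_mm ** 2 * slice_mm

# Assumption (not stated in the text): "1.4" is the isotropic 3D voxel edge in mm.
edge_3d_mm = 1.4
voxel_3d_mm3 = edge_3d_mm ** 3

relative_difference = (voxel_3d_mm3 - voxel_2d_mm3) / voxel_2d_mm3
print(f"2D: {voxel_2d_mm3:.2f} mm^3, 3D: {voxel_3d_mm3:.2f} mm^3, "
      f"difference: {100 * relative_difference:+.1f}%")
```

Under that assumption the two voxel volumes agree to within about 5%.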
We wished to perform a fair comparison of sequences based only on their intrinsic contrast characteristics, removing all other potentially confounding factors. For the 2D sequences, slices were repositioned using previously published guidelines, with the slices parallel to a line joining the most antero-inferior and postero-inferior margins of the corpus callosum. For the 3D sequence, the acquired block was positioned in a pure sagittal plane without reference to specific anatomical landmarks. The four scans were performed in random order to ensure that no bias was introduced by increasing patient restlessness and movement through the scanning session. The parameters for the 2DSE are typical of those used in MS treatment trials to assess the degree of gadolinium contrast agent enhancement; these scans are often used secondarily to assess atrophy [18]. The repetition time and echo train length for the FLAIR sequence were chosen to give the same scan time as the 2DSE, and the inversion time was chosen to maximize suppression of the CSF signal. The repetition time and flip angle for the 3D sequence gave maximum contrast between gray matter and CSF, while maintaining the same scan time as the 2D sequences. Finally, the echo times for the dual-echo sequence are again typical of those used in MS trials for the measurement of lesion volume [21]. The subjects were then removed from the scanner, and an interval of not less than 2 h and not more than 1 week elapsed before they were rescanned with the same set of sequences, so that scan–rescan reproducibility could be assessed. At least one other person was scanned between the two scanning sessions to ensure that the scanners performed the full pre-scan procedure to calibrate the pulse flip angles and receiver gain settings.

2.2. Clinical trial data

In order to demonstrate the validity of our analysis method, we tested the algorithm independently on longitudinal data collected as part of a placebo-controlled clinical trial. Fifty-seven patients were randomly selected from the placebo arm of a double-blind, phase III clinical trial of a pharmaceutical agent for MS (glatiramer acetate, Copaxone®, Teva Pharmaceutical Industries, Netanya, Israel) [22]. Their MRI scans were collected at eighteen centers using eight different models of MRI scanner from four manufacturers (Siemens, Philips, General Electric and Picker International). Data were collected at baseline (prior to the initiation of placebo treatment) and each month for 9 months; however, we analyzed only the baseline and 9-month scans. Both standard multi-slice spin-echo T1-weighted and dual-echo scans were collected, but we analyzed only the T1-weighted images. The acquisition parameters varied slightly from center to center, depending on the exact capabilities of the scanner, but were similar to those used in the reproducibility study described above [22]. T1-weighted scans were collected after the injection of a standard dose of gadolinium contrast agent, since the primary purpose of the trial was the assessment of enhancing lesion number.

2.3. Image analysis

All image analyses were performed by an experienced observer who was blind to clinical information (MAH), using software written in-house (Jim version 2.0) in the Java programming language (Java virtual machine version 1.2.2, Sun Microsystems, Santa Clara, CA). The 3D and dual-echo scans required simple pre-processing steps before assessment of CA. The 3D images were cropped to remove all signal from below the base of the cerebellum, i.e., from the neck and shoulders. For the dual-echo images, the late echo was subtracted from the early echo image to produce an image with an appearance similar to that of a FLAIR image. This image will be referred to here as the "subtraction" image (Fig. 1).

Fig. 1. The four sequences: SE, FLAIR, 3D FLASH and subtraction image, from left to right, respectively, for a single subject.

2.4. Automated analysis

Fully automated analysis used a two-stage procedure. The first stage defines the outer contour of the brain, thus separating the brain and CSF from other tissues such as the scalp, skull bone marrow and the tissues of the face. The second stage separates the brain and CSF into their two classes. The first stage of brain/CSF extraction involves correction of intensity non-uniformities throughout the imaged volume that are caused by variation in the transmission and reception properties of the RF coils (the bias field) [23]. The image "foreground" is first identified and isolated from the background noise using a simple intensity threshold based on the intensity histogram. The bias correction then uses only those pixels that are in the foreground. The algorithm models the bias field as a three-dimensional polynomial of order 3 (plus a constant offset), with coefficients in x, y, z, x², y², z², xy, xz, yz, x³, y³, z³, x²y, x²z, xy², y²z, xz², yz² and xyz. The downhill simplex algorithm is used to adjust the polynomial coefficients so as to minimize the "entropy" of the pixel intensity histogram, where entropy is defined as:

$$\mathrm{Entropy} = -\sum_{i} p_i \ln(p_i) \tag{1}$$

where p_i is the probability of occurrence of pixel intensity i, and the summation is performed over all intensities. Any bias will blur the histogram and increase its entropy. Thus, the bias field can be estimated by restoring the sharpness of the histogram, with the entropy serving as an objective measure of this criterion. The original image is then divided by the estimated bias field to give a uniformity-corrected image. The uniformity-corrected image is then processed using a brain extraction technique based on Ref. [24]. A head center location is estimated from the first moment of the pixel intensities, and an approximate head size is estimated from the distribution of pixels about this center. The brain surface is modeled as a triangulated surface mesh, which is initialized to a tessellated sphere at the center location, with a radius that is a fixed fraction (0.6) of the approximate head radius. The vertices of this surface mesh are iteratively updated using criteria that (a) maintain an even spacing of the vertices, (b) maintain a degree of smoothness of the mesh surface and (c) cause the mesh to expand and conform to the surface of the brain, where the intensity drops at the boundary between brain and skull. As implemented by us, the algorithm has a single adjustable parameter (called bt in Ref. [24], with a range between 0 and 1), which is a fractional intensity difference between the brain and background. For the FLAIR images (with very dark CSF), a setting of 0.3 was found to give reliable extraction of the brain, while for all other images, bt was set to its default value of 0.5.

Fig. 2. Flow chart showing the stages in fully automated and semi-automated analysis. Both procedures are identical except that a manual correction to the brain/CSF outline may be made during semi-automated analysis.

Having extracted the brain and CSF space, the method then calculates the BPF, based on the method described by Fisher et al. [12]. First, a further bias field correction is applied, this time using only pixels from the brain and CSF compartments; this helps to refine the correction, since by this stage the intensity histogram is dominated by two peaks: one from the CSF and a second from the brain parenchyma. The locations of these two peaks are next estimated by non-linear least-squares regression that fits two normal distributions to the image intensity histogram. An intensity threshold to separate brain parenchyma from CSF is then calculated as the mean of the two peak locations, and pixels are classified as CSF if their intensity is below the threshold, or otherwise as brain parenchyma. Finally, the BPF is calculated as:

$$\mathrm{BPF} = \frac{\text{parenchyma volume}}{\text{parenchyma volume} + \text{CSF volume}} \tag{2}$$
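The bias model and the entropy criterion of Eq. (1) can be illustrated with a toy Python sketch. This is not the authors' Jim implementation: the downhill simplex (Nelder–Mead) search over the coefficients is omitted, and the histogram bin count, intensity range and synthetic 1D "image" are assumptions chosen for illustration. The sketch enumerates every monomial of degree 1 to 3 in x, y and z (confirming that there are 19 terms besides the constant offset) and shows that a smooth multiplicative bias raises the histogram entropy.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement


def monomial_exponents(max_order=3):
    """Exponents (a, b, c) of every x^a y^b z^c with 1 <= a + b + c <= max_order."""
    terms = []
    for order in range(1, max_order + 1):
        for combo in combinations_with_replacement("xyz", order):
            terms.append((combo.count("x"), combo.count("y"), combo.count("z")))
    return terms


def bias_field(coeffs, offset, x, y, z, terms):
    """Polynomial bias model: offset + sum_k c_k * x^a * y^b * z^c."""
    return offset + sum(c * x**a * y**b * z**e for c, (a, b, e) in zip(coeffs, terms))


def histogram_entropy(values, n_bins=64, lo=0.0, hi=2.0):
    """Eq. (1): -sum_i p_i ln(p_i) over a binned intensity histogram."""
    width = (hi - lo) / n_bins
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    total = len(values)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


TERMS = monomial_exponents()  # 19 terms; the constant offset is handled separately

# Toy 1D "image": alternating CSF (0.5) and parenchyma (1.0) voxels along x in [-1, 1].
n = 200
xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
clean = [0.5 if i % 2 == 0 else 1.0 for i in range(n)]

# Hypothetical bias: a 20% linear ramp along x (only the x coefficient is non-zero).
coeffs = [0.0] * len(TERMS)
coeffs[TERMS.index((1, 0, 0))] = 0.2
biased = [v * bias_field(coeffs, 1.0, x, 0.0, 0.0, TERMS) for v, x in zip(clean, xs)]

print(len(TERMS))
print(histogram_entropy(clean), histogram_entropy(biased))
```

The clean two-class image has an entropy of ln 2, while the biased copy spreads each class over many bins. An optimizer that adjusts the coefficients to minimize this quantity is therefore pushed toward removing the bias, which is the reasoning behind the criterion.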

On T1-weighted images, MS lesions may appear isointense with normal-appearing white matter or hypointense, depending on the degree of pathological damage within the lesion [25]. It was therefore possible, although rare, for lesions with extreme damage to be classified as CSF by the simple thresholding procedure above. However, this may not be detrimental to the method, since such severely damaged tissue is unlikely to be contributing normally to neurological function, and its effect of lowering the BPF may make the technique more sensitive to clinical deficits.
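The thresholding step and Eq. (2), together with the lesion effect just described, can be sketched in Python. The peak locations and voxel intensities below are hypothetical, the two-Gaussian least-squares fit that would supply the peaks is omitted, and voxel counts stand in for volumes on the assumption that all voxels have equal size.

```python
def midpoint_threshold(csf_peak: float, parenchyma_peak: float) -> float:
    """Threshold halfway between the two fitted histogram peak locations."""
    return 0.5 * (csf_peak + parenchyma_peak)


def bpf_from_intensities(intensities, threshold):
    """Eq. (2) with voxel counts standing in for volumes (equal voxel size assumed).

    Voxels below the threshold are counted as CSF, all others as parenchyma.
    """
    csf = sum(1 for v in intensities if v < threshold)
    parenchyma = len(intensities) - csf
    return parenchyma / (parenchyma + csf)


# Hypothetical T1-like intensities inside the brain/CSF mask (CSF dark, tissue bright).
threshold = midpoint_threshold(csf_peak=100.0, parenchyma_peak=400.0)
healthy = [110, 95, 400, 420, 380, 410, 395, 130, 405, 390]
with_lesion = healthy[:-1] + [180]  # one severely hypointense lesion voxel

print(threshold)                                     # 250.0
print(bpf_from_intensities(healthy, threshold))      # 0.7
print(bpf_from_intensities(with_lesion, threshold))  # 0.6
```

The severely hypointense lesion voxel falls below the threshold and is counted as CSF, lowering the BPF in the way the paragraph above anticipates.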

2.5. Semi-automated analysis

The semi-automated method uses the same software and the same processing steps. However, the user performs a manual correction of the extracted brain and CSF space before calculation of the BPF. The software intersects the brain surface mesh with the image slices to form a set of two-dimensional regions of interest (ROIs) enclosing the brain and CSF. The user can then modify these ROIs by clicking and dragging the outlines with the computer mouse. Both the fully automated and semi-automated techniques are summarized in the flow chart of Fig. 2. Processing time for fully automated analysis is about 15–20 min on a 1-GHz Pentium PC running Linux. The time required for semi-automated analysis varied from an average of 25 min for the 2DSE sequence to an average of 41 min for the 3D sequence.

Table 1. Clinical and demographic characteristics and mean values of brain parenchymal fraction for the 13 MS patients and 2 healthy controls (HC)

Patient number | Age/years | Gender | Disease course | EDSS | Disease duration/years | Mean (S.D.) BPF
1 | 44 | F | PP | 4.0 | 13 | 0.7915 (0.0110)
2 | 34 | F | RR | 4.0 | 13 | 0.8522 (0.0165)
3 | 38 | F | RR | 2.5 | 7 | 0.8046 (0.0183)
4 | 43 | M | RR | 2.5 | 6 | 0.8441 (0.0201)
5 | 42 | M | SP | 5.0 | 6 | 0.8294 (0.0137)
6 | 38 | F | SP | 8.0 | 6 | 0.8473 (0.0080)
7 | 27 | M | SP | 6.0 | 10 | 0.8349 (0.0115)
8 | 60 | M | SP | 4.5 | 5 | 0.8506 (0.0113)
9 | 35 | F | SP | 5.5 | 2 | 0.7798 (0.0123)
10 | 40 | F | SP | 5.0 | 5 | 0.8785 (0.0068)
11 | 37 | M | SP | 4.5 | 10 | 0.8341 (0.0127)
12 | 29 | F | SP | 2.0 | 10 | 0.8209 (0.0135)
13 | 51 | M | SP | 6.0 | 12 | 0.7410 (0.0304)
HC 1 | 21 | M | – | – | – | 0.8931 (0.0138)
HC 2 | 41 | M | – | – | – | 0.8752 (0.0125)

The BPF values are the mean and standard deviation of all fully automated and semi-automated analyses for all four of the pulse sequences.

2.6. Statistical analysis

The scan–rescan reproducibility of the BPF was determined from the scans performed on two separate occasions. The interval between the two scans was short (always less than 1 week), and we assumed that no real change in brain volume had occurred during that interval. For each pair of scans, the coefficient of variation (CoV) was determined as:

$$\mathrm{CoV} = \frac{\text{standard deviation of BPF}}{\text{mean BPF}} \times 100\% \tag{3}$$

For each scanning sequence and analysis method, the mean coefficient of variation over all patients is presented.
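Eq. (3) and the per-method averaging can be sketched in Python as follows. The paper does not state whether the sample (n − 1) or population standard deviation was used; this sketch assumes the sample form (`statistics.stdev`), and the BPF pairs are hypothetical, not values from Table 1 or Table 2.

```python
from statistics import mean, stdev


def scan_rescan_cov(bpf_scan: float, bpf_rescan: float) -> float:
    """Eq. (3) for one subject: SD of the two BPF values as a percentage of their mean.

    Assumes the sample (n - 1) standard deviation.
    """
    pair = [bpf_scan, bpf_rescan]
    return stdev(pair) / mean(pair) * 100.0


def mean_cov(pairs):
    """Mean CoV over subjects, as reported for each sequence and analysis method."""
    return mean(scan_rescan_cov(a, b) for a, b in pairs)


# Hypothetical scan/rescan BPF pairs for three subjects.
pairs = [(0.800, 0.810), (0.850, 0.846), (0.870, 0.872)]
print(f"{mean_cov(pairs):.3f}%")
```

For two measurements, the sample standard deviation reduces to the absolute difference divided by √2. If the population form were used instead, every CoV would be smaller by a factor of √2, so the choice matters when comparing values across studies.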

3. Results

Table 1 shows the demographic and disease characteristics of the 13 MS patients and 2 healthy controls. For the MS patients, the median age was 38 (range: 27–60) years, median disease duration was 7 (range: 2–13) years and median Expanded Disability Status Scale (EDSS) score was 4.5 (range: 2–8). Fig. 1 shows one slice from the most severely atrophied patient for each of the four sequences. After fully automated processing, visual inspection confirmed successful brain segmentation in all except some of the subtraction images: in three of the more severely atrophied patients, the mesh failed to conform to the outer surface of the brain, and, therefore, only the results of semi-automated analysis are presented for this sequence. Fig. 1 shows that the subtraction image is blurred as a result of slight misregistration between the early and late echoes. This was particularly noticeable on one of the scanners and is thought to result from different eddy current distortion of the two echoes; it may have contributed to the poor performance of the brain extraction procedure. In two cases, for the 3D sequence, the brain surface mesh contained bulges from the temporal lobes into the facial muscles, which were of very similar intensity to the brain parenchyma and abutted the temporal lobes. However, these were not considered serious enough for us to exclude the 3D sequence from the fully automated analysis. Because brain extraction was unreliable, the subtraction images were not amenable to fully automated processing; all other sequences were capable of yielding CoVs in the vicinity of 1%. All BPF distributions were near normal by the Kolmogorov–Smirnov Z test, supporting the use of parametric statistics. General linear models were employed to explore group mean comparisons with repeated measures, using a conservative threshold of p < 0.01 to control for multiple comparisons.


Table 2 summarizes the BPF data for the FA and SA processing strategies. The mean and SD BPF values are quite similar across strategies. CoV values range from 0.31% for FA 2DSE to 1.07% for FA 3D. Test–retest correlations are similarly strong, ranging from 0.94 for SA 3D to 0.99 for FA 2DSE, FA FLAIR and SA FLAIR. The methods were compared to assess whether there were differences in reliability. Within-subjects analysis of variance of CoV revealed a trend toward significance (F = 3.6, p = 0.02). Pairwise comparisons were significant only for FA 2DSE vs. FA 3D (p = 0.007). All Pearson reliability coefficients were of similar magnitude (0.94 or above). Mean BPF values were all strongly inter-correlated, with correlations ranging from 0.86 (FA FLAIR and SA 3D) to 0.99 (FA 2DSE and SA 2DSE). We next asked whether the modest differences among the mean BPF values were significant. In this case, the within-subjects analysis of variance was significant (F = 16.0, p < 0.001). To shed light on the nature of this effect, we tested a 2 (processing) × 3 (sequence) ANOVA model. This analysis revealed two main effects, indicating that BPF values were influenced by both processing method (F = 12.9, p = 0.003) and sequence (F = 18.9, p < 0.001). Overall, higher BPF values were obtained with SA processing and with the 2DSE sequence. Pairwise comparisons were significant for the following pairings: FA 2DSE vs. FA 3D (p < 0.001), FA 2DSE vs. SA 2DSE (p = 0.002), FA 2DSE vs. SA 3D (p < 0.001), FA FLAIR vs. FA 3D (p = 0.001), FA FLAIR vs. SA 2DSE (p = 0.009), FA FLAIR vs. SA Subtraction (p = 0.009), FA 3D vs. SA 2DSE (p < 0.001), FA 3D vs. SA FLAIR (p = 0.001), FA 3D vs. SA Subtraction (p < 0.001), SA 2DSE vs. SA FLAIR (p = 0.002), SA 2DSE vs. SA 3D (p < 0.001) and SA 3D vs. SA Subtraction (p = 0.002). The clinical and MRI characteristics of the 57 patients from the placebo arm of the clinical trial are reported in Table 3.
Table 2. MRI data and validity coefficients

Sequence/analysis | Mean BPF | S.D. | CoV (%) | Reliability coefficient r
FA 2DSE | 0.837 | 0.040 | 0.31 | 0.99
FA FLAIR | 0.834 | 0.035 | 0.32 | 0.99
FA 3D | 0.814 | 0.048 | 1.07 | 0.97
SA 2DSE | 0.843 | 0.041 | 0.66 | 0.97
SA FLAIR | 0.834 | 0.040 | 0.50 | 0.99
SA 3D | 0.821 | 0.042 | 1.03 | 0.94
SA Subtraction | 0.841 | 0.041 | 0.53 | 0.98

2DSE = 2D spin-echo sequence; FLAIR = fluid-attenuated inversion recovery sequence; 3D = 3D FLASH sequence; Subtraction = spin-echo subtraction image; FA = fully automated analysis; SA = semi-automated analysis.

Table 3. Baseline clinical and MRI characteristics of 57 patients from the placebo arm of the 9003 Glatiramer Acetate trial

Patient characteristic at baseline | Value
Age/years | 33.8 (7.7)
Percentage women | 68.4%
Prestudy disease duration/years | 5.1 (3.9)
Median EDSS score (range) | 2.0 (0.0–5.0)
Median number of relapses during the 2-year prestudy period (range) | 2.0 (0–6)
T2 lesion volume/ml | 19.6 (15.2)
Number of Gd-enhancing lesions | 4.0 (5.3)
Volume of T1 black holes/ml | 3.3 (4.7)

Mean values, with standard deviation in parentheses, are shown, with the exception of EDSS and number of relapses.

The spin-echo T1-weighted scans from all 57 patients were amenable to fully automated processing, and analysis yielded a mean (S.D.) BPF at baseline of 0.861 (0.030). At baseline, BPF was significantly correlated with both age (r = −0.36, p < 0.01) and disease duration (r = −0.34, p < 0.05), but not with either EDSS or the number of relapses during the 2-year prestudy period. At the 9-month follow-up, BPF had dropped to 0.857 (0.031), with an average change of −0.37% (not significant). Thus, the annualized change in BPF was −0.50% per year. The range of BPF changes observed over the 9-month follow-up period was +2.3% to −3.1%. Baseline and follow-up BPF measures were strongly correlated (r = 0.96, p < 0.001), and at baseline, BPF was moderately correlated with lesion burden (T2 r = −0.63, T1 r = −0.54).
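The annualization step can be checked with a one-line calculation, sketched here in Python under the assumption of a linear rate of change over the follow-up period (the text does not state the exact method used).

```python
def annualize(percent_change: float, months: float) -> float:
    """Scale a percentage change observed over `months` to a per-year rate (linear)."""
    return percent_change * 12.0 / months


nine_month_change = -0.37  # mean BPF change over 9 months reported above, in percent
print(f"{annualize(nine_month_change, 9):.3f}% per year")
```

The rounded 9-month figure gives about −0.49% per year. This agrees with the reported −0.50% per year to within the rounding of the −0.37% value: an unrounded change of −0.375% would give exactly −0.50%.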

4. Discussion

This study examines the reproducibility of measures of cerebral atrophy and how it is influenced both by the type of MRI scan used to measure it and by the type of analysis applied to those scans. For use in clinical studies and clinical trials of agents designed to reduce CA, a method of estimating CA must be both reproducible and capable of being implemented across multiple MRI scanning centers. In the present study, we examined the influence of the pulse sequence used to collect the image data and of the analysis method (automated vs. semi-automated); to ensure the general applicability of our results, scans were performed at two centers using two models of MRI scanner. Reproducibility was assessed by repeating each scan, with an interval between scans short enough that a material change in brain volume was unlikely. The technique was validated by analyzing, post hoc, data from the placebo arm of a clinical trial of a therapeutic agent for MS that were acquired at 18 different scanning centers. The scan–rescan reproducibility of BPF we observed is similar to that achieved by other groups using the FLAIR sequence (e.g., 0.19% in Ref. [18] compared with 0.32% in the present study) and the 2DSE sequence (e.g., 0.41% in Ref. [16] compared with 0.31% in the present study). Our study did not address measurement differences between different types of scanner, since this was the subject of a previous study [26], which showed that good reproducibility of CA measures can be achieved across scanners when the same pulse sequence is used. In the context of longitudinal clinical trials, it is important that the same scanner be used for any given patient where possible, and that strict quality assurance procedures be in place should a change of scanner or an upgrade to an existing scanner become necessary [21]. It is clear from our study that the brain parenchymal fraction [18], an index of brain atrophy, can be measured using a variety of MRI pulse sequences, provided that the images show marked contrast between the CSF and brain parenchyma. However, on some scanners, images produced by subtraction of the early and late echoes of a dual-echo T2-weighted scan are of poor quality, possibly because of differences in eddy-current-induced image distortion between the two echoes. At least with our algorithm, this leads to unreliable segmentation of the parenchyma and CSF spaces when fully automated analysis is attempted. In multi-center clinical trials, where many different scanner types are used, this problem could become acute if segmentation algorithms are developed and tested using images from only a single type of scanner. Surprisingly, 3D gradient echo acquisition gave larger variability in BPF than the multi-slice 2DSE and FLAIR sequences, both of which gave very similar and excellent reproducibility. To provide a comparison of sequences that was independent of resolution, we kept the acquisition time and resolution the same for all sequences; as a result, the spatial resolution of the 3D acquisition was relatively modest (1.4 mm3) compared with previously used protocols [27], which may have contributed to its poorer reproducibility. Visual inspection of the segmented brain and CSF spaces showed that the largest differences in the segmented areas occurred around the orbits, sagittal sinus and internal carotid artery. The reproducibility of this sequence might therefore have been improved by saturation regions to suppress inflowing blood, frequency-selective fat saturation pulses, or finer spatial resolution.
However, all these putative improvements would lead to increased image acquisition time. In a minority of the 3D scans, there was also a tendency for the combined segmented brain and CSF space at the end of the first stage of processing to ‘‘leak’’ from the temporal lobes into the muscles of the face. This could have been avoided by more refined cropping of the images to remove this tissue before analysis, but would have moved away from the notion of a fully automated assessment. During semi-automated analysis, corrections were made to what the operator considered were errors in the first stage of processing—the production of an outline of the combined parenchyma and CSF space. In the case of 3D acquisition, this correction resulted in a slight improvement in reproducibility, while in the case of the 2DSE and FLAIR sequences, the reproducibility was worsened. This indicates that there were clear and variable errors in the automatic segmentation of the brain and CSF space for the 3D sequence, while for the 2DSE and FLAIR sequences, the errors, while apparent to the operator, were consistent from scan to scan. The correction of these consistent errors by the operator lead to a degradation of reproducibility, since
operator-dependent inconsistencies were introduced at this stage.

The issue of accuracy has not been addressed directly in this paper. However, the 9-month follow-up of patients in the placebo arm of the clinical trial shows that the techniques employed can detect small changes in CA, and that our results (in terms of both the absolute BPF and the annualized rate of its decline) are consistent with those obtained in another study of a similar cohort of patients [13]. The slightly higher rate of CA in our cohort may be because these patients were selected for having at least one enhancing lesion immediately prior to enrolment. Manual correction of the brain outlines during SA processing increased BPF (i.e., reduced the measured CSF volume) for the 2DSE and 3D sequences, whereas FLAIR images were unaffected. On the 2DSE and 3D images, errors in automated processing were commonly found between the temporal lobes anterior to the brain stem and around the sagittal sinus; both are regions of low signal intensity that, without manual correction, were classified as CSF. The FLAIR images did not suffer from these problems and, in this respect, can be considered to give the most accurate segmentation (i.e., the one most consistent with that defined by an experienced human operator).

Although our results were obtained with our own analysis method, we would expect them to generalize to other analysis methods, since the basic principles are often similar [7,14,15,18,19]. Likely exceptions are techniques that rely on manual or semi-automated segmentation of the brain and CSF space [20,27]. In longitudinal studies using such techniques, the series of scans is often registered to the segmented brain, so that the segmentation procedure needs to be performed only once.
In this case, images acquired with phase encoding in the “slice” direction (i.e., 3D acquisition) have a clear advantage, since they can be resliced during the registration process without loss of image quality [28]. Such analysis methods may therefore benefit from 3D acquisition. There are, of course, other forms of segmentation algorithm, such as template-based segmentation [29] and multispectral approaches [30]. In addition, other studies characterize atrophy by the absolute change in parenchymal volume rather than by a ratio such as the BPF. However, because the strength of the scanner's magnetic field gradients drifts over time, the absolute volume must be normalized or otherwise corrected using an independent volume measure, often based on the dimensions of the skull, which is considered invariant over time [31]. Whether the findings of the present study generalize to other segmentation approaches and atrophy measures remains a subject for future study.
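Why a ratio measure such as the BPF is less exposed to gradient drift than an absolute volume can be illustrated with a toy calculation. The sketch below is a minimal illustration under stated assumptions, not the analysis software used in this study. It assumes a segmentation stored as a flat list of integer labels (1 = parenchyma, 2 = CSF within the brain outline, 0 = background). It takes BPF as parenchymal volume divided by the combined parenchymal and CSF volume. It models gradient-strength drift, in idealized form, as a single hypothetical scale factor applied to every voxel dimension. The label values and the names `tissue_volumes` and `brain_parenchymal_fraction` are hypothetical.

```python
# Hypothetical label convention for a toy segmentation (not the study's software).
PARENCHYMA, CSF, BACKGROUND = 1, 2, 0


def tissue_volumes(labels, voxel_dims_mm):
    """Return (parenchymal volume, CSF volume) in mm^3 for one labelled scan."""
    dx, dy, dz = voxel_dims_mm
    voxel_volume = dx * dy * dz
    n_parenchyma = sum(1 for v in labels if v == PARENCHYMA)
    n_csf = sum(1 for v in labels if v == CSF)
    return n_parenchyma * voxel_volume, n_csf * voxel_volume


def brain_parenchymal_fraction(labels, voxel_dims_mm):
    """BPF = parenchymal volume / (parenchymal + CSF volume inside the brain outline)."""
    parenchyma, csf = tissue_volumes(labels, voxel_dims_mm)
    return parenchyma / (parenchyma + csf)


# Toy segmentation (hypothetical): 850 parenchymal, 150 CSF, 500 background voxels.
labels = [PARENCHYMA] * 850 + [CSF] * 150 + [BACKGROUND] * 500
nominal_dims = (1.0, 1.0, 5.0)  # nominal voxel size in mm (hypothetical)
drift = 1.01                    # hypothetical 1% scaling error on every axis
drifted_dims = tuple(d * drift for d in nominal_dims)

# The absolute volume scales with the cube of the per-axis error...
vol_nominal, _ = tissue_volumes(labels, nominal_dims)
vol_drifted, _ = tissue_volumes(labels, drifted_dims)
print(f"parenchymal volume change: {100 * (vol_drifted / vol_nominal - 1):.2f}%")

# ...whereas the voxel volume cancels in the BPF ratio.
print(f"BPF nominal: {brain_parenchymal_fraction(labels, nominal_dims):.4f}")
print(f"BPF drifted: {brain_parenchymal_fraction(labels, drifted_dims):.4f}")
```

Running it prints a parenchymal volume change of 3.03% and a BPF of 0.8500 in both cases. Real gradient errors need not be uniform across the field of view, so in practice this cancellation is only approximate. That is why absolute volume measures still call for an independent normalization, such as the skull-based measures described above.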

5. Conclusions

The aim of this study was to compare the performance of different MRI acquisition pulse sequences when used to
acquire data for assessing cerebral atrophy. We compared 2D spin echo, FLAIR, 3D gradient echo and a dual-echo sequence (with echo subtraction). All sequences can yield good reproducibility, and the differences in test–retest reliability between them are small. However, the images produced by subtracting early and late echoes in a dual-echo T2-weighted sequence are of variable quality, so this technique may be unsuitable for assessing CA in a multi-center clinical study or trial. Surprisingly, 3D gradient echo acquisition gave the worst reproducibility, although the acquisition protocol was constrained by our aim of evaluating the influence of the pulse sequence directly, without the confounding effects of differences in spatial resolution or scan time. Optimizing resolution and signal-to-noise ratio, together with fat and inflow suppression techniques, might have improved the reproducibility, and this should be the subject of a future study. Our fully automated algorithm made clear errors in segmenting the combined brain and CSF spaces. However, when these errors were corrected manually (semi-automated processing), the reproducibility of the BPF measurement worsened for the 2DSE and FLAIR sequences, although it improved marginally for the 3D sequence. Thus, segmentation that is both fully automated and accurate must be the ultimate goal of any analysis scheme. There were small but reliable differences in BPF between scanning sequences and analysis techniques; in general, BPF was higher with the 2DSE sequence and with semi-automated analysis. However, reproducibility (and therefore the ability to detect small changes over time), rather than accuracy, may be the most important factor when choosing a strategy for measuring cerebral atrophy.
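The balance between measurement noise and the size of the change to be detected can be made concrete with two small formulas. The sketch below uses illustrative definitions, not the exact statistics used in this study. Test–retest reproducibility is summarized as the mean within-subject coefficient of variation: the sample SD divided by the mean of repeated BPF values, in percent, averaged over subjects. The annualized rate is the relative BPF change between two scans, scaled linearly to 12 months. All input values are hypothetical and are not data from this study. The function names `mean_cov_percent` and `annualized_percent_change` are illustrative.

```python
from statistics import mean, stdev


def mean_cov_percent(repeated_bpf):
    """Mean within-subject coefficient of variation (%) of repeated BPF values.

    `repeated_bpf` holds one tuple of repeated measurements per subject
    (hypothetical data layout).
    """
    return mean(100 * stdev(scans) / mean(scans) for scans in repeated_bpf)


def annualized_percent_change(bpf_baseline, bpf_followup, months):
    """Relative BPF change (%) between two scans, linearly scaled to 12 months."""
    return 100 * (bpf_followup - bpf_baseline) / bpf_baseline * 12 / months


# Hypothetical test-retest BPF values for three subjects.
repeated_bpf = [(0.850, 0.846), (0.812, 0.815), (0.871, 0.868)]
# Hypothetical baseline and 9-month follow-up BPF for one subject.
rate = annualized_percent_change(0.850, 0.8468, months=9)

print(f"mean test-retest CoV: {mean_cov_percent(repeated_bpf):.2f}%")
print(f"annualized BPF change: {rate:.2f}% per year")
```

With these made-up values the printed CoV is about 0.28% and the annualized change about -0.50% per year. As a general rule, the larger a method's CoV relative to the expected yearly change, the longer the follow-up or the larger the sample needed to detect atrophy reliably.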

Acknowledgements

This study was supported in part by research grants to R. Bakshi from the National Institutes of Health (NIH-NINDS 1 K23 NS42379-01) and the National Multiple Sclerosis Society (RG 3258A2/1). The authors thank Teva Pharmaceutical Industries for allowing us to use the data from the placebo arm of the European/Canadian MRI-monitored Glatiramer Acetate trial. Technical support was provided by Jin Kuwata and Joe Filippini.

References

[1] Davis P, Mirra S, Alazraki N. The brain in older persons with and without dementia. AJR 1994;162:1267–78.
[2] Miller DH, Barkhof F, Frank JA, et al. Measurement of atrophy in multiple sclerosis: pathological basis, methodological aspects and clinical relevance. Brain 2002;125:1676–95.
[3] Fox NC. Magnetic resonance imaging in Alzheimer’s disease: from diagnosis to measuring therapeutic effect. Alzheimer’s Rep 1999;2:5–12.
[4] Simon HJ, Jacobs LD, Campion MK, et al. A longitudinal study of brain atrophy in relapsing multiple sclerosis. Neurology 1999;53:139–48.
[5] Ge Y, Grossman RI, Udupa JK, et al. Brain atrophy in relapsing-remitting multiple sclerosis and secondary progressive multiple sclerosis: longitudinal quantitative analysis. Radiology 2000;214:665–70.
[6] Liu X, Blumhardt LD. Inflammation and atrophy in multiple sclerosis: MRI associations with disease course. J Neurol Sci 2001;189:99–104.
[7] Losseff NA, Wang L, Lai HM, et al. Progressive cerebral atrophy in multiple sclerosis: a serial MRI study. Brain 1996;119:2009–19.
[8] Brex PA, Jenkins R, Fox NC, et al. Detection of ventricular enlargement in patients at the earliest clinical stage of MS. Neurology 2000;54:1689–91.
[9] Filippi M, Mastronardo G, Rocca MA, et al. Quantitative volumetric analysis of brain magnetic resonance imaging from patients with multiple sclerosis. J Neurol Sci 1998;158:148–53.
[10] Dastidar P, Heinonen T, Lehtimaki T, et al. Volumes of brain atrophy and plaques correlated with neurological disability in secondary progressive multiple sclerosis. J Neurol Sci 1999;165:36–42.
[11] Fox NC, Jenkins R, Leary SM, et al. Progressive cerebral atrophy in MS. A serial study using registered, volumetric MRI. Neurology 2000;54:807–12.
[12] Fisher E, Rudick RA, Cutter G, et al. Relationship between brain atrophy and disability: an 8-year follow-up study of multiple sclerosis patients. Mult Scler 2000;6:373–7.
[13] Rudick RA, Fisher E, Lee JC, et al. Brain atrophy in relapsing-remitting multiple sclerosis: relationship to relapses, EDSS, and treatment with interferon beta-1a. Mult Scler 2000;6:365–72.
[14] Zivadinov R, Sepcic J, Nasuelli D, et al. A longitudinal study of brain atrophy and cognitive disturbances in the early phase of relapsing-remitting multiple sclerosis. J Neurol Neurosurg Psychiatry 2001;70:773–80.
[15] Paolillo A, Pozzilli C, Gasperini C, et al. Brain atrophy in relapsing-remitting multiple sclerosis: relationship with “black holes”, disease duration and clinical disability. J Neurol Sci 2000;174:85–91.
[16] Bermel RA, Sharma J, Tjoa CW, Puli SR, Bakshi R. A semiautomated measure of whole-brain atrophy in multiple sclerosis. J Neurol Sci 2002;208:57–65.
[17] Atkins MS, Mackiewich BT. Fully automatic segmentation of the brain in MRI. IEEE Trans Med Imag 1998;17:98–107.
[18] Rudick RA, Fisher E, Lee JC, Simon J, Jacobs L. Use of the brain parenchymal fraction to measure whole brain atrophy in relapsing-remitting MS. Neurology 1999;53:1698–704.
[19] Smith SM, Zhang Y, Jenkinson M, et al. Accurate, robust and automated longitudinal and cross-sectional brain change analysis. Neuroimage 2002;17:479–89.
[20] Leigh R, Ostuni J, Pham D, et al. Estimating cerebral atrophy in multiple sclerosis patients from various MR pulse sequences. Mult Scler 2002;8:420–9.
[21] Filippi M, Horsfield MA, Ader HJ, et al. Guidelines for using quantitative measures of brain magnetic resonance imaging abnormalities in monitoring the treatment of multiple sclerosis. Ann Neurol 1998;43:499–506.
[22] Comi G, Filippi M, Wolinsky JS. European/Canadian multicenter, double-blind, randomized, placebo-controlled study of the effects of glatiramer acetate on magnetic resonance imaging-measured disease activity and burden in patients with relapsing multiple sclerosis. Ann Neurol 2001;49:290–7.
[23] Sled JG, Zijdenbos AP, Evans AC. A nonparametric method for automatic correction of intensity nonuniformity in MRI data. IEEE Trans Med Imag 1998;17:87–97.
[24] Smith S. Fast robust automated brain extraction. Hum Brain Mapp 2002;17:143–55.
[25] VanWalderveen MAA, Barkhof F, Hommes OR, et al. Correlating MRI and clinical disease activity in multiple sclerosis: relevance of hypointense lesions on short-TR short-TE (T1-weighted) spin-echo images. Neurology 1995;45:1684–90.
[26] Gasperini C, Rovaris M, Sormani MP, Bastianello S, Pozzilli C, Comi G, et al. Intra-observer, inter-observer and inter-scanner variations in brain MRI volume measurements in multiple sclerosis. Mult Scler 2001;7:27–31.
[27] Fox NC, Freeborough PA. Brain atrophy progression measured from registered serial MRI: validation and application to Alzheimer’s disease. JMRI 1996;7:1069–75.
[28] Hajnal JV, Saeed N, Soar EJ, et al. A registration and interpolation procedure for subvoxel matching of serially acquired MR images. J Comput Assist Tomogr 1995;19:289–96.
[29] Karas GB, Burton EJ, Rombouts SARB, et al. A comprehensive study of gray matter loss in patients with Alzheimer’s disease using optimized voxel-based morphometry. Neuroimage 2003;18:895–907.
[30] Alfano B, Brunetti A, Covelli EM, et al. Unsupervised, automated segmentation of the normal brain using a multispectral relaxometric magnetic resonance approach. Magn Reson Med 1997;37:84–93.
[31] Whitwell JL, Crum WR, Watt HC, Fox NC. Normalization of the cerebral volumes by use of intracranial volume: implications for longitudinal quantitative MR imaging. Am J Neuroradiol 2001;22:1483–9.