Apparent diffusion coefficient (ADC) measurements may be more reliable and reproducible than lesion volume on diffusion-weighted images from patients with acute ischaemic stroke–implications for study design


Magnetic Resonance Imaging 21 (2003) 617– 624

Arnab K. Rana (a), Joanna M. Wardlaw (a,*), Paul A. Armitage (a), Mark E. Bastin (b)

(a) Department of Clinical Neurosciences, University of Edinburgh, Western General Hospitals NHS Trust, Edinburgh, EH4 2XU, UK
(b) Department of Medical Physics, University of Edinburgh, Western General Hospitals NHS Trust, Edinburgh, EH4 2XU, UK

Abstract

Early ischemic change after stroke can be demonstrated with diffusion-weighted imaging (DWI) and quantified by measuring the apparent diffusion coefficient (ADC) and/or lesion volume. We examined the reliability and reproducibility of lesion volume and ADC measurement on DWI images, and discuss the implications for clinical studies. Using 38 DWI scans from 15 stroke patients, two observers (a physicist and a neuroscience graduate), blind to each other, recorded the lesion volume on DWI sequences, measured the ADC values in this volume and calculated the ratio of ischemic:control ADC (ADCr). One observer repeated his measurements blind to his first, and also examined the effect on lesion volume and ADC of deliberately varying the outline of the visible lesion boundary by only one pixel. The inter- and intra-rater reliability were worse for lesion volume than for ADC or ADCr measurements: lesion volume, inter-rater coefficient of variation (CoV) 85 ± 130%, intra-rater CoV 20 ± SD 80% (p < 0.05); ADC, inter-rater CoV 7.7 ± SD 19%, intra-rater CoV 0.2 ± SD 12% (p = NS); and ADCr, inter-rater CoV 8 ± SD 27%, intra-rater CoV 0.8 ± SD 73% (p = NS). Altering the position of the outline tracing of the lesion boundary by one pixel altered the measured volumes by 22 ± SD 25% (p < 0.05), but ADC values were altered by only 2.9 ± SD 4.9% and ADCr by 2.7 ± SD 4.8% (p = NS). ADC and ADCr values are more reliable and reproducible than DWI lesion size in acute ischemic stroke because altering where the lesion boundary is measured has a much greater impact on lesion volume than on the ADC or ADCr. This effect is greatest in large lesions. © 2003 Elsevier Inc. All rights reserved.

1. Introduction

In studies using diffusion imaging (DWI) in acute ischemic stroke, the apparent diffusion coefficient (ADC) or lesion volume is often calculated to quantify the abnormality [1,2]. A region-of-interest (ROI) is usually drawn around [3,4] or just inside [5,6] the edge of the abnormal area on the DW image. More recent automated methods use a threshold ADC value to detect the lesion boundary and other parameters [4,7,8]. The measured ADC in any particular lesion probably depends on the size, shape and position of the ROI, and the rate of change in ADC between normal and ischemic brain [7]. Few studies have examined the variability of different methods of ADC measurement [9,10], although variability could be important. Variability arising during the image analysis stage might account for differences between stroke centers performing observational imaging studies and could affect the result of clinical trials of new drug treatments for stroke in which an imaging surrogate outcome is to be used. Despite the potential impact of the analysis component of DWI, there is virtually no published information on measurement variability.

We therefore attempted to quantify the variability which might arise if different observers performed the ADC measurement (reliability) and if one observer repeated his measurements (repeatability). To assess the impact of apparently small changes in the ROI, we also deliberately made a small reduction in the size of the ROI, and re-measured the ADC and DWI lesion volume. We then calculated the effect of this variability on sample sizes required in observational or treatment studies where a surrogate outcome based on DWI might be used.

* Corresponding author. Tel.: +1-0131-537-3110; fax: +1-0131-3325150. E-mail address: [email protected] (J.M. Wardlaw). 0730-725X/03/$ - see front matter © 2003 Elsevier Inc. All rights reserved. doi:10.1016/S0730-725X(03)00087-0


A.K. Rana et al. / Magnetic Resonance Imaging 21 (2003) 617– 624

2. Materials and methods

2.1. Patient population

Eighty consecutive patients with acute ischemic stroke, prospectively recruited from our hospital's acute stroke service, underwent diffusion MR imaging between April 1998 and 16 August 1999. Thirty patients underwent serial MR imaging. Fifteen of these patients were used in the present analysis of repeatability, chosen to represent a range of patients with mild to severe stroke, small to large DWI lesions at various times after stroke. Each patient's images were analyzed by two observers.

2.2. Diffusion imaging

MR imaging was performed on a GE 2T MRI scanner (Milwaukee, USA). Echo-planar DW imaging was performed with a diffusion-sensitizing gradient of b ≈ 700 s/mm² (the optimal value for this machine). Ten 6-mm thick slices with a 20% slice gap were obtained in 40 s. Fifteen slices were obtained from one of the patients. To avoid errors due to anisotropy [11], diffusion gradients were applied along three perpendicular directions and averaged to give the final image, i.e.,

$$\mathrm{ADC} = \frac{\mathrm{ADC}_x + \mathrm{ADC}_y + \mathrm{ADC}_z}{3}$$
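The directional averaging above can be sketched in a few lines. This is only an illustration: the paper does not say how each directional ADC was computed, so the mono-exponential signal model ADC = ln(S0/Sb)/b used in `adc_from_signal` is an assumption, and both function names are hypothetical.

```python
import math


def adc_from_signal(s0, sb, b_value):
    """Directional ADC (mm^2/s) from unweighted (s0) and diffusion-weighted (sb) signal.

    Assumes a mono-exponential decay, sb = s0 * exp(-b * ADC); this model is an
    assumption and is not stated in the paper.
    """
    if s0 <= 0 or sb <= 0 or b_value <= 0:
        raise ValueError("signals and b-value must be positive")
    return math.log(s0 / sb) / b_value


def mean_adc(adc_x, adc_y, adc_z):
    """Average of the three orthogonal directional ADCs, as in the equation above."""
    return (adc_x + adc_y + adc_z) / 3.0
```

For example, with b = 700 s/mm² a voxel whose signal falls from 1000 to 1000·exp(−0.7) in one direction has a directional ADC of 1.0 × 10⁻³ mm²/s; averaging three such values reduces the dependence on fibre orientation.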

2.3. Selection of the ROI

The diffusion images were printed onto paper, and a neuroradiologist marked round abnormal-looking cortical gray, subcortical gray and white matter, blind to clinical baseline and outcome data. Two observers (a physicist with three years' experience of DWI and a neuroscientist studying medicine) independently viewed the diffusion images on a Sun Ultra Sparc Station 10 (Sun Microsystems, Mountain View, CA, USA), chose contrast and gray scale settings themselves to optimize lesion and gray/white matter visualization, and traced round the lesion on the workstation guided by the neuroradiologist's drawing. They chose contralateral regions of equivalent size, shape and position themselves, to act as normal control areas. The lesion was outlined and measured on each slice on which it was visible in each patient.

The observers did not precisely agree on an outlining protocol before embarking on this test. Observer 1 (physicist) selected the ROIs in an attempt to minimize partial volume effects and therefore tended to avoid regions that were ill-defined or highly diffuse on the images, while observer 2 (neuroscientist) attempted to select what he felt was the full extent of the lesion and so was more likely to have included less well defined areas in the ROI and was more aware of anatomic boundaries than the physicist. Both observers were blinded to clinical baseline and follow-up details, any other measurements, and specifically to each other's measurements. Observer 1 (the physicist) outlined a contralateral control brain region of exactly the same size as the ischemic one (regardless of side-to-side variation in anatomy), but observer 2 (the neuroscience graduate studying medicine) chose an area based on neuroanatomical features and counted the number of voxels in both ischemic and control areas to ensure compatibility (i.e., used a more anatomic approach).

For the assessment of intra-observer variability, observer 2 repeated his measurements eighteen months later, blind to his first measurements but using the same method of identifying the ROI as for the first measurements.

Finally, to test the magnitude of the effect on repeatability of an apparently marginal change in tracing the lesion outline, a small reduction in the size of the ROI was deliberately produced by switching off the "sample under border" option in CNS Analyze (Mayo Foundation, Rochester, MN, USA). This excluded the one-pixel-width band of the outline when sampling the lesion volumes and ADCs, which had the effect of tracing round the inner rather than the outer edge of the visible lesion border, i.e., an absolutely minimal change.

2.4. Quantifying ADC values

The saved regions of interest were then loaded onto the quantitative diffusion image and the ADCs were sampled from the whole lesion (mean and standard deviation). Summing the number of outlined voxels from each slice gave the total lesion volume on DWI.

2.5. Assessment of variability

The variability between observers 1 and 2 was then analyzed using the Bland-Altman method, which measures agreement and bias between two observers for continuous variables [12]. The mean of, and the difference between, the two observers was calculated for abnormal and normal brain regions. The coefficient of variation (CoV) is the standard deviation of the mean difference divided by the pooled mean values, and is expressed as a percentage value [13].

2.6. Assessment of the impact of observer variability

In order to assess the impact that observer variability might have on the interpretation of studies using DWI, we calculated theoretical sample sizes with the assumption of a normal distribution, delta = 5%, alpha = 0.05 and power = 0.80. This was based on a two-sample t test, since in a pharmaceutical trial, a test group of patients would be compared with a control group. On the basis of Perkins et al. 2000 [14], we quantified the improvement which would occur if readings from two observers were used instead of one, by averaging each corresponding measurement from observer 1 and observer 2's second measurement, and using the SD of this group of values. We used observer 2's second measurement because he was by then more experienced and therefore the interobserver variability was likely to be less than if we had used his first measurement (we did not wish to overestimate the effect of observer variability).
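The sample-size calculation described in Section 2.6 can be sketched as follows. The paper does not state which formula or software was used, so the normal-approximation formula for a two-sided, two-sample comparison of means is an assumption here, as are the function names. With the SDs and absolute 5% differences listed in Table 4, this sketch happens to reproduce the predicted totals reported there.

```python
import math
from statistics import NormalDist


def n_per_group(sd, delta, alpha=0.05, power=0.80):
    """Patients per arm to detect an absolute difference `delta` between two group means.

    Normal-approximation formula (an assumption; the paper refers to a
    two-sample t test without giving the formula):
        n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta) ** 2
    """
    if sd <= 0 or delta <= 0:
        raise ValueError("sd and delta must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


def total_sample_size(sd, delta, alpha=0.05, power=0.80):
    """Total patients across the treated and control arms."""
    return 2 * n_per_group(sd, delta, alpha, power)
```

For instance, `total_sample_size(sd=1163, delta=129)` uses the one-observer lesion-volume SD and the 5% difference from Table 4, and `total_sample_size(sd=0.20, delta=0.05)` uses the two-observer ischaemic ADC values. Because the required number scales with (sd/delta)², a measure with a large SD relative to its mean, such as lesion volume, needs far more patients than ADC.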

3. Results

Fifteen patients were scanned, 7 on two occasions and 8 on three occasions, giving a total of 38 scans. Since control and ischemic regions, and gray and white matter regions were outlined separately, the scans resulted in a total of 134 ROIs. Bland-Altman plots of observer variability are presented in Fig. 1, and a comparison of the coefficients of variation using different measurement techniques is presented in Fig. 2. There was significant inter-rater variability in measurements of lesion volume and ADC in both the ischemic and control regions (Tables 1 and 2). The mean difference between observers was 574 ± SD 865 voxels for control regions, which amounted to 76 ± SD 114% of the average control region size. For ischemic regions the difference between observers was 93 ± SD 139% of the average lesion size, and ischemic and control regions combined gave a percentage difference of 85 ± SD 130%. Expressed as percentages, the difference between observers in ADC measurements was 5.0 ± SD 17% for control brain regions, 11 ± SD 20% for ischemic regions and 7.7 ± SD 19% for ADC measurements as a whole. It is noted that the observer variability of ADC measurements was greater for small volume lesions than for larger ones (Fig. 1d-f). ADCr measurements reduced the systematic bias between observers to a small, non-significant one, but the coefficient of variation rose to 73% (Table 3). This may be because standard deviations are particularly susceptible to outlier points (see Fig. 1g legend); removing the one extreme point reduced the coefficient of variation to 27% but changed the mean difference to −0.056 (now a significant difference with p = 0.01). Intra-rater variability for observer 2 for each type of measurement was less than the corresponding inter-observer variability (Fig. 2).

Pooling the ischemic and control brain regions from observer 2 confirms that he outlined significantly larger volumes on his second attempt (mean difference between first and second measurements = −20 ± SD 80%, p = 0.004). However, his pooled ADC measurements revealed no such difference (mean difference = −0.21 ± SD 12%, p = 0.90). Deliberately making a minor alteration to the position of the region of interest border produced a systematic reduction in measured volume, which was proportionately greater for larger than for smaller lesions (Fig. 1c). The average reduction in ischemic and control volumes was 22% ± SD 25%, but the consequent average reduction in ADC was only 2.9% ± SD 4.9%. Ischemic-region ADCs were affected more than control-region ADCs, reducing the ischemic/control ratio by an average of 2.7% ± SD 4.8%. Sample size calculations for theoretical trials using an imaging surrogate outcome such as lesion volumes, ischemic region ADCs and ADCr values are shown in Table 4. Larger sample sizes would be required if volume rather than ADC or ADCr were used as a surrogate outcome measure because of the large coefficient of variation. Averaging values with a second observer would reduce the spread of data points and make a modest reduction to the sample size.
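The agreement statistics reported in this section (mean difference, SD of the differences, and CoV) can be computed from paired readings along the following lines. This is a minimal sketch, not the authors' analysis code: the function name is hypothetical, the use of the sample (n − 1) standard deviation is an assumption, and the 1.96 × SD limits of agreement follow the usual Bland-Altman convention rather than anything stated in the paper.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Agreement summary for paired measurements of the same ROIs.

    CoV follows the definition in Section 2.5: SD of the paired differences
    divided by the pooled mean of all readings, as a percentage.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)                      # systematic difference between readers
    sd_diff = stdev(diffs)                  # sample SD (n - 1); an assumption
    pooled_mean = mean(list(first) + list(second))
    return {
        "bias": bias,
        "sd_diff": sd_diff,
        "limits_of_agreement": (bias - 1.96 * sd_diff, bias + 1.96 * sd_diff),
        "cov_percent": 100.0 * sd_diff / pooled_mean,
    }
```

As a consistency check on the definition, the control-region summary values in Table 1 (SD of the difference 865 voxels; observer means 471 and 1046 voxels) give 865 / 758.5, or about 114%, which agrees with the CoV reported there.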

Fig. 1. Bland-Altman plots of the variability in ADC measurements. Obs 2.2 = observer 2's second readings, which were used as the reference standard. The mean of two sets of measurements is plotted on the horizontal axis, against their difference on the vertical axis. The scatter of points along the vertical axis gives an idea of random variability. Circles represent measurements taken from gray matter and triangles, from white matter. Open symbols indicate control brain regions, and closed symbols are ischemic ones. Regression lines with 95% confidence intervals are drawn. 1a: Interobserver variability of lesion volume measurements. 1b: Intraobserver variability of lesion volume measurements. 1c: The effect of omitting the ROI border (i.e., tracing just inside the lesion's edge rather than on it) on lesion volume measurements. 1d: Interobserver variability of ADC measurements. 1e: Intraobserver variability for ADC measurements. 1f: The effect of omitting the ROI border on ADC measurements. 1g: Interobserver variability of ADCr measurements. Note that the scale and prediction lines in this figure exclude one outlier point at [2.87,4.13]. 1h: Intraobserver variability of ADCr measurements. 1i: The effect of omitting the ROI border on ADCr measurements.

Fig. 2. A comparison of the interobserver and intraobserver coefficient of variation using different measurement techniques. For tissue volume and ADC values, only ischemic areas were used because these would be compared between patients in clinical trials. One extreme value was removed from the interobserver variability in ADCr so as not to inappropriately distort it.

4. Discussion

Observer variability is an important contributor to the measurement of DW lesion volume, ADC and ADCr. Lesion volume measurements were particularly subject to observer variability, which was greater between observers than in measurements repeated by one observer. This component of diffusion imaging has been neglected, but given the difference in the method of choosing the ROI between our two observers, and that lesions on DWI can be both patchy and ill-defined, particularly soon after stroke onset, it is not surprising that there was a significant difference in our study in the volumes measured by the two observers. However, the two approaches to the ROI sampling used by the two observers are analogous to those used in different labs (personal communication) and both are valid. What is perhaps surprising and should be noted is the considerable difference that can occur between observers apparently setting out to accomplish the same thing, and each with valid reasons for their chosen method.

The difference between observers for ROI size measurements was greater for larger lesions. As a large proportion of the volume of a structure is in the outer layers, apparently minor differences in placing the edge of a lesion add a greater excess volume to large volumes than to smaller ones, as shown previously [15]. This relationship is confirmed by deliberately excluding the sample border, which reduces measured volume difference almost linearly with average volume (Fig. 1c). It is also possible that large lesions have a less distinct edge, thereby augmenting any difference in its perception by the two observers.

ADC appears less vulnerable to observer variability, and the ADCr even less so: ADCr has no systematic bias between observers whereas ADC does, but ADCr has a higher coefficient of variation than ADC. That is, if an observer measures a proportionally higher ADC from both ischemic and control brain regions, then on average this cancels out in the calculation of ADCr. The ADCr may also compensate for the effect of global brain pulsations, temperature, or electrolyte concentration on ADC [16,17,18].

Presumably the greater observer variability for lesion volume occurs because much of the volume of a lesion is in its outer "rind", so that a small alteration in where the ROI line is drawn makes a much larger difference to the measured volume. A 10% increase in diameter corresponds to about 30% increase in volume for a given diameter lesion. On the other hand, the ADC is perhaps uniformly reduced across a lesion and an average value is produced from the whole ROI. Accidentally including a little normal brain in the ROI will have less overall effect on the average ADC than it does on the volume. In contrast to lesion size measurements, one might have speculated that small lesions would show a lower variability in ADC because they might be more prone to the accidental inclusion of normal tissue, and for small lesions this would raise their ADC toward normal brain more than for large lesions. This may in part explain the positive relationship between ADC and lesion size (the partial volume effect) [19]. Our data are consistent with this in showing that smaller lesions have a greater, not a smaller variability in ADC measurement.
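The diameter-to-volume figure quoted in this discussion follows from the cubic scaling of volume with linear size. As a worked check, assuming an idealised spherical lesion of diameter d (an assumption made only for illustration; real infarcts are irregular):

```latex
V(d) = \frac{\pi}{6}\, d^{3}
\quad\Longrightarrow\quad
\frac{V(1.1\,d)}{V(d)} = 1.1^{3} = 1.331
```

A 10% larger diameter therefore gives roughly a 33% larger volume, in line with the "about 30%" stated in the text. A mean ADC taken over the same ROI changes only in proportion to the fraction of extra voxels whose ADC differs from the lesion average, which is one way to see why the boundary matters less for ADC than for volume.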

4.1. Relationship to previous studies

The ADC and ADCr values obtained here are higher than in some studies [5], but lower than in others [20]. Most previous discussions on sources of variability in DWI have focused on differences in image acquisition [21,22]. Few studies have examined the variability of ADC measurements [10,18,21], and all did so by rescanning normal volunteers, not by looking at the effect of lesion measurement. In contrast to the few variability studies of ADC, there are many on the subject of lesion volume measurement on different imaging modalities. Van der Worp et al. [13] compared different methods of infarct size measurement on computed tomography (CT), and found that manual tracing was the most reliable and reproducible compared with automated methods. Their inter-observer CoV was 19.9% and their intra-observer CoV was 12.2%, which is lower than reported here. Laubach et al. [23] constructed and measured a DWI stroke phantom and made comparisons with the true "lesion" volume. The true volume was estimated with an error of 1%, although smaller lesions had errors of over 100%, which was attributed to the slice thickness and gap. This suggests that thinner slices with no gap would reduce variability, but the increase in scan time may be impractical in the clinical setting. Their lesion shape was also less complicated than a typical human infarct, which can be patchy and ill defined, with no precise edge to discern. In DWI studies of human stroke, Baird et al. [24] measured mean deviations of less than 5% in the lesion volume, but accounted for this error by using changes of 20% as a cut-off for defining change. Given that our two observers made their measurements without first agreeing on a lesion boundary, and that the time delay between repeat measurements by the same observer was 18 months, we have deliberately not artificially reduced the opportunities for observer variability when compared to previous reports.

4.2. How might repeatability be improved?

One improvement would involve two neuroradiologists agreeing on ROI boundaries and independently outlining the lesion. However, given the time-consuming and labor-intensive nature of manual tracing [25], one

Table 1 DWI lesion volume measurements and their observer variability. Observer 1 was compared with Observer 2's second set of measurements to obtain inter-rater reliability, and Observer 2's first with his second readings to obtain intra-rater reliability data. The values are of all 134 ROIs.

| | Observer 1 | Observer 2 first measure | Observer 2 second measure | Deliberate change in ROI border by one pixel |
| --- | --- | --- | --- | --- |
| Mean control tissue volume ± SD (voxels) | 471 ± 634 | 910 ± 1151 | 1046 ± 1376 | 826 ± 1131 |
| Difference from Observer 2's second measure ± SD (voxels) | −575 ± 865 | −135 ± 680 | N/A | −220 ± 250 |
| 2-tailed paired t-test | P < 0.001* | P = 0.11 | N/A | P < 0.001* |
| Coefficient of variation | 114.0% | 69.6% | N/A | 26.7% |
| Mean ischaemic tissue volume ± SD (voxels) | 471 ± 634 | 986 ± 1251 | 1283 ± 1692 | 1044 ± 1434 |
| Difference from Observer 2's second readings ± SD (voxels) | −812 ± 1223 | −297 ± 976 | N/A | −239 ± 267 |
| 2-tailed paired t-test | P < 0.001* | P = 0.02* | N/A | P < 0.001* |
| Coefficient of variation | 139.4% | 86.0% | N/A | 23.0% |

* = significant at P < 0.05. N/A = not applicable.


Table 2 ADC measurements and their observer variability. Observer 1 was compared with Observer 2's second set of measurements to obtain inter-rater reliability, and Observer 2's first with his second readings to obtain intra-rater reliability data. The values are from all 134 ROIs.

| | Observer 1 | Observer 2 first measure | Observer 2 second measure | Deliberate change in ROI border by one pixel |
| --- | --- | --- | --- | --- |
| Mean control ADC ± SD (×10⁻³ mm²/s) | 1.32 ± 0.30 | 1.39 ± 0.24 | 1.39 ± 0.24 | 1.37 ± 0.24 |
| Difference from Observer 2's second readings ± SD (×10⁻³ mm²/s) | −0.069 ± 0.237 | −0.003 ± 0.143 | N/A | −0.023 ± 0.053 |
| 2-tailed paired t-test | P = 0.02* | P = 0.87 | N/A | P = 0.001* |
| Coefficient of variation | 17.5% | 10.2% | N/A | 3.8% |
| Mean ischaemic ADC ± SD (×10⁻³ mm²/s) | 0.90 ± 0.30 | 1.02 ± 0.29 | 1.01 ± 0.30 | 0.97 ± 0.29 |
| Difference from Observer 2's second readings ± SD (×10⁻³ mm²/s) | −0.109 ± 0.196 | −0.002 ± 0.149 | N/A | −0.046 ± 0.060 |
| 2-tailed paired t-test | P < 0.001* | P = 0.87 | N/A | P < 0.001 |
| Coefficient of variation | 20.5% | 14.8% | N/A | 4.6% |

* = significant at P < 0.05. N/A = not applicable.

can envisage either paying the neuroradiologists for many hours of work at standard commission rates or this task being delegated to less expensive technical staff. In this respect, perhaps our study is realistic. Rovartis et al. [26] found that formal operator training improved both the intra- and inter-observer variability of lesion volume measurements in MS by naive technicians, but Molyneux et al. [27] found that adopting consensus guidelines among five experienced observers did not improve interobserver variability for lesion volume measurements. Thresholding methods may automate determination of the lesion boundary, and applying its mirror image to the opposite hemisphere may better identify a control region, although this was not borne out in a CT study [13]. However, this would not necessarily identify a neuroanatomically correct contralateral ROI owing to brain swelling, for example, and so some manual interaction is still likely to be required. It is therefore unsurprising that manual intervention still plays a large part in determining the ROI, a point confirmed through our correspondence with related laboratories.

4.3. Implications

Variation in ROI definition may explain some of the differences between laboratories in published ADC values. It may contribute to variation in the ADC of patients imaged sequentially after stroke. Indeed, the analysis presented here suggests that observer variability might lead to different conclusions from different studies if very few patients are used and the ADC is not adequately quantified.

4.4. What are the implications for clinical studies?

Firstly, it is important to standardize the method used to outline any lesion. Secondly, for consistency, any observers should each measure the entire data set in a particular study. Thirdly, using the ADCr may compensate for any systematic bias. And fourthly, a surprisingly large number of patients may still be needed to demonstrate differences reliably. This last point can be illustrated with sample size calculations. For example, if at a certain time point after the onset of ischemic stroke, a control group of patients has a mean infarct ADCr of 0.760 ± SD 0.25, then to detect an improvement of 5% in a group treated with the test drug (not unrealistic with clinical trials) would require 1276 patients in each of the active and control groups (total 2552). Using ADC measurements and averaging the data from two observers [14] could reduce this to 252 patients in each group (total N = 504), with a consequent reduction in the cost and duration of a study [14,28]. Jiang et al. [29] suggest that the expected difference may be more than 5%,

Table 3 ADCr measurements and their observer variability. Observer 1 was compared with Observer 2's second set of measurements to obtain inter-rater reliability, and Observer 2's first with his second readings to obtain intra-rater reliability data. Observer 1 column (N = 67) includes one outlier; excluding this changes the values of mean ADCr to 0.71 ± SD 0.19, difference from Observer 2's second readings to 0.056 ± SD 0.19, P = 0.01 and coefficient of variation to 27.2%. The values are for all 134 ROIs.

| | Observer 1 | Observer 2 first measure | Observer 2 second measure | Deliberate change in ROI border by one pixel |
| --- | --- | --- | --- | --- |
| Mean ADCr ± SD | 0.74 ± 0.55 | 0.74 ± 0.24 | 0.74 ± 0.23 | 0.72 ± 0.23 |
| Difference from Observer 2's second readings ± SD | 0.006 ± 0.543 | 0.005 ± 0.136 | N/A | −0.023 ± 0.035 |
| 2-tailed paired t-test | P = 0.92 | P = 0.78 | N/A | P < 0.001* |
| Coefficient of variation | 73.2% | 18.3% | N/A | 4.8% |

* = significant at P < 0.05. N/A = not applicable.
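The ADCr values in Table 3 are ratios of ischemic to control ADC, and the argument that a proportional observer bias cancels in this ratio can be illustrated with a short sketch. The function name and the 10% bias factor are hypothetical, and the inputs are the Observer 1 group means from Table 2, used here only as convenient numbers (a ratio of group means is not the same as the mean of per-ROI ratios reported in Table 3).

```python
def adc_ratio(adc_ischemic, adc_control):
    """ADCr: ischemic-region ADC divided by the contralateral control-region ADC."""
    if adc_control <= 0:
        raise ValueError("control ADC must be positive")
    return adc_ischemic / adc_control


# Illustrative inputs in units of 10^-3 mm^2/s (Observer 1 means, Table 2).
unbiased = adc_ratio(0.90, 1.32)

# A hypothetical observer who reads every ADC 10% high gets the same ratio.
bias = 1.10
biased = adc_ratio(0.90 * bias, 1.32 * bias)
```

Only a multiplicative bias cancels in this way; an additive offset, or an error that affects the ischemic and control ROIs differently, would still change the ratio.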


Table 4 Sample size calculations based on data obtained with a 2-tailed 2-sample t-test, alpha = 0.05, power = 0.80, and delta = 5%. The SD of using one observer is the midpoint of the two SDs obtained by Observer 1 and Observer 2's second readings; the SD of using two observers is the SD of the dataset produced by averaging the pairs of values obtained by Observers 1 and 2's second readings.

| Selected measure | Absolute value of a 5% difference between patient groups A and B | SD using one observer | Predicted sample size | SD using two observers | Predicted sample size |
| --- | --- | --- | --- | --- | --- |
| Lesion size (voxels) | 129 | 1163 | 2552 | 1122 | 2376 |
| Ischaemic tissue ADC (×10⁻³ mm²/s) | 0.05 | 0.30 | 1132 | 0.20 | 504 |
| ADCr | 0.038 | 0.21 | 960 | 0.19 | 786 |

and paired measurements in which the ADCr is measured before and after a treatment would also reduce sample size; however, this may be inappropriate in acute ischemic stroke because the ADCr changes over time [5] and a separate control group is therefore required. Furthermore, changes in ADCr do not imply recovery, merely evolution of tissue damage. These sample sizes are not dissimilar to the average size of many recent trials of new pharmacological agents in acute ischemic stroke with clinical endpoints, suggesting that use of MR as a surrogate outcome would not actually reduce sample size (or cost) of acute stroke treatment trials.

Clearly, observer variability has an important impact on ADC measurements, and this should be taken into account when designing studies that use them. Improving the outlining technique may further reduce the variation in ADC and ADCr measurements, since the outline determines which voxels are sampled for their ADC values. Whether this would leave ADC and ADCr measurements superior to lesion size measurements requires confirmation. Future studies should look further into the variability of ADC measurements and suggest practical ways in which it can be improved, and not just assume that greater automation would improve measurement reliability. Studies in animals with histologic comparison with imaging features might help identify which parameter (ADC or DWI lesion volume) best identifies the amount of ischemic brain.

Acknowledgments

This work was undertaken at the SHEFC Brain Imaging Research Center for Scotland (http://www.dcn.ed.ac.uk/bic). It was funded by SHEFC grant No. 96/036 and a UK Medical Research Council CRI in Clinical Neurosciences. Arnab Rana was supported by a Wellcome Trust Summer Studentship and Paul Armitage was supported by a UK Medical Research Council PhD studentship.

References

[1] Powers WJ. Testing a test: a report card for DWI in acute stroke. Neurology 2000;54:1549–1551.
[2] Keir SL, Wardlaw J. A systematic review of diffusion and perfusion imaging in acute ischaemic stroke. Stroke 2000;31:2731–2731.
[3] van Everdingen KJ, van der Grond J, Kapelle LJ, Ramos LMP, Mali WPTM. Diffusion-weighted magnetic resonance imaging in acute stroke. Stroke 1998;29:1783–1790.
[4] Yang Q, Tress BM, Barber PA, Desmond PM, Darby DG, Gerraty RP, Li T, Davis SM. Serial study of apparent diffusion coefficient and anisotropy in patients with acute stroke. Stroke 1999;30:2382–2390.
[5] Schlaug G, Siewert B, Benfield A, Edelman RR, Warach S. Time course of the apparent diffusion coefficient (ADC) abnormality in human stroke. Neurology 1997;49:113–119.
[6] Warach S, Gaa J, Siewert B, Wielopolski P, Edelman RR. Acute human stroke studied by whole brain echo planar diffusion-weighted magnetic resonance imaging. Annals of Neurology 1995;37:231–241.
[7] Reith W, Hasegawa Y, Latour LL, Dardzinski BJ, Sotak CH, Fisher M. Multislice diffusion mapping for 3-D evolution of cerebral ischaemia in a rat stroke model. Neurology 1995;45:172–177.
[8] Jacobs MA, Knight RA, Soltanian-Zadeh H, Zheng ZG, Goussev AV, Peck DJ, Windham JP, Chopp M. Unsupervised segmentation of multiparameter MRI in experimental cerebral ischemia with comparison to T2, diffusion, and ADC MRI parameters and histopathological validation. Journal of Magnetic Resonance Imaging 2000;11:425–437.
[9] Maier SE, Gudbjartsson H, Patz S, Hsu L, Lovblad K-O, Edelman RR, Warach S, Jolesz FA. Line scan diffusion imaging: characterisation in healthy subjects and stroke patients. American Journal of Roentgenology 1998;171:85–93.
[10] Bammer R, Stollberger R, Augustin M, Simbrunner J, Offenbacher H, Kooijman H, Ropele S, Kapeller P, Wach P, Ebner F, Fazekas F. Diffusion-weighted imaging with navigated interleaved echo-planar imaging and a conventional gradient system. Radiology 1999;211:799–806.
[11] Ulug AM, Beauchamp N, Bryan RN, van Zijl PCM. Absolute quantitation of diffusion constants in human stroke. Stroke 1997;28:483–490.
[12] Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307–310.
[13] van der Worp HB, Claus SP, Bär PR, Ramos LMP, Algra A, van Gijn J, Kapelle LJ. Reproducibility of measurements of cerebral infarct volume. Chapter 6 in: HB van der Worp. Treatment of acute ischaemic stroke with antioxidants: the gap between laboratory and clinic. Utrecht University Thesis 1999:69–81.
[14] Perkins DO, Wyatt RJ, Bartko JJ. Penny-wise and pound-foolish: the impact of measurement error on sample size requirements in clinical trials. Biological Psychiatry 2000;47:762–766.
[15] Whalley HC, Wardlaw JM. Accuracy and reproducibility of simple cross-sectional linear and area measurements of brain structures and their comparison with volumes measurements. Neuroradiology 2001;43(4):263–271.
[16] Sevick RJ, Kanda F, Mintorovitch J, Arieff AI, Kucharczyk J, Tsuruda JS, Norman D, Moseley ME. Cytotoxic brain edema: assessment with diffusion-weighted MR imaging. Radiology 1992;185:687–690.
[17] Hasegawa Y, Latour LL, Sotak CH, Dardzinski BJ, Fisher M. Temperature dependent change of apparent diffusion coefficient of water. Journal of Blood Flow and Metabolism 1994;14:383–390.
[18] Brockstedt S, Borg M, Geijer B, Wirestam R, Thomsen C, Holtås S, Ståhlberg F. Triggering in quantitative diffusion imaging with single-shot EPI. Acta Radiologica 1999;40:263–269.
[19] Rana AK, Wardlaw JM, Bastin ME, Armitage PA, Keir SL. Diffusion imaging abnormalities correlate with the type and severity of ischaemic stroke [abstract]. Magnetic Resonance Materials in Physics, Biology and Medicine 2000;11(S1):228.
[20] Warach S, Chien D, Li W, Ronthal M, Edelman RR. Fast magnetic resonance diffusion-weighted imaging of acute human stroke [published erratum appears in Neurology 1992;42:2192]. Neurology 1992;42:1717–1723.
[21] Conturo TE, McKinstry RC, Aronovitz JA, Neil JJ. Diffusion MRI: precision, accuracy and flow effects [review]. NMR in Biomedicine 1995;8:307–332.
[22] Murtz P, Flacke S, Traber F, Keller E, Gieske J, Folkers P, Schild HH. Diffusion-weighted MR tomography: navigated multi-shot SEEPI technique for clinical use [German]. Rofo Fortschritte auf dem Gebiete der Rontgenstrahlen und der Neuen Bildgebenden Verfahren 1998;168:580–8.
[23] Laubach HJ, Jakob PM, Loevblad KO, Baird AE, Bovo MP, Edelman RR, Warach S. A phantom for diffusion-weighted imaging of acute stroke. Journal of Magnetic Resonance Imaging 1998;8:1349–1354.
[24] Baird AE, Benfield A, Schlaug G, Siewert B, Lövblad K-O, Edelman RR, Warach S. Enlargement of human cerebral ischaemic lesion volumes measured by diffusion-weighted magnetic resonance imaging. Annals of Neurology 1997;41:581–589.
[25] Lyden PD, Zweifler R, Mahdavi Z, Lonzo L. A rapid, reliable, and valid method for measuring infarct and brain compartment volumes from computed tomographic scans. Stroke 1994;25:2421–2428.
[26] Rovartis M, Rocca MA, Sormani MP, Comi G, Filippi M. Reproducibility of brain MRI lesion volume measurements in multiple sclerosis using a local thresholding technique: effects of formal operator training. European Neurology 1999;41:226–230.
[27] Molyneux PD, Miller DH, Filippi M, Yousry TA, Radu EW, Ader HJ, Barkhof F. Visual analysis of serial T2-weighted MRI in multiple sclerosis: intra- and interobserver reproducibility. Neuroradiology 1999;41:882–888.
[28] Leon AC, Marzuk PM, Portera L. More reliable outcome measures can reduce sample size requirements. Archives of General Psychiatry 1995;52:867–871.
[29] Jiang Q, Zhang RL, Zhang ZG, Ewing JR, Divine GW, Chopp M. Diffusion-, T2, and perfusion-weighted nuclear magnetic resonance imaging of middle cerebral artery embolic stroke and recombinant tissue plasminogen activator intervention in the rat. Journal of Cerebral Blood Flow and Metabolism 1998;18:758–767.