Ultrasound in Med. & Biol. Vol. 16, No. 7, pp. 653-657, 1990
0301-5629/90 $3.00 + .00 © 1990 Pergamon Press plc
Printed in the U.S.A.
Original Contribution
INTER- AND INTRA-OBSERVER VARIABILITY OF DOPPLER PEAK VELOCITY MEASUREMENTS: AN IN-VITRO STUDY
FRANKLIN N. TESSLER, CAROLYN KIMME-SMITH, M. LINDA SUTHERLAND, VICKI L. SCHILLER, RITA R. PERRELLA and EDWARD G. GRANT
Department of Radiological Sciences, UCLA School of Medicine, CHS BR-272, 10833 Le Conte Avenue, Los Angeles, CA 90024-1721
(Received 28 December 1989; in final form 6 May 1990)
Abstract--To determine the variability of pulsed Doppler peak velocity measurements, four radiologists with differing experience were tested using a calibrated flow phantom. Two ultrasound units, three probes and eight velocity rates varying between 40.5 and 78 cm/sec were studied, with a total of 303 measurements. The results were normalized against a set of 106 separate measurements made under highly-controlled conditions. The residual error standard deviation (not attributable to any systematically varied factor, including the velocity rate) was 6.8 cm/sec, with most of the remaining variation due to changing transducer or machine. Observer/equipment interactions accounted for 15.8% of the observed variability. The duration of the radiologist's Doppler experience had no significant effect.
Key Words: Doppler studies, Ultrasound, Observer variability, Velocity measurements.
INTRODUCTION
Doppler ultrasound uses a spectrum of frequency shifts to represent blood flow within a region of interest. These frequency shifts fall within the audible range, and valuable information may be gained by simply listening to the frequency mix. Typically, however, the Doppler shifts are portrayed visually as a graph of frequency shift vs. time. Although the spectral waveforms may be analyzed qualitatively, peak velocity measurements are required for quantitation of blood flow dynamics. Estimates of velocity as determined by Doppler ultrasound are currently being used to diagnose pathological conditions in a number of anatomic sites, including the carotid arteries, the renal arteries, and the heart (Grant et al. 1989; Moriyasu et al. 1986; Zoli et al. 1986).
To be clinically valid, Doppler flow measurements must be reproducible. Earlier investigations have attempted to determine the variability of Doppler flow estimates, with reported variations between 10 and 26% (Moriyasu et al. 1986; Zoli et al. 1986). Sources of error in peak velocity measurements fall into three broad categories: (1) factors intrinsic to the region being studied, such as vessel size and depth, turbulence (at velocities greater than 100 cm/sec), and blood viscosity (at velocities under 35 cm/sec), (2) variations attributable to equipment, such as transducer beam pattern and frequency spectrum, and (3) factors related to the examination technique, such as sample volume size and placement, angle correction, and transducer motion (Burns 1987; Gill 1985).
This study was undertaken to evaluate the contributions of examiner and machine-related factors to the variability of Doppler flow measurements. Specifically, we assessed the consistency of single observers over time, variations between observers of differing experience, and the effects of changing equipment on measurement variability.
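Angle correction is listed above among the technique-related sources of error. As a rough illustration of why it matters, the following sketch applies the standard Doppler equation, v = c Δf / (2 f0 cos θ). The speed of sound (1540 m/sec) and the function names are assumptions made for this example and are not taken from the study.

```python
import math

# Assumed soft-tissue speed of sound (1540 m/sec), expressed in cm/sec.
# This value is an assumption for illustration; the text does not state it.
SPEED_OF_SOUND_CM_S = 154000.0


def doppler_velocity(shift_hz, transmit_hz, angle_deg, c=SPEED_OF_SOUND_CM_S):
    """Velocity (cm/sec) from the standard Doppler equation v = c*df / (2*f0*cos(theta))."""
    return c * shift_hz / (2.0 * transmit_hz * math.cos(math.radians(angle_deg)))


def angle_error_percent(true_angle_deg, assumed_angle_deg):
    """Percent error in the reported velocity when the angle correction is set to
    assumed_angle_deg but the true beam-to-flow angle is true_angle_deg."""
    ratio = math.cos(math.radians(true_angle_deg)) / math.cos(math.radians(assumed_angle_deg))
    return (ratio - 1.0) * 100.0


if __name__ == "__main__":
    # The same 5 degree overestimate of the angle at two different true angles.
    for true_angle in (44.0, 74.0):
        error = angle_error_percent(true_angle, true_angle + 5.0)
        print(f"true angle {true_angle:.0f} deg, set 5 deg too large: {error:+.1f}% velocity error")
```

Under these assumptions, the same 5° misjudgment of the angle produces an error of roughly 10% at 44° but more than 40% at 74°, because the cosine term changes most rapidly at steep angles.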
MATERIALS AND METHODS
A Doppler flow phantom (Echo Ultrasound, Lewistown, PA) containing a blood-mimicking combination of water, glycerine and a mixture of 10 and 30 micron microspheres was used to simulate two pulsatile (arterial) and four continuous (venous) flow conditions. The plastic tube carrying the fluid had an inner diameter of 0.6 cm. It was encased in a block of tissue-mimicking material that attenuated the ultrasound beam at a rate of 0.5 dB/cm/MHz. The tube slanted downward through the block at a 26 degree angle, passing from a depth of 3 cm to a depth of 12 cm.
Address all correspondence to Dr. Franklin N. Tessler.
For each observation, the peak velocity was set by a physicist (CKS) using an inline flow meter. The meter consisted of a measuring column containing blood-mimicking fluid with a mercury bubble resting on top of it. The manufacturer of the Echo Ultrasound phantom had provided separate calibration curves for the system operating with water and blood-mimicking fluid. These data, in mL/sec, were converted to cm/sec by assuming laminar flow: the flow rate was divided by the inner cross-sectional area of the tube to give the mean velocity, which was then multiplied by 2 to approximate the peak velocity. The manufacturer's calibration was checked against an open flow phantom which had been calibrated by the timed collection method (McDicken 1986). Cross checking was accomplished by setting a flow rate on the open phantom, scanning it with a 2.0 MHz continuous-wave (CW) Doppler probe (Ultramark 8, Advanced Technology Laboratories, Bothell, WA) and measuring the peak velocity. Next, the same peak velocity reading was obtained on the same ultrasound unit using the Echo Ultrasound phantom. Finally, the value on the Echo Ultrasound phantom's flow meter was compared to the velocity calculated from the calibrated flow rate on the second phantom. This procedure was repeated for 10 peak velocities ranging from 20 to 90 cm/sec. Because variations in precision between the two phantoms at low peak velocities prevented verification of the flow meter, velocities less than 40.5 cm/sec were not used in this experiment. In addition, because the flow phantom motor produced air bubbles at velocities of 80 cm/sec or greater, values in this range were also excluded.
Once set, the peak velocity was measured by one of four radiologists using commercially available pulsed Doppler ultrasound equipment (Ultramark 4 and Ultramark 8, Advanced Technology Laboratories, Bothell, WA).
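The conversion of the calibration data from mL/sec to cm/sec described above can be sketched as follows. This is a minimal illustration assuming a circular tube and a parabolic (laminar) velocity profile, for which the peak velocity is twice the mean. The function names are hypothetical, and no values from the manufacturer's calibration curves are reproduced.

```python
import math

TUBE_INNER_DIAMETER_CM = 0.6  # inner diameter of the phantom tube, from the text


def peak_velocity_cm_s(flow_ml_s, diameter_cm=TUBE_INNER_DIAMETER_CM):
    """Approximate peak velocity (cm/sec) for laminar flow in a circular tube.

    Mean velocity = volumetric flow / cross-sectional area (1 mL = 1 cm^3);
    for a parabolic profile the peak velocity is twice the mean.
    """
    area_cm2 = math.pi * (diameter_cm / 2.0) ** 2
    mean_velocity = flow_ml_s / area_cm2
    return 2.0 * mean_velocity


def flow_for_peak_velocity(peak_cm_s, diameter_cm=TUBE_INNER_DIAMETER_CM):
    """Inverse relation: volumetric flow (mL/sec) giving the requested peak velocity."""
    area_cm2 = math.pi * (diameter_cm / 2.0) ** 2
    return peak_cm_s * area_cm2 / 2.0


if __name__ == "__main__":
    # Ends of the peak velocity range used in the study.
    for peak in (40.5, 78.0):
        print(f"{peak:5.1f} cm/sec peak  <->  {flow_for_peak_velocity(peak):.2f} mL/sec")
```

The factor of 2 holds only for fully developed laminar flow; under pulsatile or disturbed flow the ratio of peak to mean velocity differs, which is one reason the calibration was cross-checked against a second phantom.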
Three mechanical sector transducers, two operating at 5.0 MHz and one at 3.5 MHz, were used. The radiologists were instructed to measure the peak velocity as they would in clinical practice (Fig. 1). The radiologists could not see the flow meter, and were free to adjust gain, output power, wall filter, Doppler angle and sample volume size and position as they wished. Sample volume size ranged from 2 to 5 mm. Angle corrections were selected by the radiologists and ranged between 44° and 74° (only 2 measurements were made at 74°). Peak velocities ranged from 40.5 to 78 cm/sec. The measurements made with the 5.0 MHz transducers were typically from sample volumes at depths of 3 to 4 cm, whereas the measurements at 3.5 MHz were made using sample volumes located 4 to 6 cm from the transducer.
Of the four radiologists who participated in the study, two (FNT, EGG) had several years of pulsed Doppler experience, one (MLS) was an ultrasound fellow with six months' experience, and one (VLS) was a resident with one month's experience. Each radiologist was tested four times, with 18 or 19 different flow conditions randomly presented during each session. Every peak velocity was measured at least twice by each observer using both ultrasound units and transducer frequencies. One peak velocity measurement was made for each of the four different continuous flow conditions, while the two pulsatile flow conditions each contributed two measurements (one peak and one trough value). A total of 303 measurements were made; their distribution is shown in Table 1.
A separate controlled experiment was performed to quantify variations likely to be related to the test object itself, such as errors in setting the peak velocity. For this purpose, 106 additional peak velocity measurements made by the physicist under tightly controlled conditions were used to establish a "gold standard" against which the four observers' results could be compared. For each transducer frequency, the depth, sample volume size, scanning angle, wall filter, gain, and output which would yield the maximum velocity when measured on the flow phantom were determined. The eight peak velocities were then set and measured without changing the scan parameters for the frequency being used. Each set of measurements was performed two or three times, with the peak velocities reset for each sequence. Except for the second 5.0 MHz probe, each frequency was tested during a single session.
Fig. 1. Measurements of peak velocity (the values shown are for illustration only): (a) Continuous flow waveform at a peak velocity of 31 cm/sec. (b) Pulsatile waveform at a peak velocity of 51/25 cm/sec.
The Statistical Analysis System (SAS) package was used, with the PROC GLM analysis of variance (ANOVA) procedure and the PROC VARCOMP variance components procedure, to evaluate the 409 (303 + 106) data points and determine the contributions of observer, ultrasound unit, transducer and flow type (continuous vs. pulsatile) to measurement variability. The variance components analysis was carried out using the method of restricted maximum likelihood. In addition, although the variation for each individual peak velocity was determined initially, we later elected to combine the data for all the peak velocities to increase the number of repeated observations for the other variables. Because there were no interactions with peak velocity, the "gold standard" mean could be subtracted from each observer's mean for a particular peak velocity. This allowed us to evaluate the other variables independently of the peak velocity.
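The normalization against the "gold standard" described above can be illustrated with a short sketch. The readings below are invented for illustration and are not data from the study, and the function names are hypothetical. For simplicity the sketch subtracts the gold standard mean from each individual reading, whereas the analysis in the text worked with observer means. The two summary statistics follow the definitions given in the footnotes to Tables 3 and 4 (mean difference and root-mean-square difference).

```python
import math
from statistics import mean

# Hypothetical readings (cm/sec), invented for illustration only.
# Keys are the set peak velocities; values are repeated measurements.
gold_standard = {40.5: [41.0, 42.0, 40.0], 61.0: [63.0, 62.0, 64.0]}
observer = {40.5: [45.0, 39.0], 61.0: [70.0, 66.0]}


def normalize(observer_readings, gold_readings):
    """Subtract the gold standard mean at each peak velocity from every observer reading."""
    differences = []
    for velocity, readings in observer_readings.items():
        gold_mean = mean(gold_readings[velocity])
        differences.extend(reading - gold_mean for reading in readings)
    return differences


def mean_error(differences):
    """Mean of the differences (the form of the error defined under Table 3)."""
    return mean(differences)


def rms_difference(differences):
    """Root-mean-square difference (the form of the "bias" defined under Table 4)."""
    return math.sqrt(mean([d * d for d in differences]))


if __name__ == "__main__":
    diffs = normalize(observer, gold_standard)
    print("normalized differences:", diffs)
    print("mean error:", mean_error(diffs))
    print("RMS difference:", round(rms_difference(diffs), 2))
```

Once the set velocity has been removed in this way, readings taken at different peak velocities can be pooled, which is what allows the remaining factors (observer, unit, transducer, flow type) to be compared on a common scale.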
RESULTS
Variance that could not be attributed to differences in equipment, flow type or observer (that is, the "noise" of the experiment) was found to have a standard deviation of 6.8 cm/sec by the ANOVA. The other major components of variance, obtained from univariate repeated measures, are shown in Table 2. Effects due to equipment far exceeded observer-related variations. A large component of the variance due to the transducer was secondary to the superior performance of three of the observers when using the second 5.0 MHz probe, which was much more sensitive than the first. One observer performed better using the 3.5 MHz transducer. The error was less for the Ultramark 4 ultrasound unit (0.33 vs. 5.8 cm/sec) for all observers except one. Pulsatile flows contributed more to the error than continuous flows, with errors of 5.9 and 2.4 cm/sec respectively, and high peak velocities predominated among the errors that were more than two standard deviations (13.4 cm/sec) above the noise level of the experiment (Table 3).
The performance of the four observers is summarized in Table 4. While the two radiologists with the most experience exhibited lower error rates than either the fellow or the resident, the differences were not statistically significant.
Table 2. Contribution to variability of Doppler velocity measurements by 5 different variables from the components of variance analysis. All data normalized against "gold standard" means for each peak velocity.
Factor                                  Variance component (cm/sec)²   SD† (cm/sec)   Percent‡
Ultrasound unit used                    15.4                           3.9            11.9
Transducer                              44.2                           6.6            34.1
Observer/transducer interaction         15.2                           3.9            11.7
Observer/ultrasound unit interaction    5.3                            2.3            4.1
Flow type                               3.6                            1.9            2.8
Baseline variation§                     46.0                           6.8            35.4
† Square root of variance component.
‡ Percent of the sum of all of the variance components (very small components are not included in this table).
§ Baseline variation is the variation which cannot be attributed to differences in equipment, flow type, or observer.
DISCUSSION
Table 1. Distribution of flow measurements by peak velocity, flow type, observer, ultrasound unit, and transducer.
Peak velocity (cm/sec)   Flow type   Samples
40.5                     C           36
48                       C           49
61                       C           30
75.3                     C           44
44/52                    P           37 × 2 = 74
66/78                    P           35 × 2 = 70
C = continuous flow. P = pulsatile flow.
Observer   Experience   Samples
1          6 years      72
2          2 years      75
3          6 months     80
4          1 month      76
Ultrasound unit    Samples
ATL Ultramark 4    134
ATL Ultramark 8    169
Transducer    Samples
3.5 MHz       167
5.0 MHz #1    59
5.0 MHz #2    77
Table 3. Error rates (means) more than 2 standard deviations above the noise level of the experiment.
Peak velocity (cm/sec)   Flow type   Ultrasound unit   Transducer   Error† (cm/sec)
75.3                     C           Ultramark 4       3.5 MHz      18.6
78                       P           Ultramark 8       3.5 MHz      16.3
66                       P           Ultramark 8       3.5 MHz      15.8
52                       P           Ultramark 8       3.5 MHz      13.6
75.3                     C           Ultramark 4       5.0 MHz      13.5
C = continuous flow. P = pulsatile flow.
† Error consists of the mean of differences between the gold standard and the radiologists' measurements. N, the number of observations, varied from 8 to 12.
Table 4. Overall observer bias.
Observer   Experience   "Bias" (cm/sec)†
1          6 years      5.2
2          2 years      4.9
3          6 months     9.1
4          1 month      6.8
† "Bias" is the square root of the average of the squared difference between the gold standard flow and the observer's flow in the 34 observations where both were available for each instrument, flow type, peak velocity, and probe frequency.
Recently, Doppler sonography has come to assume a prominent role in the diagnosis of pathology affecting many areas, including the carotid arteries, the heart, the renal vessels, the abdominal vasculature and the maternal-fetal circulation (Grant et al. 1989). Although qualitative characterization of flow disturbance (determination of flow direction and detection of turbulence) remains important, quantitative measurement of peak velocity is becoming increasingly significant. Clearly, blood velocity estimates can be useful clinically only if they are accurate and repeatable. That is, given constant flow conditions, repeated measurements must vary by less than some arbitrarily acceptable limit. In a study of portal venous blood velocity measurement using pulsed Doppler ultrasound, Zoli et al. (1986) found interobserver variations of under 10%. However, they did not independently verify constancy of flow, but instead used the measurements themselves to conclude that portal velocity does not change during expiration.
In our study, we eliminated the effect of in-vivo velocity variation by performing all our measurements on a calibrated phantom. Furthermore, we attempted to exclude errors related to the test object itself and to the peak velocity set on the phantom by normalizing the four observers' measurements against a separate set of highly controlled "gold standard" observations.
In our experiment, the largest component of variation (SD = 6.8 cm/sec) could not be attributed to any of the systematically varied factors, including the velocity rate. This suggests that, for the range of peak velocities studied, measurement differences smaller than this value fell within the "baseline noise" of the experiment. The next largest sources of measurement variability were due to the transducer (SD = 6.6 cm/sec, 34.1%) or ultrasound unit (SD = 3.9 cm/sec, 11.9%). Although serial Doppler flow studies should ideally be performed using a single unit-probe combination, this is often impractical in a clinical setting. Although minor, measurement variability was greatest at the higher velocity and pulsatile flow conditions most likely to be encountered in vivo.
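The standard deviations and percentages quoted above follow directly from the variance components in Table 2. The sketch below re-derives them as an arithmetic check. It assumes the percentages are taken over the six listed components only; because Table 2 omits very small components, recomputed percentages may differ from the published ones by about 0.1. The function name is hypothetical.

```python
import math

# Variance components in (cm/sec)^2, transcribed from Table 2.
VARIANCE_COMPONENTS = {
    "Ultrasound unit used": 15.4,
    "Transducer": 44.2,
    "Observer/transducer interaction": 15.2,
    "Observer/ultrasound unit interaction": 5.3,
    "Flow type": 3.6,
    "Baseline variation": 46.0,
}


def summarize(components):
    """Return {factor: (SD, percent)}, with percent taken over the listed components only."""
    total = sum(components.values())
    return {
        factor: (math.sqrt(variance), 100.0 * variance / total)
        for factor, variance in components.items()
    }


if __name__ == "__main__":
    summary = summarize(VARIANCE_COMPONENTS)
    for factor, (sd, percent) in summary.items():
        print(f"{factor:38s} SD = {sd:4.1f} cm/sec  {percent:5.1f}%")
    # Sum of the two observer/equipment interaction terms.
    observer_related = sum(
        percent for factor, (sd, percent) in summary.items() if factor.startswith("Observer/")
    )
    print(f"Observer/equipment interactions: {observer_related:.1f}%")
```

Summing the two interaction rows in this way gives approximately the 15.8% figure reported for observer/equipment interactions.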
Only a small portion of the overall variability (15.8%) was found to be observer-related, and this effect could not be separated from interactions with the equipment. We did not attempt to study systematically the effect of various Doppler parameters (gain, output power, wall filter, Doppler angle and sample volume size and position) on measurement variability. Rather, because we wished to evaluate the role of operator experience, the observers were free to adjust the controls as they saw fit.
In our in-vitro experiment, observer-related variability was small. We recognize that in clinical practice, Doppler measurements are more likely to depend on the operator's experience. This is especially true when dealing with vessels such as the renal arteries, where simply finding the vessel of interest can be time-consuming and arduous. Nevertheless, our results suggest that if a vessel can be successfully located and sampled, Doppler flow measurements are repeatable.
Acknowledgments--We wish to thank Jeffrey Gornbein for performing the statistical analysis.
REFERENCES
Burns, P. N. The physical principles of Doppler and spectral analysis. J. Clin. Ultrasound 15:567-590; 1987.
Gill, R. W. Measurement of blood flow by ultrasound: Accuracy and sources of error. Ultrasound Med. Biol. 11:625-641; 1985.
Grant, E. G.; Tessler, F. N.; Perrella, R. R. Clinical Doppler imaging. AJR 152:707-717; 1989.
McDicken, W. A versatile test object for the calibration of ultrasonic Doppler flow instruments. Ultrasound Med. Biol. 12:245-249; 1986.
Moriyasu, F.; Ban, N.; Nishida, O.; Nakamura, T.; Miyake, T. Clinical application of an ultrasonic duplex system in the quantitative measurement of portal blood flow. J. Clin. Ultrasound 14:579-588; 1986.
Zoli, M.; Marchesini, G.; Cordiani, M. R.; Pisi, P.; Brunori, A. Echo-Doppler measurement of splanchnic blood flow in control and cirrhotic subjects. J. Clin. Ultrasound 14:429-435; 1986.