Interpretive reproducibility of stress Tc-99m sestamibi tomographic myocardial perfusion imaging

Robert J. Golub, MD, Alan W. Ahlberg, MA, Joseph R. McClellan, MD, FACC, Steven D. Herman, MD, FACC, Mark I. Travin, MD, FACC, Jeffrey F. Mather, MS, Percy W. Aitken, MD, FACC, John I. Baron, MD, FACC, and Gary V. Heller, MD, PhD, FACC

Background: Observer variability has been shown with interpretation of planar thallium-201 images. The interpretive reproducibility of technetium-99m sestamibi tomographic imaging is unknown. This study evaluated the interpretive reproducibility of interpretable Tc-99m sestamibi tomographic images among nuclear cardiologists with a wide range of training and experience.

Methods: Three experienced readers (EX) and 3 less-experienced readers (LEX) interpreted 138 exercise and rest Tc-99m sestamibi tomographic images (101 were abnormal in patients with coronary artery disease [CAD], 37 were normal in patients with <5% likelihood of CAD) twice in random sequence without clinical data. Images of good to excellent quality were randomly selected from a database at 2 nuclear cardiology laboratories. Intraobserver and interobserver agreement for global, left anterior descending (LAD) territory, non-LAD first (normal/abnormal) and second (normal/fixed/reversible) order, and defect extent (normal/single-vessel CAD/multivessel CAD) were assessed with percent agreement and Cohen's kappa (κ) statistic.

Results: With regard to intraobserver agreement, first and second order ranged from 87% to 94% and 80% to 90% for global, 82% to 96% and 78% to 95% for LAD, and 88% to 91% and 80% to 90% for non-LAD, respectively. Defect extent ranged from 75% to 90%. There were no differences between EX and LEX for global and non-LAD first and second order, LAD first order, and defect extent. LAD second order was 93% for EX compared with 88% (P = .015) for LEX. With regard to interobserver agreement, first and second order ranged from 73% to 89% and 64% to 85% for global, 73% to 93% and 69% to 91% for LAD, and 76% to 88% and 68% to 84% for non-LAD, respectively. Defect extent ranged from 61% to 82%. Global first and second order ranged from 85% to 87% and 78% to 82% for EX compared with 73% to 84% and 64% to 79% for LEX. LAD first and second order ranged from 89% to 91% and 88% to 89% for EX compared with 73% to 91% and 69% to 70% for LEX. Non-LAD first and second order ranged from 82% to 86% and 76% to 77% for EX compared with 76% to 86% and 68% to 81% for LEX. Defect extent ranged from 69% to 75% for EX compared with 59% to 77% for LEX.

Conclusions: There is moderate to excellent interpretive reproducibility with stress Tc-99m sestamibi SPECT imaging among nuclear cardiologists with a wide range of training and experience. (J Nucl Cardiol 1999;6:257-69.)

Key Words: tomographic imaging • interpretive reproducibility • intraobserver agreement • interobserver agreement

The utility and cost-effectiveness of an imaging procedure depend on accurate and reproducible interpretation. Stress myocardial perfusion imaging has a high overall accuracy in the diagnosis of coronary artery disease (CAD), particularly when images are interpreted by a consensus of experienced nuclear cardiologists. 1-5 Evaluation of interpretive variability by use of myocardial perfusion imaging has been primarily conducted in the

From the Division of Cardiology, Roger Williams Medical Center, Brown University School of Medicine, Providence; the Division of Cardiology, Memorial Hospital of Rhode Island, Pawtucket, RI; the Division of Cardiology, Hospital of the University of Pennsylvania, Philadelphia, Pa; the Nuclear Cardiology Laboratory, Division of Cardiology, Hartford Hospital, Hartford; and the Departments of Medicine and Nuclear Medicine, University of Connecticut School of Medicine, Farmington, Conn.

Presented in part at the 44th annual scientific sessions of the American College of Cardiology, New Orleans, La, November 10-14, 1995.

Received for publication Oct 27, 1997; revision accepted June 17, 1998.

Reprint requests: Gary V. Heller, MD, PhD, FACC, Nuclear Cardiology Laboratory, Division of Cardiology, Hartford Hospital, 80 Seymour St, PO Box 5037, Hartford, CT 06102-5037.

Copyright © 1999 by the American Society of Nuclear Cardiology. 1071-3581/99/$8.00 + 0 43/1/93808

Golub et al Myocardial perfusion imaging with Tc-99m sestamibi

Journal of Nuclear Cardiology May/June 1999

Table 1, A. Individual and combined group intraobserver agreement for global first-order interpretation of exercise and rest Tc-99m sestamibi SPECT myocardial perfusion images

| Reader | First read | Second read: NI | Second read: Abl | % | κ |
|---|---|---|---|---|---|
| Individual | | | | | |
| EX 1 | NI | 45 | 8 | 88 | 0.755 |
| | Abl | 8 | 77 | | |
| EX 2 | NI | 43 | 6 | 93 | 0.84 |
| | Abl | 4 | 85 | | |
| EX 3 | NI | 46 | 6 | 94 | 0.875 |
| | Abl | 2 | 84 | | |
| LEX 4 | NI | 30 | 10 | 87 | 0.678 |
| | Abl | 8 | 90 | | |
| LEX 5 | NI | 48 | 11 | 91 | 0.804 |
| | Abl | 2 | 77 | | |
| LEX 6 | NI | 59 | 7 | 91 | 0.826 |
| | Abl | 5 | 67 | | |
| Combined group | | | | | |
| EX | NI | 134 | 20 | 92 | 0.823 |
| | Abl | 14 | 246 | | |
| LEX | NI | 137 | 28 | 90 | 0.780 |
| | Abl | 15 | 234 | | |
| All readers | NI | 271 | 48 | 91 | 0.801 |
| | Abl | 29 | 480 | | |

First order, Normal versus abnormal image; NI, normal image; Abl, abnormal image; %, percent agreement; κ, kappa statistic; EX, experienced reader(s); LEX, less-experienced reader(s).

earlier years when planar imaging with thallium-201 was the method of choice. These studies demonstrated considerable variability in the interpretation of these images. 6-9 However, since these studies were performed, most laboratories have adopted single photon emission computed tomographic (SPECT) imaging rather than planar imaging, and technetium-99m radiopharmaceuticals have gained popularity over Tl-201. The use of SPECT may in fact increase the variability of interpretation, whereas technetium-based agents may be expected to reduce it. In spite of widespread use of both SPECT imaging and Tc-99m agents, the interpretive reproducibility of these new developments has not been assessed. Recently, various organizations have adopted minimal guidelines for training in nuclear cardiology, many of which use the newer methods for myocardial perfusion imaging, giving added importance to this topic. 10 Therefore the purpose of this study was to evaluate the interpretive reproducibility of stress and rest Tc-99m sestamibi SPECT among physicians with a wide range of nuclear cardiology training and experience. To assess the skills of the readers, rather than laboratory expertise in image acquisition, images chosen for this study were of interpretable quality from a range of patient conditions, including normal, as well as 1-, 2-, and 3-vessel CAD.

METHODS

Study Design

Six nuclear cardiologists (3 highly experienced, 3 less experienced) interpreted 138 exercise and rest Tc-99m sestamibi SPECT images that were presented in random sequence without patient name, clinical information, or exercise data. Each image was interpreted twice in random order for normal, fixed, or reversible defects by vascular territory. Agreement for individual and combined group image interpretations was calculated and compared between individuals and groups (experienced versus less experienced).
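The randomized, repeated presentation described above can be illustrated with a minimal sketch. This is not the authors' software; the function name, the seeding scheme, and the choice to shuffle both reads of every image into a single sequence are assumptions made only for illustration.

```python
import random
from collections import Counter


def reading_order(n_images, reader_id, reads_per_image=2, base_seed=1999):
    """Return a reader-specific random sequence of image indices.

    Every image index appears exactly `reads_per_image` times.
    """
    # Hypothetical seeding scheme: one independent generator per reader,
    # so that each reader sees a different sequence.
    rng = random.Random(base_seed * 100 + reader_id)
    order = [image for image in range(n_images) for _ in range(reads_per_image)]
    rng.shuffle(order)
    return order


# Six readers, 138 images, each image read twice (numbers from the text).
orders = {reader: reading_order(138, reader) for reader in range(1, 7)}
counts = Counter(orders[1])
```

Each reader receives 276 presentations, and pairing the first and second presentation of the same image index yields the data needed for an intraobserver comparison.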

Participating Cardiologists

Cardiologists who participated in this study had a wide range of training and experience. All physicians completed a 200-hour didactic course in radiation physics and had at least 6 months of training in nuclear cardiology and fulfilled Society of Nuclear Medicine/American

Journal of Nuclear Cardiology Volume 6, Number 3;257-69

Table 1, B. Individual and combined group intraobserver agreement for global second-order interpretation of exercise and rest Tc-99m sestamibi SPECT myocardial perfusion images

| Reader | First read | Second read: NI | Second read: Fix | Second read: Rev | % | κ |
|---|---|---|---|---|---|---|
| Individual | | | | | | |
| EX 1 | NI | 45 | 1 | 8 | 82 | 0.680 |
| | Fix | 1 | 7 | 5 | | |
| | Rev | 7 | 3 | 61 | | |
| EX 2 | NI | 45 | 1 | 4 | 89 | 0.815 |
| | Fix | 2 | 12 | 5 | | |
| | Rev | 1 | 2 | 66 | | |
| EX 3 | NI | 46 | 2 | 4 | 90 | 0.819 |
| | Fix | 1 | 8 | 4 | | |
| | Rev | 1 | 2 | 70 | | |
| LEX 4 | NI | 23 | 3 | 6 | 80 | 0.649 |
| | Fix | 2 | 19 | 8 | | |
| | Rev | 5 | 4 | 68 | | |
| LEX 5 | NI | 49 | 1 | 10 | 86 | 0.734 |
| | Fix | 0 | 4 | 3 | | |
| | Rev | 2 | 4 | 65 | | |
| LEX 6 | NI | 59 | 2 | 4 | 89 | 0.808 |
| | Fix | 0 | 7 | 2 | | |
| | Rev | 4 | 3 | 57 | | |
| Combined group | | | | | | |
| EX | NI | 136 | 4 | 16 | 87 | 0.772 |
| | Fix | 4 | 27 | 14 | | |
| | Rev | 9 | 7 | 197 | | |
| LEX | NI | 131 | 6 | 20 | 85 | 0.738 |
| | Fix | 2 | 30 | 13 | | |
| | Rev | 11 | 11 | 190 | | |
| All readers | NI | 267 | 10 | 36 | 86 | 0.755 |
| | Fix | 6 | 57 | 27 | | |
| | Rev | 20 | 18 | 387 | | |

Second order, No defect versus fixed defect versus reversible defect; NI, no defect; Fix, fixed defect; Rev, reversible defect; %, percent agreement; κ, kappa statistic; EX, experienced reader(s); LEX, less-experienced reader(s).

College of Cardiology/American Society of Nuclear Cardiology criteria for specialized training. 10 Three cardiologists had an additional year of specialized training in nuclear cardiology beyond fellowship and 2, 6, and 12 years of practical experience performing nuclear cardiology procedures. These readers had all interpreted at least 2000 images with a mean of >4000 images per reader and were designated as experienced readers.

Table 2, A. Individual and combined group intraobserver agreement for LAD first-order interpretation of exercise and rest Tc-99m sestamibi SPECT myocardial perfusion images

| Reader | First read | Second read: NI | Second read: Abl | % | κ |
|---|---|---|---|---|---|
| Individual | | | | | |
| EX 1 | NI | 90 | 6 | 91 | 0.776 |
| | Abl | 7 | 35 | | |
| EX 2 | NI | 95 | 3 | 96 | 0.894 |
| | Abl | 3 | 37 | | |
| EX 3 | NI | 85 | 4 | 96 | 0.906 |
| | Abl | 2 | 47 | | |
| LEX 4 | NI | 60 | 15 | 82 | 0.637 |
| | Abl | 10 | 53 | | |
| LEX 5 | NI | 93 | 4 | 93 | 0.824 |
| | Abl | 6 | 35 | | |
| LEX 6 | NI | 96 | 1 | 96 | 0.911 |
| | Abl | 4 | 37 | | |
| Combined group | | | | | |
| EX | NI | 270 | 13 | 94 | 0.861 |
| | Abl | 12 | 119 | | |
| LEX | NI | 249 | 20 | 90 | 0.788 |
| | Abl | 20 | 125 | | |
| All readers | NI | 519 | 33 | 92 | 0.824 |
| | Abl | 32 | 244 | | |

First order, Normal versus abnormal image; NI, normal image; Abl, abnormal image; %, percent agreement; κ, kappa statistic; EX, experienced reader(s); LEX, less-experienced reader(s).

Case Selection

One hundred one patients with abnormal stress images and evidence of CAD by subsequent cardiac catheterization were randomly selected from a database of patients referred for exercise stress testing with Tc-99m sestamibi SPECT imaging at Roger Williams Medical Center or at the Memorial Hospital of Rhode Island. Twenty-eight patients had small, 44 had medium, and 30 had large exercise-induced perfusion defects. Seventy-three patients had defects suggestive of single-vessel CAD, and 28 had defects suggestive of multivessel CAD. Also included in this study were 37 patients with normal images and <5% likelihood of CAD. One hundred sixteen (84%) patients were men, and 22 (16%) were women.

Exercise Protocol

All patients performed treadmill exercise adhering to the standard Bruce protocol. Exercise termination end points included fatigue, progressive angina, systolic hypotension, diastolic hypertension, ST-segment depression of 0.4 mV, sustained supraventricular arrhythmias, or complex ventricular arrhythmias. 11 Tc-99m sestamibi was injected at least 1 minute before termination of exercise.

SPECT Acquisition

Stress and rest images were acquired on separate days. SPECT acquisition began 60 minutes after injection of Tc-99m sestamibi (0.2 mCi/kg) with an ADAC ARC 4000 rotating-head gamma camera. Sixty-four projections (20 seconds for stress and 25 seconds for rest) were obtained over a 180-degree semicircular arc extending from the 45-degree right anterior oblique to the 45-degree left posterior oblique position. All projection images were stored on a 64 x 64 x 16 byte matrix. Filtered back-projection was performed by use of a low-resolution Butterworth filter with a frequency cutoff of 0.5 cycles/pixel and an order of 10 for reconstruction of the transaxial slices to a thickness of 6.6 mm. No preprocessing filtration or attenuation correction was used. All images were processed by the same 2 experienced technologists.

Table 2, B. Individual and combined group intraobserver agreement for LAD second-order interpretation of exercise and rest Tc-99m sestamibi SPECT myocardial perfusion images

| Reader | First read | Second read: NI | Second read: Fix | Second read: Rev | % | κ |
|---|---|---|---|---|---|---|
| Individual | | | | | | |
| EX 1 | NI | 90 | 1 | 5 | 90 | 0.771 |
| | Fix | 0 | 5 | 1 | | |
| | Rev | 7 | 0 | 29 | | |
| EX 2 | NI | 95 | 0 | 3 | 95 | 0.883 |
| | Fix | 0 | 6 | 1 | | |
| | Rev | 3 | 0 | 30 | | |
| EX 3 | NI | 85 | 2 | 2 | 95 | 0.899 |
| | Fix | 1 | 8 | 0 | | |
| | Rev | 1 | 1 | 38 | | |
| LEX 4 | NI | 57 | 5 | 10 | 78 | 0.634 |
| | Fix | 5 | 10 | 3 | | |
| | Rev | 5 | 2 | 41 | | |
| LEX 5 | NI | 92 | 0 | 5 | 91 | 0.783 |
| | Fix | 2 | 3 | 1 | | |
| | Rev | 4 | 1 | 30 | | |
| LEX 6 | NI | 96 | 0 | 1 | 94 | 0.866 |
| | Fix | 0 | 4 | 2 | | |
| | Rev | 3 | 2 | 30 | | |
| Combined group | | | | | | |
| EX | NI | 270 | 3 | 10 | 93 | 0.854 |
| | Fix | 1 | 19 | 2 | | |
| | Rev | 11 | 1 | 97 | | |
| LEX | NI | 245 | 5 | 16 | 88* | 0.754* |
| | Fix | 7 | 17 | 6 | | |
| | Rev | 12 | 5 | 101 | | |
| All readers | NI | 515 | 8 | 26 | 90 | 0.802 |
| | Fix | 8 | 36 | 8 | | |
| | Rev | 23 | 6 | 198 | | |

Second order, No defect versus fixed defect versus reversible defect; NI, no defect; Fix, fixed defect; Rev, reversible defect; %, percent agreement; κ, kappa statistic; EX, experienced reader(s); LEX, less-experienced reader(s). *P < .05, less-experienced compared with experienced readers as a group.

Image Presentation

Each image was presented in random sequence in the same manner to all readers. No information regarding patient identity (age, gender, weight), clinical history, or exercise data was available. The cine display of planar images acquired for SPECT reconstruction was available for each image. All stress and rest short-axis, horizontal long-axis, and vertical long-axis slices were displayed

Table 3, A. Individual and combined group intraobserver agreement for non-LAD first-order interpretation of exercise and rest Tc-99m sestamibi SPECT myocardial perfusion images

| Reader | First read | Second read: NI | Second read: Abl | % | κ |
|---|---|---|---|---|---|
| Individual | | | | | |
| EX 1 | NI | 65 | 10 | 88 | 0.768 |
| | Abl | 6 | 57 | | |
| EX 2 | NI | 65 | 8 | 91 | 0.826 |
| | Abl | 4 | 61 | | |
| EX 3 | NI | 58 | 7 | 93 | 0.869 |
| | Abl | 2 | 71 | | |
| LEX 4 | NI | 50 | 10 | 89 | 0.777 |
| | Abl | 5 | 73 | | |
| LEX 5 | NI | 66 | 10 | 90 | 0.797 |
| | Abl | 4 | 58 | | |
| LEX 6 | NI | 70 | 8 | 89 | 0.779 |
| | Abl | 7 | 53 | | |
| Combined group | | | | | |
| EX | NI | 188 | 25 | 91 | 0.821 |
| | Abl | 12 | 189 | | |
| LEX | NI | 186 | 28 | 89 | 0.788 |
| | Abl | 16 | 184 | | |
| All readers | NI | 374 | 53 | 90 | 0.805 |
| | Abl | 28 | 373 | | |

First order, Normal versus abnormal image; NI, normal image; Abl, abnormal image; %, percent agreement; κ, kappa statistic; EX, experienced reader(s); LEX, less-experienced reader(s).

simultaneously by use of a commercially available computer system (ADAC, Sun Microsystems). Qualitative polar ("bull's-eye") representations from stress and rest data were available. Images could be viewed by use of gray scale or a variety of color tables and intensities preferred by the individual reader.

Image Interpretation

Image interpretation was performed in separate, time-limited sessions to avoid fatigue. Each image was interpreted independently by each reader in a random sequence different from that of other readers. Images were interpreted as globally normal or abnormal. If abnormal, each defect was assigned a coronary vascular territory (maximum of 3 vascular territories) and assessed as either fixed or reversible.
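As a hedged illustration of how a territory-level interpretation could be mapped to the categories used in the Data Analysis section, the sketch below derives global, LAD, and non-LAD first-order calls and a defect-extent class from one reading. The territory labels, the dictionary encoding, and the rule that defect extent is the count of abnormal territories are assumptions for illustration; the paper does not spell out these rules.

```python
# Hypothetical labels for the 3 coronary vascular territories.
TERRITORIES = ("LAD", "RCA", "LCX")


def summarize(reading):
    """Summarize one interpretation.

    `reading` maps each territory to 'normal', 'fixed', or 'reversible';
    territories that are not listed are treated as normal.
    """
    abnormal = [t for t in TERRITORIES if reading.get(t, "normal") != "normal"]
    # Right and circumflex territories are pooled as "non-LAD".
    non_lad = [t for t in abnormal if t != "LAD"]
    if not abnormal:
        extent = "normal"
    elif len(abnormal) == 1:
        extent = "single-vessel CAD"
    else:
        # Assumption: 2 or more abnormal territories count as multivessel CAD.
        extent = "multivessel CAD"
    return {
        "global_first_order": "abnormal" if abnormal else "normal",
        "lad_first_order": "abnormal" if "LAD" in abnormal else "normal",
        "non_lad_first_order": "abnormal" if non_lad else "normal",
        "defect_extent": extent,
    }


example = summarize({"LAD": "normal", "RCA": "reversible", "LCX": "fixed"})
```

Pooling the right and circumflex territories in this way means that a disagreement between readers about which of the two is involved does not count as a non-LAD disagreement.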

Data Analysis

Images were classified as normal or abnormal for global and regional interpretation. Abnormalities assigned to the right and circumflex territories were combined and classified as "non-left anterior descending" (non-LAD) territory for comparison with the left anterior descending (LAD) territory. Qualitative first-order agreement for global and regional comparisons was determined by use of 2 x 2 tables with normal versus abnormal as outcome categories. Abnormal images were further classified as fixed or reversible.

Table 3, B. Individual and combined group intraobserver agreement for non-LAD second-order interpretation of exercise and rest Tc-99m sestamibi SPECT myocardial perfusion images

| Reader | First read | Second read: NI | Second read: Fix | Second read: Rev | % | κ |
|---|---|---|---|---|---|---|
| Individual | | | | | | |
| EX 1 | NI | 65 | 2 | 8 | 83 | 0.683 |
| | Fix | 1 | 3 | 3 | | |
| | Rev | 5 | 5 | 46 | | |
| EX 2 | NI | 65 | 1 | 7 | 86 | 0.760 |
| | Fix | 2 | 8 | 4 | | |
| | Rev | 2 | 3 | 46 | | |
| EX 3 | NI | 58 | 2 | 5 | 90 | 0.823 |
| | Fix | 0 | 8 | 3 | | |
| | Rev | 2 | 2 | 58 | | |
| LEX 4 | NI | 44 | 7 | 5 | 80 | 0.676 |
| | Fix | 0 | 14 | 9 | | |
| | Rev | 4 | 3 | 52 | | |
| LEX 5 | NI | 66 | 1 | 9 | 86 | 0.747 |
| | Fix | 1 | 4 | 1 | | |
| | Rev | 3 | 4 | 49 | | |
| LEX 6 | NI | 70 | 3 | 5 | 88 | 0.773 |
| | Fix | 2 | 6 | 1 | | |
| | Rev | 5 | 1 | 45 | | |
| Combined group | | | | | | |
| EX | NI | 188 | 5 | 20 | 86 | 0.757 |
| | Fix | 3 | 19 | 10 | | |
| | Rev | 9 | 10 | 150 | | |
| LEX | NI | 180 | 11 | 19 | 85 | 0.734 |
| | Fix | 3 | 24 | 11 | | |
| | Rev | 12 | 8 | 146 | | |
| All readers | NI | 368 | 16 | 39 | 85 | 0.746 |
| | Fix | 6 | 43 | 21 | | |
| | Rev | 21 | 18 | 296 | | |

Second order, No defect versus fixed defect versus reversible defect; NI, no defect; Fix, fixed defect; Rev, reversible defect; %, percent agreement; κ, kappa statistic; EX, experienced reader(s); LEX, less-experienced reader(s).

Qualitative second-order agreement for global and regional comparisons was determined by use of 3 x 3 tables with no defect, fixed defect, or reversible defect as outcome variables. Each image was also classified as normal, single-vessel CAD, or multivessel CAD for global interpretation. Qualitative extent of CAD analyses were performed by use of 3 x 3 tables with normal, single-vessel CAD, and multivessel CAD as variables. Reproducibility was assessed by use of percent agreement and Cohen's kappa

Table 4. Individual and combined group intraobserver agreement for defect extent in the interpretation of exercise and rest Tc-99m sestamibi SPECT myocardial perfusion images

| Reader | First read | Second read: NL | Second read: SVD | Second read: MVD | % | κ |
|---|---|---|---|---|---|---|
| Individual | | | | | | |
| EX 1 | NL | 45 | 6 | 3 | 77 | 0.636 |
| | SVD | 8 | 43 | 7 | | |
| | MVD | 0 | 8 | 18 | | |
| EX 2 | NL | 45 | 5 | 0 | 90 | 0.833 |
| | SVD | 3 | 61 | 4 | | |
| | MVD | 0 | 2 | 18 | | |
| EX 3 | NL | 46 | 5 | 1 | 88 | 0.815 |
| | SVD | 2 | 34 | 6 | | |
| | MVD | 0 | 3 | 41 | | |
| LEX 4 | NL | 32 | 8 | 2 | 75 | 0.618 |
| | SVD | 6 | 29 | 13 | | |
| | MVD | 2 | 4 | 42 | | |
| LEX 5 | NL | 52 | 8 | 1 | 83 | 0.732 |
| | SVD | 1 | 40 | 4 | | |
| | MVD | 0 | 10 | 22 | | |
| LEX 6 | NL | 59 | 6 | 1 | 86 | 0.784 |
| | SVD | 4 | 33 | 1 | | |
| | MVD | 1 | 6 | 27 | | |
| Combined group | | | | | | |
| EX | NL | 136 | 16 | 4 | 85 | 0.766 |
| | SVD | 13 | 138 | 17 | | |
| | MVD | 0 | 13 | 77 | | |
| LEX | NL | 143 | 22 | 4 | 81 | 0.715 |
| | SVD | 11 | 102 | 18 | | |
| | MVD | 3 | 20 | 91 | | |
| All readers | NL | 279 | 38 | 8 | 83 | 0.741 |
| | SVD | 24 | 240 | 35 | | |
| | MVD | 3 | 33 | 168 | | |

EX, Experienced reader(s); LEX, less-experienced reader(s); %, percent agreement; κ, kappa statistic; NL, normal perfusion; SVD, single-vessel CAD; MVD, multivessel CAD.

(κ) statistic. 12 For each reader, intraobserver reproducibility was measured by comparing the first and second interpretation of images from each patient. 13 Individual results were combined by group, and results were compared between highly experienced and less-experienced readers. Kappa values were classified as follows: <0.20 = poor agreement, 0.21 to 0.40 = fair agreement, 0.41 to 0.60 = moderate agreement, 0.61 to 0.80 = good agreement, and 0.81 to 1.00 = excellent agreement. Clinically important agreement was defined a priori as a kappa value >0.50. 14 Comparisons between kappa values were made by use of the chi-square statistic as described by Fleiss. 14 The criterion for statistical significance was predetermined at P < .05.
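The two agreement measures can be sketched as follows. Percent agreement is the share of images on the diagonal of the first-read by second-read table, and Cohen's kappa corrects that share for the agreement expected by chance from the row and column totals. The function names are illustrative, not taken from the paper, and the handling of values that fall exactly on a band boundary (for example 0.20) is an assumption because the published bands leave small gaps. The example uses the EX 1 global first-order counts from Table 1, A (45, 8, 8, 77).

```python
def percent_agreement_and_kappa(table):
    """Return (percent agreement, Cohen's kappa) for a square table.

    table[i][j] is the number of images rated category i on the first
    read and category j on the second read.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Chance-expected agreement from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return 100 * observed, kappa


def classify_kappa(kappa):
    """Map kappa to the agreement bands listed in the text.

    Assumption: boundary values are assigned to the lower band.
    """
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "excellent"


# EX 1, global first order (rows: first read NI, Abl; columns: second read).
ex1_global = [[45, 8], [8, 77]]
pct, kappa = percent_agreement_and_kappa(ex1_global)
label = classify_kappa(kappa)
```

For these counts the sketch returns about 88% agreement and a kappa of about 0.755, matching the tabulated values, and the kappa falls in the "good" band.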

Table 5. Interobserver agreement for global first- and second-order interpretation of exercise and rest Tc-99m sestamibi SPECT myocardial perfusion images

| | EX 2, % (κ) | EX 3, % (κ) | LEX 4, % (κ) | LEX 5, % (κ) | LEX 6, % (κ) |
|---|---|---|---|---|---|
| Global, first order | | | | | |
| EX 1 | 86 (0.688) | 85 (0.675) | 79 (0.533) | 86 (0.703) | 86 (0.706) |
| EX 2 | | 87 (0.715) | 78 (0.482) | 89 (0.775) | 82 (0.631) |
| EX 3 | | | 83 (0.605) | 86 (0.700) | 88 (0.750) |
| LEX 4 | | | | 79 (0.529) | 73 (0.450) |
| LEX 5 | | | | | 84 (0.685) |
| Global, second order | | | | | |
| EX 1 | 78 (0.617) | 79 (0.634) | 68 (0.462) | 78 (0.602) | 81 (0.664) |
| EX 2 | | 82 (0.678) | 70 (0.501) | 82 (0.675) | 77 (0.610) |
| EX 3 | | | 72 (0.530) | 79 (0.620) | 85 (0.735) |
| LEX 4 | | | | 68 (0.457) | 64 (0.411) |
| LEX 5 | | | | | 79 (0.632) |

First order, Normal versus abnormal image; second order, no defect versus fixed defect versus reversible defect; EX, experienced reader; LEX, less-experienced reader; %, percent agreement; κ, kappa statistic.

Table 6. Interobserver agreement for LAD first- and second-order interpretation of exercise and rest Tc-99m sestamibi SPECT myocardial perfusion images

| | EX 2, % (κ) | EX 3, % (κ) | LEX 4, % (κ) | LEX 5, % (κ) | LEX 6, % (κ) |
|---|---|---|---|---|---|
| LAD, first order | | | | | |
| EX 1 | 91 (0.782) | 90 (0.780) | 77 (0.527) | 91 (0.782) | 91 (0.773) |
| EX 2 | | 89 (0.754) | 73 (0.445) | 93 (0.842) | 89 (0.744) |
| EX 3 | | | 76 (0.522) | 88 (0.738) | 90 (0.778) |
| LEX 4 | | | | 73 (0.445) | 74 (0.467) |
| LEX 5 | | | | | 91 (0.779) |
| LAD, second order | | | | | |
| EX 1 | 89 (0.761) | 89 (0.764) | 72 (0.490) | 87 (0.711) | 89 (0.744) |
| EX 2 | | 88 (0.755) | 70 (0.457) | 91 (0.791) | 88 (0.724) |
| EX 3 | | | 73 (0.517) | 86 (0.693) | 89 (0.778) |
| LEX 4 | | | | 69 (0.434) | 70 (0.449) |
| LEX 5 | | | | | 88 (0.716) |

First order, Normal versus abnormal image; second order, no defect versus fixed defect versus reversible defect; EX, experienced reader; LEX, less-experienced reader; %, percent agreement; κ, kappa statistic.

RESULTS

Intraobserver Agreement

Good to excellent intraobserver agreement was found among all readers for global first (87% to 94%, κ = 0.68 to 0.88) and second (80% to 90%, κ = 0.65 to 0.82) order, LAD first (82% to 96%, κ = 0.64 to 0.91) and second (78% to 95%, κ = 0.63 to 0.90) order, non-LAD first (88% to 93%, κ = 0.77 to 0.87) and second (80% to

Table 7. Interobserver agreement for non-LAD first- and second-order interpretation of exercise and rest Tc-99m sestamibi SPECT myocardial perfusion images

| | EX 2, % (κ) | EX 3, % (κ) | LEX 4, % (κ) | LEX 5, % (κ) | LEX 6, % (κ) |
|---|---|---|---|---|---|
| Non-LAD, first order | | | | | |
| EX 1 | 86 (0.710) | 81 (0.618) | 76 (0.533) | 85 (0.695) | 84 (0.672) |
| EX 2 | | 82 (0.646) | 76 (0.517) | 88 (0.753) | 84 (0.687) |
| EX 3 | | | 82 (0.632) | 83 (0.661) | 87 (0.742) |
| LEX 4 | | | | 76 (0.533) | 76 (0.531) |
| LEX 5 | | | | | 86 (0.715) |
| Non-LAD, second order | | | | | |
| EX 1 | 77 (0.594) | 76 (0.577) | 68 (0.473) | 79 (0.622) | 79 (0.623) |
| EX 2 | | 77 (0.599) | 70 (0.503) | 81 (0.663) | 79 (0.631) |
| EX 3 | | | 72 (0.545) | 78 (0.601) | 84 (0.720) |
| LEX 4 | | | | 68 (0.477) | 68 (0.471) |
| LEX 5 | | | | | 81 (0.648) |

First order, Normal versus abnormal image; second order, no defect versus fixed defect versus reversible defect; EX, experienced reader; LEX, less-experienced reader; %, percent agreement; κ, kappa statistic.

Table 8. Interobserver agreement for defect extent in the interpretation of exercise and rest Tc-99m sestamibi SPECT myocardial perfusion images: normal, single-vessel CAD, multivessel CAD

| | EX 2, % (κ) | EX 3, % (κ) | LEX 4, % (κ) | LEX 5, % (κ) | LEX 6, % (κ) |
|---|---|---|---|---|---|
| EX 1 | 75 (0.596) | 69 (0.539) | 58 (0.373) | 76 (0.628) | 72 (0.569) |
| EX 2 | | 72 (0.579) | 59 (0.391) | 82 (0.710) | 74 (0.604) |
| EX 3 | | | 68 (0.522) | 72 (0.586) | 78 (0.666) |
| LEX 4 | | | | 59 (0.394) | 61 (0.427) |
| LEX 5 | | | | | 77 (0.646) |

EX, Experienced reader; LEX, less-experienced reader; %, percent agreement; κ, kappa statistic.

90%, κ = 0.68 to 0.82) order, and defect extent (75% to 90%, κ = 0.62 to 0.83) interpretations (Tables 1 to 4).

Interobserver Agreement

As expected, interobserver agreement was lower than intraobserver agreement. Moderate to good interobserver agreement was found among all readers for global first (73% to 89%, κ = 0.45 to 0.78) and second (64% to 85%, κ = 0.41 to 0.74) order, LAD first (73% to 93%, κ = 0.45 to 0.84) and second (69% to 91%, κ = 0.43 to 0.79) order, non-LAD first (76% to 88%, κ = 0.52 to 0.75) and second (68% to 84%, κ = 0.47 to 0.72) order, and defect extent (58% to 82%, κ = 0.37 to 0.71) interpretations (Tables 5 to 8).

Experienced Versus Less-experienced

Combined intraobserver agreement for both global and non-LAD first- and second-order, LAD first-order, as well as defect extent interpretations was not significantly different between experienced and less-experienced readers (Tables 1, 3, and 4). Combined agreement for LAD second-order interpretation was 93% (κ = 0.85) for experienced compared with 88% (κ = 0.75) for less-experienced readers (P = .015) (Table 2).

Interobserver agreement for global first- and second-order interpretations ranged from 85% to 87% (κ = 0.68 to 0.72) and 78% to 82% (κ = 0.62 to 0.68) with experienced compared with 73% to 84% (κ = 0.45 to 0.69) and 64% to 79% (κ = 0.41 to 0.63) with less-experienced readers, respectively. Interobserver agreement for LAD first- and second-order interpretations ranged from 89% to 91% (κ = 0.75 to 0.78) and 88% to 89% (κ = 0.75 to 0.76) with experienced compared with 73% to 91% (κ = 0.45 to 0.78) and 69% to 88% (κ = 0.43 to 0.72) with less-experienced readers, respectively. Interobserver agreement for non-LAD first- and second-order interpretations ranged from 81% to 86% (κ = 0.62 to 0.71) and 76% to 77% (κ = 0.58 to 0.60) with experienced compared with 76% to 86% (κ = 0.53 to 0.72) and 68% to 84% (κ = 0.47 to 0.72) with less-experienced readers, respectively. Interobserver agreement for defect extent interpretations ranged from 69% to 75% (κ = 0.54 to 0.60) with experienced compared with 59% to 77% (κ = 0.39 to 0.65) with less-experienced readers.
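The group comparison of kappa values can be approximated with a short sketch. The paper cites Fleiss for its chi-square comparison but does not give the variance formula it used, so the simple large-sample standard error below is an assumption and the result should not be expected to reproduce the published P value exactly. The counts are the combined EX and LEX tables for LAD second-order interpretation (Table 2, B), with rows as first read and columns as second read.

```python
import math


def kappa_with_se(table):
    """Return (kappa, approximate standard error) for a square table.

    Assumption: uses the simple large-sample approximation
    SE = sqrt(po * (1 - po) / (n * (1 - pe) ** 2)), not necessarily
    the variance estimator applied in the paper.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    po = sum(table[i][i] for i in range(k)) / n
    rows = [sum(table[i]) for i in range(k)]
    cols = [sum(table[i][j] for i in range(k)) for j in range(k)]
    pe = sum(rows[i] * cols[i] for i in range(k)) / n ** 2
    kappa = (po - pe) / (1 - pe)
    se = math.sqrt(po * (1 - po) / (n * (1 - pe) ** 2))
    return kappa, se


def compare_kappas(table_a, table_b):
    """Chi-square test (1 degree of freedom) for equality of two kappas."""
    ka, sa = kappa_with_se(table_a)
    kb, sb = kappa_with_se(table_b)
    chi2 = (ka - kb) ** 2 / (sa ** 2 + sb ** 2)
    # Upper-tail probability of a chi-square variate with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Combined LAD second-order tables (NI, Fix, Rev) from Table 2, B.
ex_table = [[270, 3, 10], [1, 19, 2], [11, 1, 97]]
lex_table = [[245, 5, 16], [7, 17, 6], [12, 5, 101]]
chi2, p_value = compare_kappas(ex_table, lex_table)
```

Under these assumptions the difference between the two groups falls below the P < .05 threshold, in line with the significant difference reported for LAD second-order interpretation.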

DISCUSSION

Results from this study suggest that nuclear cardiologists with a wide range of training and experience have moderate to excellent interpretive reproducibility with interpretable exercise and rest Tc-99m sestamibi SPECT images in patients with either known or <5% likelihood of CAD. It was also observed that less-experienced readers have similar reproducibility for global, non-LAD, and defect extent interpretations but greater variability than experienced readers for LAD second-order interpretation.

Comparison with Previous Studies

Our results suggest that the interpretive reproducibility of exercise Tc-99m sestamibi SPECT imaging is comparable with, and possibly better than, that previously reported with planar thallium-201 imaging. 6-9 For example, Trobaugh et al 8 examined agreement among 4 expert readers at 2 institutions for interpretation of unprocessed Polaroid scintiphotos as globally normal, borderline, or abnormal. Complete agreement was 67%, lower than in this study. Okada et al 9 observed considerable segmental interobserver variability with experienced readers interpreting 50 exercise and delayed images. Although complete observer agreement was observed for overall diagnosis in 58% of images, it was determined that agreement and accuracy improved with a consensus of multiple readers. Atwood et al 15 assessed both intraobserver and interobserver agreement among 4 experienced readers.

Intraobserver agreement ranged from 89% to 93% (κ = 0.73 to 0.86) when images were interpreted as normal or abnormal. However, there was significantly less interobserver and vascular territory agreement. Wackers et al 7 assessed factors affecting interobserver agreement in a large multicenter trial and concluded that standardization of image display and computerized quantification resulted in better overall agreement between centers. In our study, image interpretation was also standardized by use of computerized displays, tomographic acquisition, and a technetium-labeled radiopharmaceutical, resulting in interobserver agreement that was good among experienced and moderate among less-experienced readers. In support of improved interpretive reproducibility of technetium-based radiopharmaceuticals are data from Hendel et al 16 in which 4 experienced readers interpreted 216 exercise planar Tl-201 and Tc-99m tetrofosmin images. Better agreement and image quality were observed with interpretation of Tc-99m tetrofosmin images. Other studies have observed better image quality with Tc-99m sestamibi and Tc-99m furifosmin compared with Tl-201. 3,17

Impact of Experience with Interpretive Reproducibility

Previous studies have examined the interpretive reproducibility of experts with myocardial perfusion imaging. 6-9,15,16 Other specialties in clinical medicine have also used experts to validate procedures including coronary angiography, mammography, and stress echocardiography. 18-22 Although useful in determining accuracy of a procedure, "experts" may not reflect the skill levels of many physicians interpreting images in clinical laboratories. Our study addressed this issue by evaluating the interpretive reproducibility of less-experienced and experienced readers and found good to excellent intraobserver agreement among both groups. This suggests that physicians interpreting Tc-99m SPECT images who meet ACC/ASNC training guidelines can be expected to have good interpretive reproducibility. Such physicians should also have high diagnostic accuracy for CAD with Tc-99m SPECT imaging as demonstrated by a part of this study reported previously. 23 It was also observed that experienced readers had better reproducibility for LAD territory second-order interpretation compared with less-experienced readers, suggesting that clinical experience with image interpretation may lead to improvement in both diagnostic accuracy and intraobserver reproducibility. In support of this is a recent study that demonstrated that echocardiographers trained in standard echocardiography but inexperienced in stress echocardiography required approximately 100 closely supervised studies to improve diagnostic accuracy. 19

Limitations

Patients with abnormal images and angiographic evidence of CAD, as well as those with normal images and a <5% likelihood of CAD, were identified for this study. Furthermore, only good to excellent quality images were selected from both groups of patients for image interpretation because the purpose was to evaluate skill levels with SPECT and a technetium-based radiopharmaceutical, not the ability to interpret poor-quality images. This nonconsecutive selection of optimal quality images in patients with either known or a <5% likelihood of CAD may have resulted in less interpretive variability than that observed in routine clinical practice. In daily practice, one not infrequently encounters images of lesser quality, offering a greater challenge to interpretation than encountered in our study. Inclusion of a wide range of images would have better reflected the interpretive reproducibility in routine clinical practice. Distinguishing the fight from circumflex coronary artery territory may be difficult because of the variable origin of the posterior descending artery, leading to spuriously increased observer variability. To avoid this situation, we combined the right and circumflex coronary artery territories into a "non-LAD" territory. Furthermore, the ability to distinguish LAD from nonLAD territory and to identify single from multivessel CAD are clinically important in the interpretation of myocardial perfusion images. By design, our study did not assess the impact of attenuation artifact, weight, poor image quality, gender, or ECG-gated SPECT imaging on interpretive reproducibility, although each of these conditions are important in the routine clinical interpretation of stress and rest myocardial perfusion imaging. Obese patients may have poor-quality images; particularly those undergoing same day rest and stress imaging protocols. 
Inclusion of such images would more appropriately reflect routine clinical image interpretation and may have increased intraobserver and interobserver variability, particularly with less-experienced readers.8,24 Conversely, most physicians interpret images in a clinical context and use quantification that improves accuracy.24-26

A high percentage of patients in this study were men because fewer women were identified who met the imaging and cardiac catheterization inclusion criteria. Therefore, we cannot be certain that similar results would be obtained with women alone.

All patients underwent exercise stress testing. It is not known whether pharmacologic (particularly dipyridamole or adenosine) stress testing would yield similar SPECT imaging results. The increased coronary blood flow during vasodilator stress differs from that during exercise and could result in differences in interpretive reproducibility for both experienced and less-experienced readers. However, this is unlikely because of the multitude of studies demonstrating similar diagnostic accuracy for exercise and pharmacologic stress testing with myocardial perfusion imaging.

All patients underwent stress and rest imaging on separate days. Many laboratories currently perform a high percentage of same-day rest and stress imaging protocols, and the interpretive reproducibility and diagnostic accuracy of these images should be evaluated. Quantitation was not performed because the purpose of this study was to evaluate interpretive reproducibility.

Conclusion

There is moderate to excellent interpretive reproducibility with stress Tc-99m sestamibi SPECT imaging among nuclear cardiologists with a wide range of training and experience.

We gratefully acknowledge Debra Messinger, CNMT, and Lynn Sillaman, CNMT, for the processing of data and preparation of image displays and Elizabeth Doucette for manuscript preparation.

References

1. Fintel DJ, Links JM, Brinker JA, Frank TL, Parker M, Becker LC. Improved diagnostic performance of exercise thallium-201 single photon emission computed tomography over planar imaging in the diagnosis of coronary artery disease: a receiver operating characteristic analysis. J Am Coll Cardiol 1989;13:600-12.
2. Zaret BL, Beller GA, eds. Nuclear cardiology: state of the art and future directions. St. Louis: Mosby, 1993.
3. Kiat H, Maddahi J, Roy LT, et al. Comparison of technetium-99m methoxy isobutyl isonitrile and thallium-201 for evaluation of coronary heart disease by planar and tomographic methods. Am Heart J 1989;117:1-11.
4. Taillefer R, Lambert R, Dupras G, et al. Clinical comparison between thallium-201 and Tc-99m-methoxy-isobutyl isonitrile (hexamibi) myocardial perfusion imaging for detection of coronary artery disease. Eur J Nucl Med 1989;15:280-6.
5. Kahn JK, McGhie I, Akers MS, et al. Quantitative rotational tomography with Tl-201 and Tc-99m 2-methoxy-isobutyl isonitrile: a direct comparison in normal individuals and patients with coronary artery disease. Circulation 1989;79:1282-93.
6. Sigal SL, Soufer R, Fetterman RC, Mattera JA, Wackers FJ. Reproducibility of quantitative planar thallium-201 scintigraphy: quantitative criteria for reversibility of myocardial perfusion defects. J Nucl Med 1991;32:759-65.
7. Wackers FJ, Bodenheimer M, Fleiss JL, et al. Factors affecting uniformity in interpretation of planar thallium-201 imaging in a multicenter trial. J Am Coll Cardiol 1993;21:1064-74.
8. Trobaugh GB, Wackers FJ, Sokole EB, DeRouen TA, Ritchie JL, Hamilton GW. Thallium-201 myocardial imaging: an interinstitutional study of observer variability. J Nucl Med 1978;19:359-63.
9. Okada RD, Boucher CA, Kirshenbaum HK, et al. Improved diagnostic accuracy of thallium-201 stress test using multiple observers and criteria derived from interobserver analysis of variance. Am J Cardiol 1980;46:619-24.
10. Ritchie JL, Gibbons RJ, Johnson LL, et al. Guidelines for training in adult cardiovascular medicine: core cardiology training symposium (COCATS)--task force 5: training in nuclear cardiology. J Am Coll Cardiol 1995;25:19-23.
11. Ellestad MH. Stress testing: principles and practice. 3rd ed. Philadelphia: FA Davis; 1986:116-7.
12. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37-46.
13. Landis JR, Koch GG. The measure of observer agreement for categorical data. Biometrics 1977;33:159-74.
14. Fleiss JL. Statistical methods for rates and proportions. 2nd ed. New York: John Wiley and Sons; 1981:222-3.
15. Atwood JE, Jensen D, Froelicher V, et al. Agreement in human interpretation of analog thallium myocardial perfusion images. Circulation 1981;64:601-9.
16. Hendel RC, Parker M, Wackers FJ, Rigo P, Lahiri A, Zaret BL. Reduced variability of interpretation and improved image quality with a technetium-99m myocardial perfusion agent: comparison of thallium-201 and technetium-99m-labeled tetrofosmin. J Nucl Cardiol 1994;1:509-14.
17. Hendel RC, Verani MS, Miller DD, et al. Diagnostic utility of tomographic myocardial perfusion imaging with technetium-99m furifosmin (Q12) compared with thallium-201: result of a phase III multicenter trial. J Nucl Cardiol 1996;3:291-300.
18. Elmore JG, Wells CK, Lee CH, Howard DH, Feinstein AR. Variability in radiologists' interpretations of mammograms. N Engl J Med 1994;331:1493-9.
19. Picano E, Lattanzi F, Orlandini A, Marini C, L'Abbate A. Stress echocardiography and the human factor: the importance of being expert. J Am Coll Cardiol 1991;17:666-9.
20. Hoffman R, Lethen H, Marwick T, et al. Analysis of interinstitutional observer agreement in interpretation of dobutamine stress echocardiograms. J Am Coll Cardiol 1996;27:330-6.
21. Marcus ML, Skorton DJ, Johnson MR, Collins SM, Harrison DG, Kerber RE. Visual estimates of percent diameter coronary stenosis: "a battered gold standard?" J Am Coll Cardiol 1988;11:882-5.
22. Detre KM, Wright E, Murphy ML, Takaro T. Observer agreement in evaluating coronary angiograms. Circulation 1975;52:979-86.
23. Golub RJ, McClellan JR, Herman SD, et al. Effectiveness of nuclear cardiology training guidelines: a comparison of trainees with experienced readers. J Nucl Cardiol 1996;3:114-8.
24. Wackers FJ. Science, art, and artifacts: how important is quantification for the practicing physician interpreting myocardial perfusion studies. J Nucl Cardiol 1994;S109-17.
25. Simons M, Parker JA, Udelson JE, Gervino EV. The role of clinical data in interpretation of perfusion images. J Nucl Med 1994;35:740-1.
26. Simons M, Parker JA, Donohoe KJ, Udelson JE, Gervino EV. The impact of clinical data on interpretation of thallium scintigrams. J Nucl Cardiol 1994;1:365-71.