Can computerization of the exercise test replace the cardiologist?


J. Edwin Atwood, MD, Dat Do, BA, Victor Froelicher, MD, Robert Chilton, MD, Charles Dennis, MD, Jeff Froning, MA, Andras Janosi, MD, David Mortara, PhD, and Jonathan Myers, PhD, Palo Alto and Vista, Calif.; San Antonio, Texas; Milwaukee, Wis.; Browns Mills, N.J.; and Budapest, Hungary

Background The type of practitioners who use the standard exercise test is changing. Once a tool of the cardiologist, the standard exercise test is now being performed by internists and other noncardiologists. Because this change could be facilitated by computerization similar to the computerized interpretation programs available for the resting electrocardiograph (ECG), we performed this analysis. A secondary aim was to demonstrate the effects of medication status and resting ECG abnormalities on test diagnostic characteristics because these factors affect utility of the exercise test by the generalist.

Methods and Results A retrospective analysis was performed of consecutive patients referred at 2 university-affiliated Veterans Affairs medical centers and a Hungarian hospital for evaluation of chest pain and possible ischemic heart disease. There were 1384 consecutive male patients without a prior myocardial infarction with complete data who had exercise tests and coronary angiography between 1987 and 1997. Measurements included clinical data, exercise test data, and visual interpretation of the ECG recordings as well as more than 100 computed measurements from the digitized ECG recordings and compilation of angiographic data from clinical reports. The computer measurements had similar diagnostic power compared with visual interpretation. Computerized measurements from maximal exercise or recovery were equivalent or superior to all other measurements. Prediction equations applied by computer were superior to single ECG measurements. β-Blockers had no effect on test characteristics, whereas resting ST depression was associated with decreased specificity and increased sensitivity.

Conclusions Computerized exercise ST measurements are comparable to visual ST measurements by a cardiologist; computerized scores that included clinical and exercise test results exhibited the greatest diagnostic power. Applying scores with a computer allows the practicing physician to improve the diagnostic characteristics of the standard exercise test. This approach is successful even when there is resting ST depression, thus lessening the need for more expensive nuclear or imaging studies. (Am Heart J 1998;136:543-52.)

One of the reasons the standard exercise test has been a favorite tool of physicians is its relative ease of interpretation. Although interpretation has been the realm of the experienced electrocardiographer, most internists and other noncardiologists are now performing the exercise test.1,2 Studies have demonstrated the superior diagnostic ability of modifications to the interpretation of the standard exercise test relying on computerization.3-5 However, a failure to consistently demonstrate the superiority of computerization has impeded its acceptance. This is partially explained by the fact that there has never been a study that compared computerization with standard interpretation in a population of consecutive patients with chest pain.6 Before applying this potential advance to clinical practice, secondary issues requiring resolution include demonstration of whether computerization is superior to visual analysis and whether medications or resting electrocardiograph (ECG) abnormalities alter the diagnostic characteristics of the exercise ECG. If computer analysis is equivalent to visual ST interpretation by the cardiologist, it could supplement and facilitate exercise ECG interpretation similar to the widely used computer programs for the resting ECG.7 If resting ST depression does not affect test characteristics, then such patients could be tested with the standard exercise test rather than being referred for nuclear or imaging studies.8 This study was performed to address these issues and possibly empower the practitioner who would like to use this important diagnostic tool.

From the Cardiology Divisions at the Veterans Affairs Palo Alto Health Care System, Stanford University; University of Texas at San Antonio; Deborah Heart Institute; Sunnyside Biomedical; and St. Janos’s Municipal Hospital. Submitted Dec. 16, 1997; accepted Feb. 20, 1998. Reprint requests: Victor Froelicher, MD, Cardiology Division (111 C), VA Palo Alto Health Care System, 3801 Miranda Ave., Palo Alto, CA 94304. Copyright © 1998 by Mosby, Inc. 0002-8703/98/$5.00 + 0 4/1/90418

American Heart Journal September 1998


Methods Patients The population included 2385 consecutive male patients with complete data who had treadmill tests at 2 VA medical centers and bicycle tests at the Hungarian site between 1987 and 1997 to evaluate chest pain or other findings thought to be caused by coronary disease. All patients had coronary angiography within 4 months of the exercise test. As is usual for clinical observational studies such as this one, there was no attempt to remove workup bias. Patients with previous cardiac surgery, valvular heart disease, left bundle branch block, or Wolff-Parkinson-White syndrome on their resting ECG were excluded from the study. Patients with a previous myocardial infarction by history or by diagnostic Q wave were excluded from the diagnostic subgroup, leaving a target population of 1384 patients. Prior cardiac surgery was the predominant reason for exclusion of patients who underwent exercise testing during this time period. The clinical variables considered were obtained from the initial history. These included age, chest pain symptoms, body mass index (BMI), obesity, history of congestive heart failure, hypertension, diabetes, stroke, peripheral vascular disease, hypercholesterolemia by current values or by history, and chronic obstructive pulmonary disease (COPD) as well as family history and current and past cigarette smoking status. Chest pain symptoms were coded as 1 for typical, 2 for atypical, 3 for nonanginal pain, and 4 for no chest pain. Resting ECGs were coded abnormal if they exhibited 1 or more of the following: ST depression >0.1 mm, left ventricular hypertrophy (LVH), or T-wave inversion. All clinical variables except for age, chest pain, and BMI were coded 0 (absent) or 1 (present). Although most of these data were gathered prospectively by use of computerized forms,9,10 some of the patients initially studied had incomplete data requiring retrospective chart review.

Exercise testing At the VA medical centers, patients underwent treadmill testing with the USAFSAM11 or an individualized ramp treadmill protocol.12 Before ramp testing, the patients were given a questionnaire consisting of a list of activities presented in increasing order of metabolic equivalents (METs). This questionnaire estimated the patient’s exercise capacity before the test and thus allowed most patients to reach maximal exercise at approximately 10 minutes.13 At the Hungarian site, a progressive bicycle protocol was used and the highest work rate achieved (in watts) was converted to METs, accounting for body weight. Visual ST-segment deviation was measured at the J junction and corrected for preexercise ST depression while the patient was standing; ST slope was measured over the following 60 ms and classified as upsloping, horizontal, or downsloping. Slope was coded as 1 for horizontal, 2 for downsloping, and 0 for normal slope (upsloping or ST depression <0.5 mm). The ST response considered was the greatest amount of horizontal or downsloping ST depression in any lead except aVR during exercise or recovery. An abnormal response was defined as ≥1 mm of horizontal or downsloping ST depression. In addition, all of the following hemodynamic measurements were recorded: resting and maximal heart rate, change in heart rate, resting and maximal systolic blood pressure, change in systolic blood pressure, maximal double product, change in double product, exercise-induced hypotension (a drop in exercise systolic blood pressure below standing or a drop in systolic blood pressure of 20 mm Hg after a rise), and exercise capacity estimated in METs. Angina during testing was classified according to the Duke Exercise Angina Index (DAP = 2 if angina required stopping the test, 1 if angina occurred during or after exercise testing, and 0 for no angina).14 No test was classified as indeterminate,15 medications were not withheld, and no maximal heart rate targets were applied.16 All the exercise tests were performed, analyzed, and reported per standard protocol and with a computerized database (EXTRA, Mosby Publishers, Chicago); the cardiac catheterization was consistent with clinical practice at each institution, and results were abstracted from clinical reports. All exercise ECG analysis and comparisons were performed blinded to clinical and angiographic results.
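The visual coding rules above can be sketched in a few lines of Python. This is an illustrative sketch only: the function and constant names are hypothetical, and the double product is assumed to follow its usual definition (heart rate multiplied by systolic blood pressure), which the text does not spell out.

```python
# Slope codes as given in the text: 0 = normal (upsloping), 1 = horizontal, 2 = downsloping.
SLOPE_CODES = {"upsloping": 0, "horizontal": 1, "downsloping": 2}


def is_abnormal_st(depression_mm: float, slope: str) -> bool:
    """Abnormal response: >= 1 mm of horizontal or downsloping ST depression."""
    return depression_mm >= 1.0 and SLOPE_CODES[slope] in (1, 2)


def double_product(heart_rate_bpm: float, systolic_bp_mmhg: float) -> float:
    """Heart rate x systolic blood pressure, in thousands (assumed definition)."""
    return heart_rate_bpm * systolic_bp_mmhg / 1000.0


def duke_angina_index(angina_occurred: bool, angina_stopped_test: bool) -> int:
    """DAP: 2 if angina required stopping the test, 1 if angina occurred, 0 otherwise."""
    if angina_stopped_test:
        return 2
    return 1 if angina_occurred else 0
```

Under these rules, 2 mm of upsloping depression is not counted as abnormal, whereas 1 mm of horizontal depression is.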

Computer analysis Microprocessor-based exercise ECG devices were used at the 3 sites to simultaneously record all 12 ECG leads through exercise and recovery at 500 samples/sec (Mortara Electronics, Milwaukee, Wis.) on optical disks. Optical disk recordings were processed off-line with standard personal computers. Averaging of the raw data from 3 leads (II, V2, and V5) and determination of QRS onset and offset points was performed by using software developed by Sunnyside Biomedical (Vista, Calif.). The computer-chosen isoelectric line and QRS onset and offset points were confirmed visually for their accuracy. During the last year of data collection at PAVAHCS, a QUEST treadmill system (Burdick, Milton, Wis.) was used. This system collected data on PCMCIA cards and used a 12-lead on-line version of the software. The following measurements and calculations were evaluated: (1) ST0 (J-junction) and ST60 (60 ms after the J-junction) 2 minutes before maximal exercise, at maximal exercise, and at 1, 3.5, and 5 minutes recovery, (2) ST slope, which was based on a least-squares fit between ST0 and ST60, at the same times as the amplitude measurements, (3) ST integral,17 (4) ST index, (5) the sum of and the most ST depression in II, V2, and V5 at maximal exercise and 3.5 minute recovery, (6) ST0 and ST60/heart rate (HR) index and slope,18-20 (7) Hollenberg’s treadmill exercise score (which includes time-amplitude plots for the 3 leads in exercise and recovery [6 separate areas]), and (8) ST60 in V5 during exercise at HRs of 100 and 110 beats/min. Several empirical composite adjustments were made in an attempt to simulate visual analysis by adjusting for baseline depression and using slope criteria changing with HR. R-wave amplitude was available at all of the time periods, and results obtained adjusting the ST measurements by this amplitude are reported.

Table I. Clinical characteristics

| Variables | No CAD (n = 559) | Any CAD (n = 825) (60%) | P value |
| --- | --- | --- | --- |
| Age | 55 ± 11 | 62 ± 9 | <.0001 |
| Symptom status | | | |
| Typical angina | 115 (21%) | 357 (43%) | <.0001 |
| Atypical angina | 351 (63%) | 360 (44%) | <.0001 |
| Nonanginal chest pain | 48 (9%) | 62 (8%) | NS |
| No chest pain | 45 (8%) | 46 (6%) | NS |
| Chest pain score (1-4 [none]) | 2.0 ± .8 | 1.8 ± .8 | <.0001 |
| Diabetes | 61 (11%) | 142 (17%) | .001 |
| Abnormal resting ECG | 122 (22%) | 244 (30%) | .001 |
| Resting ST depression (ST0) | 71 (13%) | 157 (19%) | .002 |
| Hypercholesterolemia | 162 (29%) | 343 (42%) | <.0001 |
| Currently or ever smoked | 374 (67%) | 543 (66%) | NS |
| BMI (kg/m2) | 28 ± 5 | 28 ± 5 | NS |
| Peripheral vascular disease | 47 (8%) | 73 (9%) | NS |
| Congestive heart failure | 25 (5%) | 23 (3%) | NS |
| COPD | 34 (6%) | 55 (7%) | NS |
| Family history of CAD | 246 (44%) | 349 (42%) | NS |
| Hypertension | 271 (49%) | 454 (55%) | .02 |
| Stroke | 12 (2%) | 29 (3.5%) | NS |
| Digoxin | 21 (4%) | 24 (3%) | NS |
| β-Blocker | 132 (24%) | 246 (30%) | .01 |

NS, Nonsignificant. Data are presented as mean ± SD or number (percent) of subjects.

Coronary angiography Coronary artery narrowing was visually estimated and expressed as percent luminal diameter stenosis at each site, blinded to the patient’s history and exercise test results. Patients with a 50% narrowing in 1 or more of the following were considered to have significant angiographic coronary artery disease (CAD): the left anterior descending, left circumflex, or right coronary artery or their major branches or a 50% narrowing in the left main coronary artery. The 50% criterion was chosen to be consistent with the cooperative trialists’ choice.21
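As a small illustration of the angiographic criterion, the sketch below flags significant CAD when any listed vessel reaches the 50% threshold. The function name and the dictionary layout are hypothetical; only the threshold and the list of vessels come from the text.

```python
def has_significant_cad(stenosis_pct: dict, threshold: float = 50.0) -> bool:
    """Return True if any vessel has luminal diameter stenosis at or above the threshold.

    stenosis_pct maps a vessel name (for example "LAD", "LCx", "RCA", "left main")
    to its visually estimated percent diameter stenosis.
    """
    return any(pct >= threshold for pct in stenosis_pct.values())


# Illustrative patient, not taken from the study data.
example = {"LAD": 70.0, "LCx": 20.0, "RCA": 0.0, "left main": 0.0}
print(has_significant_cad(example))
```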

Statistical methods With the use of equations formulated in a spreadsheet (Excel, Microsoft Corp., Redmond, Wash.), >100 computer variables were evaluated by range of characteristics (ROC) analysis in the total sample. In addition, the sensitivity at a specificity matching that of the visual analysis criterion was calculated (True Epistat, Richardson, Tex.). This was done because it is at this area of the ROC curve that the clinician usually applies the exercise test, and any proposed improvement must be compared with its current performance. With the sample size, the 95% confidence interval for the ROC curves was ±0.02 and so only measurements within the 95% confidence interval of visual analysis were chosen for presentation. A logistic regression model was developed (True Epistat, Richardson, Tex.) with clinical, hemodynamic, and non-ECG variables in stepwise fashion. To this model the visual ST measurement was added to form a final model. Two other final models were formed by adding the best ST measurement in recovery and by adding the best computerized ST measurement at maximal exercise to the original non-ECG model. The measurements and the models were then tested in the 3 populations from the different medical centers. Even though they were subsets of the total sample from which the scores were derived, each had a different prevalence of coronary disease and a different rate of abnormal exercise tests. The appropriate values are inserted into the following logistic regression formula to calculate an estimate of the probability for angiographic coronary disease:

$$\text{Probability (0 to 1)} = \frac{1}{1 + e^{-(a + bx + cy + \cdots)}}$$

where a is the intercept, b and c are coefficients, and x and y are variable values. The performance of visual and computerized exercise ECG measurements and the models was also assessed considering medication status and the resting ECG. The resting ECG was classified by visual criteria and also by the computer ST measurements made at rest.

Table II. Exercise test results

| Variables | No CAD (n = 559) | Any CAD (n = 825) (60%) | P value |
| --- | --- | --- | --- |
| Maximal HR (beats/min) | 137 ± 24 | 125 ± 22 | <.0001 |
| Change in HR (beats/min) | 56 ± 24 | 48 ± 20 | <.0001 |
| Maximal SBP (mm Hg) | 170 ± 27 | 168 ± 30 | NS |
| Change in SBP (mm Hg) | 46 ± 26 | 38 ± 31 | <.0001 |
| Maximal double product (×1000) | 23.5 ± 6.5 | 21.3 ± 6.1 | <.0001 |
| Change in double product (×1000) | 13.8 ± 6.2 | 11.4 ± 5.5 | <.0001 |
| METs | 8.5 ± 3.4 | 6.9 ± 3.8 | <.0001 |
| Exercise angina score (0-2) | 0.37 ± 0.60 | 0.69 ± 0.77 | <.0001 |
| Abnormal ST depression | 115 (21%) | 426 (52%) | <.0001 |

Data are presented as mean ± SD or number (percent) of subjects.

Table III. Diagnostic characteristics of computerized ST measurements with results comparable to visual analysis with sensitivity at cut point associated with specificity matching 1 mm visual analysis (80%)

| ST measurement | ROC (± 1 SE) | Sensitivity, % (± 1 SE) | Average ROC (± 1 SD) | Average sensitivity, % (± 1 SD) | Cut point |
| --- | --- | --- | --- | --- | --- |
| V5 slope 3.5 minute recovery (mV/ms) | 0.68 ± 0.02 | 45 ± 2 | 0.68 ± 0.02 | 44 ± 2 | 0.064 |
| V5 ST60 3.5 minute recovery | 0.68 ± 0.01 | 49 ± 2 | 0.67 ± 0.02 | 49 ± 4 | –0.055 mV |
| Sum ST60 3.5 minute recovery | 0.68 ± 0.02 | 48 ± 2 | 0.67 ± 0.01 | 47 ± 4 | –0.084 mV |
| Most ST60 3.5 minute recovery | 0.67 ± 0.02 | 49 ± 2 | 0.67 ± 0.01 | 48 ± 2 | –0.053 mV |
| V5 ST60 5-minute recovery | 0.67 ± 0.02 | 43 ± 2 | 0.67 ± 0.02 | 44 ± 4 | –0.054 mV |
| V5 slope 5-minute recovery (mV/ms) | 0.67 ± 0.02 | 42 ± 2 | 0.67 ± 0.02 | 41 ± 2 | –0.016 |
| ST/HR index | 0.69 ± 0.01 | 51 ± 2 | 0.66 ± 0.02 | 47 ± 4 | –0.0022 mV˙beats/min |
| Visual ST analysis | 0.67 ± 0.01 | 52 ± 2 | 0.67 ± 0.03 | 51 ± 3 | 1 mm |

Results Population characteristics The mean age of this male population was 59 ± 10 years. Age, chest pain, hypercholesterolemia, diabetes, and abnormal resting ECG were significantly different between those with and those without coronary disease. Table I lists all of the important clinical variables.

Post–exercise test hemodynamic, non-ECG, and visual ECG results Table II compares the exercise test data between those with and those without any obstructive angiographic coronary disease. The Duke treadmill angina score and all of the hemodynamic measurements were significantly different except for maximal systolic blood pressure (SBP).

ST criteria performance and validation The diagnostic performance of the ST variables that exhibited an average ROC area within the 95% confidence intervals associated with visual analysis when tested within 5 randomly selected one half population samples is tabulated in Table III. The computerized ST measurements with the highest discriminating power are listed in Table III. They included visual ST analysis, the sum of the depression at ST60 in II, V5, and V2, the most ST60 depression in these 3 leads, the time area in recovery of the slope and ST60 for V5 (part of the Hollenberg score), HR index (ST60 or ST0 V5), and ST60 in V5 at 3.5 minutes of recovery. Measurements made at 3.5 minutes of recovery and by using V5 predominated compared with other leads or at other time points. Thus only these 7 measurements out of the 100 that were calculated by the exercise ECG analysis program had ROC curve areas >0.65. Although several of the ST time areas that are part of the Hollenberg score had ROC curve areas comparable to visual analysis, the score itself had an ROC area of 0.65 (sensitivity of 42% at a specificity of 80%). The independent areas are not listed because their complexity exceeds that of the other measurements. In addition, the sensitivity of the measurements at a specificity of 80%, matching visual analysis, is also listed.
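The two summary statistics used in Table III, the ROC area and the sensitivity at a cut point chosen to match a target specificity, can be illustrated with a short sketch. This is not the spreadsheet or True Epistat procedure the authors used; it assumes a score for which higher values are more abnormal (for example, millimeters of ST depression), computes the area as the rank-based probability that a diseased patient scores higher than a nondiseased one, and uses made-up scores.

```python
def roc_area(diseased, healthy):
    """Area under the ROC curve: P(diseased score > healthy score), ties counted as 1/2."""
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))


def sensitivity_at_specificity(diseased, healthy, target_spec=0.80):
    """Find the lowest cut point whose specificity reaches the target.

    A test is called abnormal when score > cut. Returns (cut, sensitivity, specificity).
    """
    for cut in sorted(set(diseased) | set(healthy)):
        spec = sum(h <= cut for h in healthy) / len(healthy)
        if spec >= target_spec:
            sens = sum(d > cut for d in diseased) / len(diseased)
            return cut, sens, spec
    raise ValueError("target specificity must not exceed 1")


# Hypothetical ST depression values in millimeters, for illustration only.
healthy_scores = [0.0, 0.2, 0.4, 0.5, 0.9]
diseased_scores = [0.3, 0.6, 0.8, 1.0, 1.2]
print(roc_area(diseased_scores, healthy_scores))
print(sensitivity_at_specificity(diseased_scores, healthy_scores, 0.80))
```

Fixing specificity before comparing sensitivities is what lets measurements with nearly identical ROC areas still be ranked in the region where the test is actually applied.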


Figure 1

ROC curves of 3 prediction equations, visual analysis, and best computerized measurements from exercise and recovery. For reference, straight line is drawn representing no discrimination, and ROC curve for maximal heart rate (area = 0.63) is plotted to demonstrate its relative symmetry compared with ROC curves on basis of ECG variables. Vertical line is drawn through ROC curves representing point where specificity is 80%, which matches visual analysis. Curves are asymmetrical at end where specificity is high, demonstrating that sensitivities can differ around region where exercise test normally functions even when there are small or no differences between ROC curve areas. Also, because of fewer ST points measured by physicians (rounding off to full millimeters) compared with computer measurements, area formed by visual analysis is always less than computer measurements, putting visual analysis at a disadvantage. ROC, Range of characteristics.

Prediction equation development The following 3 sets of intercepts, variables, and their coefficients were developed by using stepwise logistic regression:

(1) Prediction model equation considering visually measured ST depression: 0.35 + 0.05 · Age – 0.3 · Chest pain symptom + 0.6 · Elevated cholesterol + 0.4 · Diabetes – 0.02 · Maximal HR + 0.3 · DAP + 0.7 · Visual ST depression

(2) Prediction model equation using the best computer measurement during recovery: –1.34 + 0.05 · Age – 0.3 · Chest pain symptom + 0.6 · Elevated cholesterol + 0.4 · Diabetes – 0.012 · Maximal HR + 0.5 · DAP – 5.7 · ST60 V5 3.5 minute recovery

(3) Prediction model equation using the best computer measurement during exercise: –3.42 + 0.06 · Age – 0.3 · Chest pain symptom + 0.6 · Elevated cholesterol + 0.4 · Diabetes + 0.45 · DAP – 0.50 · (ST60/HR index · 1000)


Table IV. Comparison of the 3 predictive equations or scores with reference to visual analysis and the single best computer measurement (ST60 V5 recovery)

| | Cut point | Sensitivity | Specificity | Predictive accuracy | ROC area |
| --- | --- | --- | --- | --- | --- |
| Visual ST | 1 mm | 52% | 79% | 63% | 0.67 |
| V5 ST60 3.5-minute recovery | –0.054 mV | 49% | 80% | 61% | 0.68 |
| PE with visual ST | 0.67 | 61% | 80% | 69% | 0.79 |
| PE with recovery V5 ST60 (computer) | 0.65 | 59% | 80% | 68% | 0.77 |
| PE with exercise ST/HR index (computer) | 0.64 | 59% | 80% | 68% | 0.77 |

PE, Predictive equations. Note that the cut point for calculated probability of coronary disease average out to be 0.65 to match the specificity obtained with simple visual analysis. The difference in predictive accuracy means that 5 or 6 more patients per 100 tested are correctly classified using the predictive equations.

Variable definitions for calculations are: Chest pain symptoms from 1 [typical] to 4 [none], DAP: 2 = angina major reason for stopping, 1 = exercise-induced angina, 0 = no angina; Visual ST: Maximal visual ST depression in exercise or during recovery. ST was recorded in millimeters if ST depression was ≥0.5 mm horizontal or downsloping or ≥2 mm upsloping; ST60 amplitude in negative millivolts; and ST60 amplitude in V5 at 3 minutes in recovery in negative millivolts.
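The three prediction equations and the logistic formula from the statistical methods can be combined as in the sketch below. The coefficients are transcribed as printed; the function and parameter names are hypothetical, the example patient is invented, and the units (visual ST in millimeters, ST60 in millivolts with depression negative) are assumed from the variable definitions above. It is meant as a reading aid, not a validated clinical tool.

```python
import math


def logistic(z: float) -> float:
    """Convert a linear score z = a + bx + cy + ... into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def score_visual(age, chest_pain, cholesterol, diabetes, max_hr, dap, visual_st_mm):
    """Equation 1: visually measured ST depression (mm)."""
    return (0.35 + 0.05 * age - 0.3 * chest_pain + 0.6 * cholesterol
            + 0.4 * diabetes - 0.02 * max_hr + 0.3 * dap + 0.7 * visual_st_mm)


def score_recovery(age, chest_pain, cholesterol, diabetes, max_hr, dap, st60_v5_mv):
    """Equation 2: computer ST60 in V5 at 3.5 minutes of recovery (mV, depression negative)."""
    return (-1.34 + 0.05 * age - 0.3 * chest_pain + 0.6 * cholesterol
            + 0.4 * diabetes - 0.012 * max_hr + 0.5 * dap - 5.7 * st60_v5_mv)


def score_exercise(age, chest_pain, cholesterol, diabetes, dap, st60_hr_index):
    """Equation 3: computer ST60/HR index at maximal exercise."""
    return (-3.42 + 0.06 * age - 0.3 * chest_pain + 0.6 * cholesterol
            + 0.4 * diabetes + 0.45 * dap - 0.50 * (st60_hr_index * 1000))


# Invented example: 60-year-old with typical angina (1), elevated cholesterol,
# no diabetes, maximal HR 120, exercise-induced angina (DAP 1), 1 mm visual ST depression.
z = score_visual(60, 1, 1, 0, 120, 1, 1.0)
p = logistic(z)
print(round(z, 2), round(p, 3), p >= 0.65)
```

For this invented patient the linear score is 2.25, giving a probability of about 0.90, which is above the 0.65 cut point discussed in the text.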

Prediction equation performance and validation The models were developed considering the fact that some clinicians prefer to use a maximal exercise ST measurement rather than one from recovery. For the recovery ST measurement to have the same diagnostic characteristics as it did in this study, exercise must be stopped abruptly (no cool-down walk performed) and the patient placed supine after exercise. The probabilities generated by using the models were plotted as ROC curves (Fig. 1) and the areas calculated (Table IV). There was a significant improvement in the ROC areas for each of the models compared with the visual analysis or with one of the best computer measurements (P < .0001). In addition, sensitivities for the models at a specificity comparable to visual criteria of 1 mm (80%) were obtained from the ROC curves and tabulated in Table IV. Predictive accuracy is also calculated because it represents the percentage of patients correctly classified and is a more practical measure for comparing the discriminating methods. As can be seen in Table IV, all 3 models provided similar discriminating capability and were superior to solitary ST measurements, either visual or by computer. Also, the cut points of the predicted probabilities (0 to 1) to match the specificity of visual analysis were 0.67, 0.65, and 0.64 for the 3 equations. Thus for comparison purposes, a predicted probability for coronary disease of 0.65 (or 65%) is a cut point associated with a specificity of 80% comparable to visual analysis.
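Predictive accuracy, the percentage of patients correctly classified, follows from sensitivity, specificity, and disease prevalence. The sketch below assumes the standard weighted-sum definition and takes prevalence as 825 of 1384 patients; the function name is hypothetical. With the rounded sensitivities and specificities from Table IV it reproduces the visual ST and first prediction equation values, and lands within about a percentage point of the other rows.

```python
def predictive_accuracy(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Fraction correctly classified: true positives plus true negatives."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


prevalence = 825 / 1384  # patients with any CAD / total population

visual = predictive_accuracy(0.52, 0.79, prevalence)
pe_visual = predictive_accuracy(0.61, 0.80, prevalence)

print(round(visual * 100), round(pe_visual * 100))
print(round((pe_visual - visual) * 100, 1))  # extra patients correctly classified per 100 tested
```

The difference of roughly 6 per 100 is consistent with the note to Table IV that 5 or 6 more patients per 100 tested are correctly classified by the predictive equations.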

Effect of medications and resting ECG abnormalities As tabulated in Table V, β-blocker administration did not affect the diagnostic characteristics of the standard visual criteria. While digoxin lowered the specificity of the test, it was only administered to a small number of patients. The clinical indication for digoxin was not known and the condition for which it was prescribed could affect the ST response. LVH and visually classified resting ST depression had a similar association with a lowered specificity. T-wave inversion had a trend toward similar changes, but did not affect test characteristics as much. The exclusion of all patients with resting ECG abnormalities as well as digoxin use significantly lowered sensitivity and raised specificity (P < .001). The computer classification of resting ST depression confirmed the visual classification results by obtaining nearly the same sensitivity and specificity.

Population and prevalence effects The percentage of patients with angiographic coronary occlusions ≥50% ranged from 35% in the Hungarians to 60% of the veterans from Palo Alto to 80% in those from Long Beach. Exercise test hemodynamic responses had no significant population differences after age adjustment. Test characteristics were relatively constant over the 3 populations. Comparison of the 3 populations permitted estimation of the effect of disease prevalence, percentage of abnormal treadmill tests, and the varying degrees of workup bias in the 3 populations on the calibration of the cut points of the probability scores from the models. The Hungarian subpopulation most simulates patients seen in a typical clinic because the prevalence of coronary disease and the rate of abnormal exercise tests is relatively low. If the clinician uses the computed probability of coronary disease of ≥65% as a cut point, this is associated with odds of disease of 3 times the odds for patients with a calculated probability of <65%.

Table V. Effect of medications and resting ECG abnormalities on diagnostic characteristics of simple ST analysis and 2 of the prediction equations

| Group | n | Prev CAD | % Abnormal visual ST ET | Sensitivity | Specificity | Pred Acc | ROC vis ST | ROC of PE with vis exer ST | ROC of PE w/computer rec ST |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Total population | 1384 | 60% | 39% | 52% | 79% | 63% | 0.67 | 0.79 | 0.77 |
| Receiving β-blockers | 378 | 65% | 38% | 47% | 78% | 58% | 0.66 | 0.81 | 0.78 |
| Receiving digoxin | 45 | 53% | 58% | 67% | 52% | 60% | insufficient | insufficient | insufficient |
| Visual ECG | | | | | | | | | |
| LVH | 43 | 60% | 67% | 81% | 53% | 70% | insufficient | insufficient | insufficient |
| T-wave inversion | 193 | 65% | 50% | 61% | 72% | 65% | 0.67 | 0.77 | 0.77 |
| Visual ST depression | 228 | 69% | 68% | 76% | 48% | 67% | 0.69 | 0.77 | 0.74 |
| Abnormal resting ECG | 366 | 67% | 59% | 69% | 61% | 66% | 0.68 | 0.77 | 0.76 |
| Plus digoxin | 377 | 67% | 58% | 68% | 62% | 66% | 0.69 | 0.77 | 0.76 |
| No digoxin or abnormal rest ECG | 1007 | 57% | 32% | 44% | 85% | 62% | 0.66 | 0.79 | 0.77 |
| Computer ECG | | | | | | | | | |
| Rest ST depression by computer | 579 | 65% | 52% | 61% | 66% | 62% | 0.66 | 0.75 | 0.75 |
| No rest ST depression by computer | 805 | 55% | 30% | 41% | 88% | 58% | 0.66 | 0.79 | 0.78 |

PE, Predictive equations with either a visual or a computerized exercise ST measurement; % Abnormal visual ST ET, % of abnormal exercise tests by visual ST criteria; Prev CAD, prevalence of abnormal angiograms; insufficient, insufficient number to make a reliable calculation. Total population results are provided for comparison; Visual ECG results are for standard interpretation of the resting ECG; abnormal resting ECG includes LVH, T-wave inversion, and resting ST depression; Computer ECG separates the patients according to computer measurements of the ST segment at rest.

Miscellaneous issues Other leads. Review of the 12-lead visual ECG interpretations confirmed that changes isolated to the inferior leads and anterior leads as well as changes isolated to V4 or V6 were rare, and there were no significant ST changes that were not reflected in V5. Considering the sum of ST depression or the most depression in the 3 leads representing the 3 main areas of the myocardium failed to improve the diagnostic accuracy of the test compared with visual analysis or computer analysis of a single lead.

Recovery measurements. Although the visual analysis considered abnormal ST depression in exercise and/or recovery (sensitivity 52%, specificity 79%, ROC 0.67), a separate analysis of the data set revealed that 110 of the 541 abnormal ST responders achieved the 1 mm ST criteria only in exercise and 60 were abnormal in recovery only. If the ST response was considered abnormal if the criteria were achieved in exercise, regardless of the status in recovery, the sensitivity was 46% and the specificity was 81% (ROC 0.65). If the ST response was considered abnormal if the 1 mm criteria were achieved in recovery, regardless of the result during exercise, the sensitivity was 43% and the specificity was 87% (ROC 0.67). Also, the ROC values for measurements in recovery were greater than comparable measurements during maximal exercise (see Table III).

R-wave adjustment. Dividing the computer measurements by the computer-measured R-wave amplitude at the time of the measurement failed to significantly improve the ROC areas (the highest [0.68] was obtained by adjusting ST60 in lead V5 at 3.5 minute recovery).

Discussion Comparison with meta-analysis From a meta-analysis of 147 consecutively published reports, involving patients who underwent both coronary angiography and exercise testing, only the results in the 41 studies (9123 patients), which appropriately excluded patients with a prior myocardial infarction,22 accurately portray the performance of the standard exercise test for comparison with this study. These studies, with a 56% average prevalence of angiographic disease and 50% rate of abnormal exercise tests, demonstrated a mean sensitivity of 68% and a mean specificity of 74%. The patients in our study had a 60% prevalence of angiographic disease and a 39% rate of abnormal exercise tests, and the visual interpretation of the exercise test had a 52% sensitivity and an 80% specificity, all within the range found in the meta-analysis.

Comparison with other studies of computer analysis An extensive library search and literature review were conducted to find all exercise ECG research reports that compared multiple computerized criteria for diagnosing the presence of angiographic coronary disease. The search resulted in 6 studies: Ascoop et al.,23 Simoons et al.,24,25 Detry et al.,26 Deckers et al.,27 Detrano et al.,28,29 and Pruvost et al.30 Two of the 6 studies found computerized measurements to be superior to visual and one of them found them to be comparable. One of the 6 found multivariable techniques to be superior to visual, but the computerized ST measurements were not necessary in the prediction equation. The other 2 did not consider visual measurements and found a multivariable model to be superior to any single ST measurement. Our study, while similar, is more thorough than these 6 studies because we compared all of the major computerized and visual ST measurements as well as applied multivariable discriminant techniques. Hollenberg et al.31 proposed a treadmill exercise score (TES) that grades the ST segment response by considering all computerized ST depressions and slope measurements made during exercise and throughout recovery and combines heart rate and METs. They reported that TES discriminated diseased from nondiseased patients with 85% sensitivity at a 95% specificity and was superior to visual methods. Detrano et al., Deckers et al., and this analysis did not validate the findings of Hollenberg et al. Several of the ST-segment measurements plotted over exercise and recovery time that are part of the score matched our results with visual analysis and in fact had superior diagnostic characteristics to TES itself.

Other leads As in a prior study based on visual analysis,32 ST changes isolated to leads other than V5 were rare and did not improve the diagnostic ability of the exercise ECG. This was confirmed by the computerized measurements because the sum of ST depression or the most depression in the 3 leads (II, V2, and V5) failed to improve the diagnostic accuracy of the test.

Recovery measurements The importance of recovery measurements was consistent with previous experience from visual33 and computer analyses.34 Also, the ROC values for other ST measurements in recovery tended to be greater than comparable measurements during maximal exercise. The recovery time is probably so important because the conflicting impact of increasing HR during exercise “pulling” up the ST segment (resulting in a trend toward a positive slope) is no longer present. It is important to have the patient lie down immediately after exercise and not perform a cool-down walk for this measurement to function as it did in this study. Because it is a simple measurement, less contaminated by noise, ST60 in V5 at 3.5 minutes of recovery is an important measurement.

R-wave adjustment Prior studies have suggested that adjusting ST depression measurements by R-wave amplitudes may yield greater diagnostic results than ST depression measurements alone.35 The reason is that patients with small R-wave amplitudes do not manifest as much ST depression with exercise despite the presence of CAD, whereas patients with large R-wave amplitudes would have exaggerated ST changes.36 We did not observe any differences in the ROC areas with the computer measurements in V5 at maximal exercise or during recovery by dividing by R-wave amplitude.
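As an illustration of the comparison described above, the sketch below divides ST depression by R-wave amplitude and computes the ROC area for the raw and the adjusted measurement. It is a sketch under stated assumptions: the function names and all patient values are invented for demonstration, and the ROC area is computed with the generic pairwise (Mann-Whitney) definition rather than with whatever software the study used.

```python
from typing import List, Sequence


def r_wave_adjusted(st_depression_mm: Sequence[float],
                    r_wave_mm: Sequence[float]) -> List[float]:
    """Divide each ST depression by the same patient's R-wave amplitude."""
    return [st / r for st, r in zip(st_depression_mm, r_wave_mm)]


def roc_area(diseased: Sequence[float], nondiseased: Sequence[float]) -> float:
    """ROC area as the probability that a randomly chosen diseased patient
    has a higher value than a randomly chosen nondiseased patient
    (ties count as one half)."""
    wins = 0.0
    for d in diseased:
        for n in nondiseased:
            if d > n:
                wins += 1.0
            elif d == n:
                wins += 0.5
    return wins / (len(diseased) * len(nondiseased))


# HYPOTHETICAL values (mm); not data from the study.
st_cad, r_cad = [1.5, 0.8, 2.0, 1.0], [15.0, 10.0, 18.0, 12.0]
st_no_cad, r_no_cad = [0.5, 1.2, 0.3, 0.9], [12.0, 14.0, 10.0, 11.0]

raw_area = roc_area(st_cad, st_no_cad)
adjusted_area = roc_area(r_wave_adjusted(st_cad, r_cad),
                         r_wave_adjusted(st_no_cad, r_no_cad))
print(raw_area, adjusted_area)
```

Because the ROC area depends only on how patients are ranked, the adjustment changes the result only when it reorders diseased and nondiseased patients relative to one another.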

Effect of medication status and the resting ECG β-Blocker administration did not affect the diagnostic characteristics of the standard visual criteria, in agreement with previous findings.37 Digoxin lowered the specificity, but it was only administered to a small number of patients. It was not clear why it was administered to many of the patients, and the reason or condition for which it was prescribed could affect the ST response. LVH and visually classified resting ST depression had a similar association with a lowered specificity, also in agreement with previous findings.38 T-wave inversion had a trend toward similar changes but did not affect test characteristics as much. The exclusion of all patients with resting ECG abnormalities as well as those taking digoxin significantly lowered sensitivity and raised specificity (P < .001).

Multivariable prediction of any angiographic coronary disease The application of multivariate analysis by using discriminant function and logistic regression techniques to clinical and exercise test variables has been repeatedly shown to improve on the standard application of the exercise ECG test to diagnosing CAD. A recent meta-analysis of 24 studies that considered exercise test and clinical variables to predict presence of any angiographic disease found the following variables to be significant predictors in more than half of the studies: sex, chest pain symptoms, age, elevated cholesterol, ST slope and depression, and maximal HR. Exercise capacity, exercise-induced angina, double product, maximal SBP, diabetes mellitus, smoking history, abnormal resting ECG, hypertension, and family history of CAD were less often noted to be significant predictors. Our study was consistent with prior studies in that age, hypercholesterolemia, maximal HR, and exercise-induced ST depression were significant predictors of CAD. Our study differed in that diabetes and angina induced by the exercise test were selected. The choice of a probability level from the prediction equations to be applied as a cut point for abnormal/normal has always been problematic because of population differences. Analysis of our 3 populations supports the finding that a probability cut point of 65% as abnormal will function superiorly to 1 mm of ST depression alone in a population similar to that seen by a practitioner. However, the clinician may choose to use a score and consider test results with a ≥50% probability as abnormal in order to achieve a greater sensitivity. Physicians in practice can choose values that suit the clinical situation. The scores also improved the diagnostic characteristics of the test in the patients with resting repolarization abnormalities, who are frequently referred to nuclear or imaging studies rather than a standard exercise test.
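The trade-off between the 65% and 50% cut points can be made concrete with a short Python sketch. It assumes each patient already has a predicted probability of disease from a logistic prediction equation and an angiographic classification; the function names and the sample probabilities are hypothetical, and no coefficients from the study's equations are reproduced here.

```python
import math
from typing import Sequence, Tuple


def probability_from_linear_score(z: float) -> float:
    """Logistic transform used by logistic regression: p = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def sensitivity_specificity(probabilities: Sequence[float],
                            has_cad: Sequence[bool],
                            cut_point: float) -> Tuple[float, float]:
    """Call a test abnormal when the predicted probability is >= cut_point."""
    if len(probabilities) != len(has_cad):
        raise ValueError("probabilities and has_cad must have equal length")
    tp = fn = tn = fp = 0
    for p, diseased in zip(probabilities, has_cad):
        abnormal = p >= cut_point
        if diseased:
            tp += abnormal
            fn += not abnormal
        else:
            fp += abnormal
            tn += not abnormal
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one nondiseased patient")
    return tp / (tp + fn), tn / (tn + fp)


# HYPOTHETICAL predicted probabilities and angiographic results.
probs = [0.90, 0.70, 0.55, 0.30, 0.60, 0.40, 0.20, 0.80]
cad = [True, True, True, True, False, False, False, False]

for cut in (0.65, 0.50):
    sens, spec = sensitivity_specificity(probs, cad, cut)
    print(f"cut point {cut:.2f}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Lowering the cut point can only keep or increase the number of tests called abnormal, so sensitivity never falls and specificity never rises, which is why a 50% threshold suits situations in which missed disease is the greater concern.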

Summary The major limitations of this study are the lack of women, the retrospective design, and the failure to remove workup bias. The hypothesis must be confirmed in a prospective study designed to reduce workup bias. However, many of our findings have clinical relevance. On the one hand, we did not validate previous studies that found HR and R-wave amplitude adjustment or computerized measurements and scores to be superior to visual analysis. Instead we found that computerized measurements of V5 60 ms after QRS end at maximal exercise divided by HR or the amplitude at 3.5 minutes of recovery were able to match visual analysis performed by expert electrocardiographers. We confirmed that β-blockers do not affect the diagnostic characteristics of the exercise ECG but that resting ST depression raises sensitivity more than it lowers specificity. Thus the general practitioner need not stop β-blockers or exclude patients with resting ST depression. We also confirmed that prediction equations (scores) by using visual or computerized ST measurements significantly improved the diagnostic characteristics of the standard exercise test. Our analysis was unique in providing 3 equations, using visual ST analysis or computerized ST measurements from exercise or recovery that provide better diagnostic characteristics than simple visual analysis. Finally, we demonstrated that a probability cut point of 65% as abnormal with any of these equations will diagnose coronary disease better than traditional ST analysis in a typical clinical population.

References
1. Schlant RC, et al. Guidelines for exercise testing (ACC/AHA Task Force). J Am Coll Cardiol 1986;8:725-8.
2. Froelicher V, Grauer K, Hizon J. Exercise stress testing: new guidelines, current practice. Patient Care 1998;Jan 30:12:54-64.
3. Del Campo J, Do D, Umann T, McGowan V, Froning J, Froelicher V. Comparison of computerized and standard visual criteria of exercise ECG for diagnosis of coronary artery disease. Ann Noninvas Electrocardiogr 1996;1:430-42.
4. Yamada H, Do D, Morise A, Froelicher V. Review of studies utilizing multi-variable analysis of clinical and exercise test data to predict angiographic coronary artery disease. Prog Cardiovasc Dis 1997;39:457-81.
5. Okin PM, Kligfield P. Heart rate adjustment of ST segment depression and performance of the exercise electrocardiogram: a critical evaluation. J Am Coll Cardiol 1995;25:1726-35.
6. Philbrick JT, Horowitz RI, Feinstein AR. Methodological problems of exercise testing for coronary artery disease: groups, analysis and bias. Am J Cardiol 1989;64:1117-22.
7. Willems J, et al. The diagnostic performance of computer programs for the interpretation of ECGs. N Engl J Med 1991;325:1767-73.
8. Gibbons RJ, Balady GJ, Beasley JW, Bricker JT, Duvernoy WF, Froelicher VF, et al. ACC/AHA guidelines for exercise testing: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines (Committee on Exercise Testing). J Am Coll Cardiol 1997;1:260-311.
9. Ustin J, Umann T, Froelicher V. Data management: a better approach. Physicians and Computers 1994;12:30-3.
10. Froelicher V, Shiu P. Exercise test interpretation system. Physicians and Computers 1996;14:40-4.
11. Wolthuis R, Froelicher VF, Fischer J, Longo M, Triebwasser J. New practical treadmill protocol for clinical use. Am J Cardiol 1977;39:697-700.
12. Myers J, Buchanan N, Walsh D, Kraemer M, McAuley P, Froelicher VF. A comparison of the ramp versus standard exercise protocols. J Am Coll Cardiol 1991;17:1334-42.
13. Myers J, Do D, Herbert W, Ribisl P, Froelicher VF. A nomogram to predict exercise capacity from a specific activity questionnaire and clinical data. Am J Cardiol 1994;73:591-6.
14. Mark D, Hlatky M, Harrell F, et al. Exercise treadmill score for predicting prognosis in coronary artery disease. Ann Intern Med 1987;106:793-800.
15. Reid M, Lachs M, Feinstein A. Use of methodological standards in diagnostic test research. JAMA 1995;274:645-51.
16. Fletcher G, et al. AHA Medical/Scientific Statement: exercise standards. Circulation 1995;91:580-615.
17. Sheffield LT, Holt TH, Lester FM, et al. On-line analysis of the exercise ECG. Circulation 1969;40:935-44.
18. Okin PM, Kligfield P. Heart rate adjustment of ST segment depression and performance of the exercise electrocardiogram: a critical evaluation. J Am Coll Cardiol 1995;25:1726-35.
19. Okin PM, Chen J, Kligfield P. Effect of baseline ST segment elevation on test performance of standard and heart rate-adjusted ST segment depression criteria. Am Heart J 1990;119:1280-6.
20. Lachterman B, Lehmann KG, Detrano R, et al. Comparison of ST segment/heart rate index to standard ST criteria for analysis of exercise electrocardiogram. Circulation 1990;82:44-50.
21. Yusuf S, Zucker D, Peduzzi P, Fisher LD, Takaro T, Kennedy JW, et al. Effect of coronary artery bypass graft surgery on survival: overview of 10-year results from randomised trials by the Coronary Artery Bypass Graft Surgery Trialists Collaboration. Lancet 1994;344:563-70.
22. Reid M, Lachs M, Feinstein A. Use of methodological standards in diagnostic test research. JAMA 1995;274:645-51.
23. Ascoop CA, Distelbrink CA, DeLang PA. Clinical value of quantitative analysis of ST slope during exercise. Br Heart J 1977;39:212-7.
24. Simoons M. Optimal measurements for the detection of coronary artery disease by exercise electrocardiography. Comput Biomed Res 1977;10:483-99.
25. Simoons M, Hugenholtz PG. Estimation of the probability of exercise-induced ischemia by quantitative ECG analysis. Circulation 1977;56:552-9.
26. Detry JMR, Robert A, Luwaert RJ, et al. Diagnostic value of computerized exercise testing in men without previous myocardial infarction. Eur Heart J 1985;6:227-38.
27. Deckers JW, Rensing BJ, Tijssen JGP, et al. A comparison of methods of analyzing exercise tests for diagnosis of coronary artery disease. Br Heart J 1989;62:438-44.
28. Detrano R, Salcedo E, Leatherman J, Day K. Computer-assisted versus unassisted analysis of the exercise electrocardiogram in patients without myocardial infarction. J Am Coll Cardiol 1987;10:794-9.
29. Detrano R, Salcedo E, Passalacqua M, Friis R. Exercise electrocardiographic variables: a critical appraisal. J Am Coll Cardiol 1986;8:836-47.
30. Pruvost P, LaBlanche JM, Beuscart R, et al. Enhanced efficacy of computerized exercise test by multivariate analysis for the diagnosis of coronary artery disease: a study of 558 men without previous myocardial infarction. Eur Heart J 1987;8:1287-94.
31. Hollenberg M, Budge WR, Wisneski JA, Gertz EW. Treadmill score quantifies the ECG response to exercise and improves test accuracy and reproducibility. Circulation 1980;61:276-85.
32. Miranda CP, Liu J, Kadar A, Janosi A, Froning J, Lehmann KG, et al. Usefulness of exercise-induced ST-segment depression in the inferior leads during exercise testing as a marker for coronary artery disease. Am J Cardiol 1992;69:303-8.
33. Lachterman B, Lehmann KG, Abrahamson D, Froelicher VF. “Recovery only” ST-segment depression and the predictive accuracy of the exercise test. Ann Intern Med 1990;112:11-6.
34. Ribisl PM, Liu J, Mousa I, Herbert WG, Miranda CP, Froning JN, et al. A comparison of computer ST criteria for diagnosis of severe CAD. Am J Cardiol 1993;71:546-51.
35. Berman JA, Wynne J, Mellis G, Cohn PF. Improving diagnostic accuracy of the exercise test by combining R wave changes with duration of ST segment depression in a simplified index. Am Heart J 1983;105:60-6.
36. Froelicher VF, Myers J, Follansbee WP, Labovitz AJ. Exercise and the heart. St. Louis: Mosby; 1993. p. 48-69.
37. Herbert WG, Lehmann KG, Dubach P, Detrano R, Froelicher VF. Effect of beta blockade on the exercise ECG: ST level versus delta ST/HR index. Am Heart J 1991;122:993-1000.
38. Miranda CP, Lehmann KG, Froelicher VF. Correlation between resting ST-depression, exercise testing coronary angiography, and longterm prognosis. Am Heart J 1991;122:1617-28.
