Journal of Electrocardiology
Vol. 25 Supplement
Combination of Diagnostic Classifications From ECG and VCG Computer Interpretations
Jan H. van Bemmel, PhD, Jan A. Kors, PhD, and Gerard van Herpen, MD, PhD
Abstract: The Common Standards for Quantitative Electrocardiography (CSE) study showed that the weighted combined diagnostic classification of a group of experts or a set of electrocardiographic (ECG) programs is superior to the average expert or program, and sometimes even better than the best expert. For that reason the authors investigated whether the combination of classifications from the authors’ programs for ECG and vectorcardiographic (VCG) interpretation would deliver better results than either one separately. The CSE diagnostic database (n = 1,220) was used for testing purposes. Since the combination of computer interpretations from the ECG and VCG requires a separate and preferably simultaneous recording of the VCG, the authors also examined the combined interpretation of the ECG with a simulated VCG reconstructed from the eight independent leads of the 12-lead ECG (the rVCG). In addition, the authors investigated the combined interpretation of all single beats of the dominant waveform from the same ECG recording (sECG). The performance of all combinations, that is, the ECG + VCG, ECG + rVCG, and sECG, proved to be significantly better (74.2%, 73.6%, and 71.2%, respectively) than that of the ECG or VCG separately (69.8% and 70.2%, respectively; p < 0.001 for all cases). However, the difference in performance between the sECG and the VCG was not significant.
Key words: ECG, VCG, computer interpretation, evaluation, combined classification, CSE study.
From the Department of Medical Informatics, Faculty of Medicine and Health Sciences, Erasmus University, Rotterdam, The Netherlands. The CSE project was supported by the European Commission within the frame of its Medical and Public Health Research Programmes, and by various funding agencies in member states of the European Community. Reprint requests: Jan H. van Bemmel, PhD, Department of Medical Informatics, Faculty of Medicine and Health Sciences, Erasmus University, P.O. Box 1738, 3000 DR Rotterdam, The Netherlands.
The Common Standards for Quantitative Electrocardiography (CSE) study1-5 was not only most helpful in assessing the performance of electrocardiographic (ECG) and vectorcardiographic (VCG) interpretation programs, but also stimulated further research in ECG processing. The centers cooperating in the CSE study fully realized the limitations of the assessment study: only a limited set of ECG signals (n = 250) in the first, signal-analysis part of the CSE study, and a limited set of ECGs (n = 1,220) in the second, diagnostic part of the CSE study, with a restricted collection of eight main diagnostic categories, could be investigated. For instance, neither the set of 250 ECG signals nor the diagnostic set of 1,220 ECGs contained much “natural” noise, so that within the CSE study only a first impression could be obtained, with the help of artificially generated disturbances, of the influence of noise on wave detection and the estimation of onsets and endpoints of waves.6,7 The CSE group was also aware that, because of the restricted size of the study, mostly single-disease cases could be used for validation. The influence of “natural” noise and signal variability on both ECG-signal processing and diagnostic classification could not be thoroughly investigated because of the exponentially growing validation effort and computer processing, both in the participating centers and in the CSE coordinating center. The same argument holds for the assessment of the performance of the programs for ECGs with different disease combinations, with many varying wave shapes, or for the evaluation of ECGs with arrhythmia patterns.
Nevertheless, the CSE database does contain enough “natural” variability to be interesting for further examination. In addition, the CSE study raised the question why, on average, programs and referees interpreting the ECG performed better than those interpreting the VCG, and why combinations of diagnostic classifications of either programs or referees performed better than either one separately. Since our own interpretation program MEANS (Modular ECG Analysis System) consists of two independent versions, one for the 12-lead ECG and one for the Frank VCG,8,9 we formulated several questions: (1) would the combined diagnostic classifications of the ECG program and VCG program yield better results than either one; (2) in this combined diagnostic classification, could the Frank VCG be replaced by a simulated VCG, reconstructed from the 12 leads, as investigated earlier10,11; and (3) would the combined diagnostic classifications of single ECG beats (sECG) from one recording yield a better diagnostic performance than the usual classification based on an average beat?
Materials and Methods
In the CSE database, eight main categories were distinguished: 382 normal cases (also called “no significant abnormality,” NSA), 183 with left ventricular hypertrophy (LVH), 55 with right ventricular hypertrophy (RVH), 53 with biventricular hypertrophy (BVH), 170 with anterior myocardial infarction (AMI), 273 with inferior myocardial infarction (IMI), 73 with combined myocardial infarction (MIX), and 31 with myocardial infarction with manifest hypertrophy.
ECG and VCG Interpretation
Both the ECG and VCG versions of MEANS9 were used in the investigations. In both versions, the signal-analysis modules use essentially the same algorithm to process the 12-lead ECG or the VCG. Both classification parts use a heuristic approach based on decision-tree logic.12
Reconstructed VCG
To combine the interpretation results of the ECG and the VCG, the VCG has to be recorded in addition to the ECG, preferably simultaneously. To avoid this extra recording, the VCG was reconstructed from the simultaneously recorded eight independent ECG leads. This reconstructed VCG (rVCG) was subsequently interpreted by the VCG version of MEANS, and the classifications of the ECG + rVCG were combined in the same way as those of the ECG + the Frank VCG. Each of the three Frank VCG leads was reconstructed as a linear weighted combination of the eight ECG leads. The reconstruction coefficients were obtained by multivariate regression. Details on the diagnostic performance using reconstructed VCGs have been reported previously.11
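The regression step described above can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit without an intercept and uses synthetic random signals, not real recordings; the function names `fit_reconstruction` and `reconstruct_vcg` and the simulated coefficients are hypothetical and are not taken from MEANS.

```python
import numpy as np


def fit_reconstruction(ecg8, vcg3):
    """Least-squares fit of an 8 x 3 coefficient matrix (assumption: no intercept).

    ecg8: (n_samples, 8) array with the eight independent ECG leads.
    vcg3: (n_samples, 3) array with the simultaneously recorded X, Y, Z leads.
    """
    coeffs, *_ = np.linalg.lstsq(ecg8, vcg3, rcond=None)
    return coeffs


def reconstruct_vcg(ecg8, coeffs):
    """Each reconstructed lead is a weighted sum of the eight ECG leads."""
    return ecg8 @ coeffs


# Synthetic demonstration data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n_samples = 5000
true_coeffs = rng.normal(size=(8, 3))
ecg8 = rng.normal(size=(n_samples, 8))
vcg3 = ecg8 @ true_coeffs + 0.01 * rng.normal(size=(n_samples, 3))

coeffs = fit_reconstruction(ecg8, vcg3)
rvcg = reconstruct_vcg(ecg8, coeffs)
max_coeff_error = float(np.max(np.abs(coeffs - true_coeffs)))
```

In practice the coefficients would be estimated once on a learning set of simultaneously recorded ECGs and Frank VCGs and then applied unchanged to new ECGs.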
Test Database
The CSE diagnostic database5 was used to test the questions posed above. This database consists of 1,220 simultaneously recorded ECG and VCG recordings (sampling rate, 500 Hz; duration, 8 or 10 seconds). All cases were validated by ECG-independent clinical evidence, such as echocardiography and enzyme levels,5 as well as by nine cardiologists (ECGs read by 8, VCGs read by 5 of the 9). For every case, the cardiologists’ interpretations were combined in the CSE coordinating center. These combined cardiologists’ results served as another reference set in this study.
Single-beat Interpretation
To reduce the influence of noise, an average P-QRS-T complex is usually computed from the dominant complexes in an ECG recording. However, this does not do justice to the variability that may be “naturally” present in the ECG.13,14 An alternative approach is, therefore, to interpret each beat in a recording separately by computing a set of measurements for each complex in the recording and by classifying each individual complex.15 The classifications of all complexes are then combined in one final classification; thus, all 9,833 single ECG complexes contained in the CSE database were interpreted separately. This means that each single beat belonging to the so-called “dominant category,” that is, after the wave typing, was processed in the same way MEANS usually processes its averaged dominant P-QRS-T complex,9 that is, by the modules for wave recognition, parameter measurement, and diagnostic classification. In contrast to the averaged dominant beat, in single beats the “natural” noise is still fully present.
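The trade-off between averaging and single-beat analysis can be made concrete with a small simulation. This is a sketch under stated assumptions, not the MEANS averaging procedure: the beat template is a crude synthetic bump, the noise is white and Gaussian, the beats are assumed to be perfectly aligned, and eight beats per recording is chosen only because 9,833 complexes over 1,220 recordings is roughly eight per recording.

```python
import numpy as np

rng = np.random.default_rng(1)
n_beats, n_samples = 8, 300

# Hypothetical noise-free template: a narrow QRS-like bump.
t = np.linspace(0.0, 0.6, n_samples)
template = np.exp(-((t - 0.3) / 0.02) ** 2)

# Aligned dominant beats, each with independent additive noise.
beats = template + 0.05 * rng.normal(size=(n_beats, n_samples))

# Averaged dominant complex versus one single beat.
averaged = beats.mean(axis=0)
rms_single = float(np.sqrt(np.mean((beats[0] - template) ** 2)))
rms_average = float(np.sqrt(np.mean((averaged - template) ** 2)))
```

Under these assumptions the residual noise of the average is about 1/sqrt(8) of that of a single beat, which is why a single-beat classifier has to cope with noticeably noisier measurements.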
CSE Diagnostic Coding Scheme
The statements produced by the ECG and VCG classification parts of MEANS were rendered into diagnostic codes according to the CSE coding scheme. A code comprises one of the eight main diagnostic categories, corresponding to the diagnostic groups in the diagnostic database, and a qualifier (definite, probable, or possible).2,5 When a program finds no major pathological category but only nonmajor abnormalities such as ST-T changes, the CSE rules prescribe mapping to the no significant abnormality category.
Combination of Diagnostic Results
The same method used in the CSE project to combine classifications from different observers or different programs was applied in this study to combine results from the ECG and VCG programs, the ECG and rVCG programs, and the results from the single-beat interpretations. As in the CSE study, the qualifiers were assigned weights corresponding to the certainty levels: 3 points for “definite,” 2 for “probable,” and 1 for “possible.” The combined result was then obtained by adding the weights and dividing by their number. The resulting value, lying between 0 and 3, was rounded off to give the combined qualifier. In the CSE Coordinating Center these results (ie, for the combined ECG + VCG, combined ECG + rVCG, and combined sECG) were compared with the “clinical evidence” and with the “combined cardiologist.” This was performed in the same way as in the diagnostic CSE study.5
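The weighting rule described above can be sketched as follows. Several details are assumptions made for illustration and are not stated in the text: the sum of weights is divided by the number of interpretations (a source that does not mention a category contributes 0), halves are rounded up, and a category whose rounded score is 0 is dropped. The function name `combine` and the data layout are hypothetical.

```python
import math

WEIGHTS = {"definite": 3, "probable": 2, "possible": 1}
QUALIFIERS = {3: "definite", 2: "probable", 1: "possible"}


def combine(interpretations):
    """Combine interpretations, each a dict mapping category -> qualifier."""
    n = len(interpretations)
    totals = {}
    for interpretation in interpretations:
        for category, qualifier in interpretation.items():
            totals[category] = totals.get(category, 0) + WEIGHTS[qualifier]

    combined = {}
    for category, total in totals.items():
        # Assumption: round half up. Python's built-in round() rounds halves
        # to the nearest even integer, so floor(x + 0.5) is used instead.
        score = math.floor(total / n + 0.5)
        if score >= 1:
            combined[category] = QUALIFIERS[score]
    return combined


# ECG + VCG example with two interpretations of one case.
ecg = {"AMI": "definite"}
vcg = {"AMI": "possible", "LVH": "probable"}
ecg_vcg = combine([ecg, vcg])

# Single-beat example: 3 of 8 beats say "IMI probable", the rest "NSA definite".
single_beats = [{"IMI": "probable"}] * 3 + [{"NSA": "definite"}] * 5
secg = combine(single_beats)
```

With these assumptions the ECG + VCG example yields AMI with qualifier "probable" and LVH with qualifier "possible", and the single-beat example yields IMI "possible" (6/8 rounds to 1) and NSA "probable" (15/8 rounds to 2).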
Results
The results of all interpretations by MEANS, that is, for the ECG, VCG, rVCG, ECG + VCG, ECG + rVCG, and sECG, were compared with both the “clinical evidence” and the “combined cardiologist.” The 1,220 ECG recordings in the CSE database contained 9,833 single complexes, which were used in the computation of the single-beat interpretation and resulted, after combination, in another set of 1,220 combined interpretations. Thus, in total, 13,493 complexes (3 × 1,220 averaged beats for the ECG, VCG, and rVCG, and 9,833 single beats for the sECG) were interpreted. For each comparison, a classification matrix was computed for the eight main CSE categories. From these 8 × 8 matrices, 3 × 3 matrices were constructed for the categories no significant abnormality, hypertrophy (left ventricular, right ventricular, and biventricular), and myocardial infarction (anterior, inferior, mixed, and infarction plus hypertrophy).
The results of these classifications and combinations are shown in Table 1. The specificity (ie, the “sensitivity for no significant abnormality”) is 97.1% for the ECG, 86.8% for the VCG, and 94.0% for the rVCG. The combination of the ECG + VCG has a specificity of 91.6% (p < 0.001 when compared with the ECG; Wilcoxon test), and the ECG + rVCG has a specificity of 94.4% (p = 0.003). The specificity of the sECG interpretation (97.4%) is higher than that of any other program or combination. The sensitivities for hypertrophy and myocardial infarction for the ECG + VCG, ECG + rVCG, and sECG are significantly higher than those for the ECG (Table 1; p < 0.001 in all cases). The comparisons with the “combined cardiologist” as the reference show the same pattern as those with the “clinical evidence” (Table 2). The specificities of the ECG + VCG, ECG + rVCG, and sECG interpretations drop slightly with respect to that of the ECG program, while the sensitivities for hypertrophy and myocardial infarction improve.

Table 1. Sensitivities (%) for the Different Programs and Combinations Against the “Clinical Evidence”
              NSA    HYPER   MI     TOT
ECG           97.1   42.5    67.2   69.8
VCG           86.8   45.8    76.0   70.2
rVCG          94.0   46.3    70.2   70.5
ECG + VCG     91.6   49.1    77.9   74.2
ECG + rVCG    94.4   49.7    74.4   73.6
sECG          97.4   44.8    69.0   71.2
n             382    291     547
NSA = no structural abnormality; HYPER = hypertrophy; MI = myocardial infarction; TOT = total accuracy; n = number of cases based on “clinical evidence.”

Table 2. Sensitivities (%) for the Different Programs and Combinations Against the “Combined Cardiologist”
              NSA    HYPER   MI     TOT
ECG           96.8   63.1    79.1   80.3
VCG           84.8   66.3    88.8   78.1
rVCG          91.2   65.8    82.8   79.0
ECG + VCG     91.0   72.5    91.7   84.1
ECG + rVCG    93.1   72.5    87.6   83.3
sECG          97.5   66.4    81.7   81.8
n*            503    203.5   481.5
* 32 cases were classified as “other” by the “combined cardiologist” and have not been considered here. NSA = no structural abnormality; HYPER = hypertrophy; MI = myocardial infarction; TOT = total accuracy; n = number of cases based on the “combined cardiologist” result.

As in the CSE study, total accuracies were also computed from the full 8 × 8 confusion matrices, taking either the “clinical evidence” or the “combined cardiologist” as the reference (Tables 1 and 2). The ECG + VCG proved to perform significantly better than each program separately (total accuracy, 74.2% for the ECG + VCG versus 69.8% for the ECG and 70.2% for the VCG; p < 0.001 in both cases). The results of the rVCG (total accuracy, 70.5%) are comparable with those of the ECG and the VCG (p > 0.10 in both cases). The performance of the ECG + rVCG (total accuracy, 73.6%) is approximately the same as that of the ECG + VCG (p > 0.10). The total accuracy of the sECG was 71.2%, significantly higher than that of the ECG (p < 0.001). These total accuracies were entered in the scatter plot of Figure 1 together with those of the individual cardiologists from whom the “combined cardiologist” reference was derived and the other interpretation programs and referees that participated in the CSE study.5

[Figure 1: scatter plot. Horizontal axis: total accuracy against “clinical evidence” (%). Vertical axis: total accuracy against the “combined cardiologist” (%). Plotted points: the MEANS programs and combinations, other programs, and cardiologists.]
Fig. 1. Total accuracies of the individual (indicated by ECG, VCG, and rVCG) and combined (ECG + VCG, ECG + rVCG, and sECG) MEANS interpretation programs. Total accuracies of the other programs and of the cardiologists participating in the CSE study are also plotted. From these cardiologists the “combined cardiologist” result was derived. All interpretation results are compared with the “clinical evidence” (horizontal) and the “combined cardiologist” (vertical).
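The step from the 8 × 8 classification matrices to the 3 × 3 matrices, and the computation of sensitivities and total accuracy, can be sketched as below. The sketch assumes that rows hold the reference category and columns the program output, and that total accuracy is the plain fraction of cases on the diagonal; the CSE scoring rules may differ in detail. The matrix `demo` is a made-up toy example, not CSE data, and the label "MI+HYP" is a hypothetical shorthand for infarction with hypertrophy.

```python
import numpy as np

CATEGORIES = ["NSA", "LVH", "RVH", "BVH", "AMI", "IMI", "MIX", "MI+HYP"]
GROUPS = {"NSA": [0], "HYPER": [1, 2, 3], "MI": [4, 5, 6, 7]}


def collapse(matrix8):
    """Sum the 8 x 8 cells that fall into each pair of 3 x 3 groups."""
    m = np.asarray(matrix8, dtype=float)
    names = list(GROUPS)
    out = np.zeros((3, 3))
    for i, ref_group in enumerate(names):
        for j, out_group in enumerate(names):
            out[i, j] = m[np.ix_(GROUPS[ref_group], GROUPS[out_group])].sum()
    return out


def sensitivities(matrix):
    """Per-category sensitivity: diagonal divided by the row (reference) total."""
    m = np.asarray(matrix, dtype=float)
    return np.diag(m) / m.sum(axis=1)


def total_accuracy(matrix):
    """Fraction of all cases that lie on the diagonal."""
    m = np.asarray(matrix, dtype=float)
    return float(np.trace(m) / m.sum())


# Toy matrix: 10 correct cases per category, plus 5 AMI cases read as IMI.
demo = np.diag([10.0] * 8)
demo[4, 5] = 5.0

acc8 = total_accuracy(demo)            # 80 of 85 cases correct
acc3 = total_accuracy(collapse(demo))  # AMI read as IMI now counts as MI
```

In this sketch, confusions within a group (for example, AMI read as IMI) count as correct after collapsing, so the 3 × 3 accuracy can only be equal to or higher than the 8 × 8 accuracy.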
Discussion
These studies confirm the rationale for combining the classification results from different programs applied to the same ECG data: a combined result is better than any separate classification. This holds for the ECG + VCG (or rVCG) and for the sECG (Tables 1 and 2, Fig. 1). We have shown that it is not necessary to record a separate VCG, since the rVCG performs as well as the original Frank VCG. In this way, it is possible to gain in sensitivity without losing too much in specificity, even without developing totally new classification modules in MEANS. For that reason, the total accuracy, compared with either the “clinical evidence” or the “combined cardiologist,” also improved. Of course, this total accuracy depends on the composition of the database. In the CSE database, about one third of the cases belong to the no significant abnormality category. If the database contained a higher percentage of no significant abnormality cases, the total accuracy of the sECG (or ECG) interpretation would at a certain point exceed that of any of the combined interpretations, as the sECG (or ECG) program shows the highest specificity. If the no significant abnormality ratio remains under 62% (71%), while the percentages of the other categories are adjusted in proportion to the percentages in the CSE database, the total accuracy of the ECG + VCG (ECG + rVCG) interpretation still remains higher than that of the sECG (or ECG) interpretation.
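The dependence of total accuracy on database composition can be checked with a short calculation based on Table 1. The sketch assumes that total accuracy is a mix of the specificity (weight p, the no significant abnormality fraction) and a fixed accuracy on the abnormal cases (weight 1 - p), with the latter backed out of the Table 1 totals; the ECG program is taken as the comparator. The function names are hypothetical.

```python
N_TOTAL, N_NSA = 1220, 382

# (specificity %, total accuracy %) against the clinical evidence, Table 1.
TABLE1 = {
    "ECG": (97.1, 69.8),
    "ECG+VCG": (91.6, 74.2),
    "ECG+rVCG": (94.4, 73.6),
}


def abnormal_accuracy(specificity, total):
    """Accuracy on the non-NSA cases implied by the total and the specificity."""
    return (total * N_TOTAL - specificity * N_NSA) / (N_TOTAL - N_NSA)


def crossover_fraction(combined, single="ECG"):
    """NSA fraction p at which both programs reach the same total accuracy.

    Solves p * spec_s + (1 - p) * a_s = p * spec_c + (1 - p) * a_c for p.
    """
    spec_c, total_c = TABLE1[combined]
    spec_s, total_s = TABLE1[single]
    gain = abnormal_accuracy(spec_c, total_c) - abnormal_accuracy(spec_s, total_s)
    loss = spec_s - spec_c
    return gain / (gain + loss)


p_vcg = crossover_fraction("ECG+VCG")    # roughly 0.62
p_rvcg = crossover_fraction("ECG+rVCG")  # roughly 0.71
```

Under these assumptions the crossover points come out near 62% and 71%, in line with the thresholds quoted in the text.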
Acknowledgments
The authors are grateful to Professor J. L. Willems (coordinator of CSE) and R. Reniers (both at the Catholic University Leuven, Belgium) for the validation of these results.
References
1. Willems JL, Arnaud P, van Bemmel JH et al: Common standards for quantitative electrocardiography: goals and main results. Meth Inf Med 29:263, 1990
2. Willems JL, Abreu-Lima C, Arnaud P et al: Effect of combining electrocardiographic interpretation results on diagnostic accuracy. Eur Heart J 10:1348, 1988
3. Willems JL, Abreu-Lima C, Arnaud P et al: Evaluation of ECG interpretation results obtained by computer and cardiologists. Meth Inf Med 29:308, 1990
4. Willems JL, Arnaud P, van Bemmel JH et al: Assessment of the performance of electrocardiographic computer programs with the use of a reference data base. Circulation 71:523, 1985
5. Willems JL, Abreu-Lima C, Arnaud P et al: The diagnostic performance of computer programs for the interpretation of electrocardiograms. N Engl J Med 325:1767, 1991
6. Willems JL, Zywietz C, Arnaud P et al: Influence of noise on wave boundary recognition by ECG measurement programs. Comp Biomed Res 20:543, 1987
7. Zywietz C, Willems JL, Arnaud P et al: Stability of computer ECG amplitude measurements in the presence of noise. Comp Biomed Res 23:10, 1990
8. Kors JA, Talmon JL, van Bemmel JH: Multilead ECG analysis. Comp Biomed Res 19:28, 1988
9. van Bemmel JH, Kors JA, van Herpen G: Methodology of the modular ECG analysis system MEANS. Meth Inf Med 29:346, 1990
10. Kors JA, van Herpen G, Sittig AC, van Bemmel JH: Reconstruction of the Frank vectorcardiogram from standard electrocardiographic leads: diagnostic comparison of different methods. Eur Heart J 11:1083, 1990
11. Kors JA, van Herpen G, Willems JL, van Bemmel JH: Improvement of automated electrocardiographic diagnosis by combination of computer interpretations of the electrocardiogram and vectorcardiogram. Am J Cardiol 70:96, 1992
12. Kors JA, Kamp DM, Snoeck Henckemans DP, van Bemmel JH: DTL: a language to assist cardiologists in improving classification algorithms. Comp Meth Progr Biomed 35:93, 1991
13. Fischmann E, Cosma J, Pipberger HV: Beat to beat and observer variation of the electrocardiogram. Am Heart J 75:465, 1968
14. Borovsky D, Zywietz C: Accuracy and beat-to-beat variation in ECG computer measurements. Adv Cardiol 16:176, 1976
15. Kors JA, van Herpen G, van Bemmel JH: Variability in ECG computer interpretation: analysis of individual complexes vs analysis of a representative complex. J Electrocardiol 25:263, 1992