MAY 1975
The American Journal of CARDIOLOGY
VOLUME 35, NUMBER 5
CLINICAL STUDIES
Clinical Application of a Second Generation Electrocardiographic Computer Program
HUBERT V. PIPBERGER, MD, FACC
DONALD McCAUGHAN, MD
DAVID LITTMANN, MD, FACC
HANNA A. PIPBERGER, BS
JEROME CORNFIELD, BA
ROSALIE A. DUNN, PhD
CHARLES D. BATCHLOR, MS
ALAN S. BERSON, PhD

Washington, D.C.; West Roxbury, Massachusetts; Boston, Massachusetts
From the Veterans Administration Research Center for Cardiovascular Data Processing, Veterans Administration Hospital, Washington, D.C., and the Veterans Administration Hospital, West Roxbury, Mass.; and the Departments of Clinical Engineering, Medicine, and Statistics, George Washington University, Washington, D.C., and the Department of Medicine, Harvard University, Boston, Mass. This study was supported in part by Research Grants HL 15047 and HL 15191 from the National Heart and Lung Institute, National Institutes of Health, Bethesda, Md. Manuscript accepted October 16, 1974. Address for reprints: Hubert V. Pipberger, MD, Veterans Administration Hospital, 50 Irving St., N.W., Washington, D.C. 20422.
An electrocardiographic computer program based on multivariate analysis of orthogonal leads (Frank) was applied to records transmitted daily by telephone from the Veterans Administration Hospital, West Roxbury, Mass., to the Veterans Administration Hospital, Washington, D. C. A Bayesian classification procedure was used to compute probabilities for all diagnostic categories that might be encountered in a given record. Computer results were compared with interpretations of conventional 12 lead tracings. Of 1,663 records transmitted, 1,192 were selected for the study because the clinical diagnosis in these cases could be firmly established on the basis of independent, nonelectrocardiographic information. Twenty-one percent of the records were obtained from patients without evidence of cardiac disease and 79 percent from patients with various cardiovascular illnesses. Diagnostic electrocardiographic classifications were considered correct when in agreement with documented clinical diagnoses. Of the total sample of 1,192 recordings, 86 percent were classified correctly by computer as compared with 68 percent by conventional 12 lead electrocardiographic analysis. Improvement in diagnostic recognition by computer was most striking in patients with hypertensive cardiovascular disease or chronic obstructive lung disease. The multivariate classification scheme functioned most efficiently when a problem-oriented approach to diagnosis was simulated. This was accomplished by a simple method of adjusting prior probabilities according to the diagnostic problem under consideration.
The first attempts to automate analysis of electrocardiograms by computer were made in 1957 at our laboratory,1,2 and a computer program to separate normal from abnormal records became operational in 1959.3 In succeeding years, several other investigators4-11 developed alternative methods for automated analysis of electrocardiograms. Comparison of the various approaches disclosed some fundamental issues. Are existing methods of electrocardiographic analysis as practiced by the cardiologist satisfactory? Should they be automated? Or should the capability of the computer to perform more complex analytical procedures be explored in the hope of improving the methods currently used in clinical practice? The common denominator of these new techniques is multivariate analysis. A large number of electrocardiographic measurements can be used simultaneously with such procedures, usually in the form of multidimensional vectors. Obviously, the human interpreter of electrocardiograms is unable to perform such complex computations within a reasonable time, and it is here that the capacity of the computer to perform complex operations comes into play and the machine becomes truly an extension of human capabilities. As intriguing as this may sound, transforming such ideas into reality is beset with many pitfalls, which will be discussed here.

Classification of Electrocardiographic Computer Programs
First Generation Computer Programs

Our classification of existing electrocardiographic computer programs into first and second generation programs has been described previously.12 To develop the proper framework for classification of such programs, we repeat some of the salient features here. It has become customary in discussing computer hardware to talk about first, second and third generation computers in ascending order of complexity and sophistication. It appeared useful to us to classify computer programs for electrocardiographic analysis into a first and second generation. In the first, the cardiologist's methods of analyzing electrocardiograms are simulated and, in an ideal case, the programs perform as well as their designers. In second generation programs, multivariate classification techniques are applied with the aim of improving present methods of analysis and thereby reducing the number of misclassifications or erroneous diagnoses. Practically all first generation programs5-9,13,14 can be compared with decision trees. When certain electrocardiographic measurements exceed the limits of
normal, the record is labeled abnormal. A specific diagnosis may be attached to it if the deviation from normal is characteristic of a given electrocardiographic abnormality such as myocardial infarct. A typical example of such a decision tree is shown in Figure 1. First generation programs may differ in number and types of electrocardiographic measurements used, but their general design is similar. They have been widely used in the past, but a number of major difficulties were encountered. Since there are no generally accepted diagnostic criteria for electrocardiographic analysis, none of these programs found general acceptance. Frequently, relatively large numbers of electrocardiographic measurements were included in the programs, which led to overdiagnosis and the labeling of excessive numbers of normal tracings as abnormal.15 The seriousness of "heart disease of electrocardiographic origin"16 cannot be overemphasized, and one can only hope that computers will not contribute further to this disease. Evaluations of first generation programs have usually been based on comparisons with readings of cardiologists who were also the program designers. If the same diagnostic rules are used, agreement should be, in the ideal case, 100 percent. Any discrepancies in interpretation have to be ascribed to technical deficiencies in the computer program or its operation, or both. More reliable performance evaluations can be expected only when the tracings used are from patients whose diagnoses may be established by independent methods such as cardiac catheterization, angiocardiography, direct inspection during cardiac surgery, autopsy or other objective means. For first generation programs, one should not be too optimistic about the outcome of such evaluations. When a selected group of electrocardiographers were tested in a study by Simonson et al.,17 they achieved correct
FIGURE 1. Typical example of the decision-tree logic used in all first generation programs. If the Q/R ratio or Q amplitude in lead Z exceeds the limits of normal, anterior myocardial infarction (AMI), pulmonary emphysema (PE) and left ventricular hypertrophy (LVH) need to be considered. The additional measurements serve to differentiate between these three diagnostic entities. LMI = lateral myocardial infarction; PDMI = posterodiaphragmatic myocardial infarction; RVH = right ventricular hypertrophy.
interpretations in only 54 percent of cases. Programming the criteria of electrocardiographic experts can lead, at best, to the same results.
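In code, the decision-tree logic typified by Figure 1 might look like the following sketch. The measurement names and threshold values here are hypothetical placeholders for illustration only, not the criteria of any actual first generation program.

```python
# Hypothetical sketch of first generation decision-tree logic (cf. Figure 1).
# All thresholds and measurement names are invented placeholders.

def classify(q_amp_z, qr_ratio_z, extra):
    """Classify a record from a few scalar ECG measurements (illustrative)."""
    # Step 1: are the Q wave measurements within (placeholder) normal limits?
    if q_amp_z <= 0.2 and qr_ratio_z <= 0.25:
        return "normal"
    # Step 2: additional measurements differentiate AMI, PE and LVH.
    if extra.get("r_amp_x", 0) > 2.0:        # placeholder LVH criterion
        return "left ventricular hypertrophy"
    if extra.get("p_amp_y", 0) > 0.25:       # placeholder emphysema criterion
        return "pulmonary emphysema"
    return "anterior myocardial infarction"

print(classify(0.1, 0.1, {}))                    # → normal
print(classify(0.4, 0.3, {"r_amp_x": 2.5}))      # → left ventricular hypertrophy
```

Each branch fires as soon as one measurement exceeds its limit, which is precisely why such programs inherit their designers' criteria: the tree can never do better than the rules encoded in it.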
Second Generation Computer Programs
Second generation electrocardiographic computer programs were not necessarily developed after first generation programs. Rather, the term denotes degree of complexity. As mentioned earlier, second generation programs are mostly of the multivariate type in which large numbers of electrocardiographic measurements are used simultaneously for classification. Almost without exception, early results based on such techniques were remarkably good. Most often, this was due to a disparity between the number of electrocardiographic measurements used and the number of tracings available for study. As pointed out by Cornfield,18 the number of variables used in multivariate analysis should not exceed 1/20th the number of available recordings if repeatable results are to be expected. Recognizing early the need for a large electrocardiographic data base, the Veterans Administration organized a cooperative study in 1960 with eight hospitals participating. More than 28,000 electrocardiographic records, together with complete clinical protocols, are now available in this data pool. Of a large number of different multivariate techniques tested, computation of posterior probabilities for each diagnostic entity under consideration was found most satisfactory. This procedure, which is applied to each patient record, has been described in detail previously.12,19,20 Essentially, it involves the use of pair-wise discriminant functions and prior probabilities for each diagnostic entity. The resultant posterior probabilities indicate the likelihood that an electrocardiographic abnormality will be present in a given record. Most of our early studies on diagnostic electrocardiographic classification by multivariate analysis21-25 dealt with only two diagnostic categories at a time, such as normal versus myocardial infarct, normal versus left ventricular hypertrophy or normal versus biventricular hypertrophy.
Such procedures are not very realistic since, in clinical practice, more than two diagnostic entities generally need to be considered and, in many cases, all or most possible diagnoses have to be taken into account. For analysis of the QRS complex, we now consider 12 different diagnostic possibilities (Table I). Interpretation of P wave, S-T segment and T wave changes requires classification into many additional categories.26-28 An example of the diagnostic computer output format for an individual patient is shown in Table II. The multigroup classification technique was first developed with a training sample of 2,602 electrocardiograms19,20 and it was applied later to a test sample of 1,584 tracings.12 Results differed by only 2 percent, indicating a satisfactory degree of repeatability and stability of the method. For this study a new approach to system evalua-
TABLE I
Diagnostic Categories Considered in Computer Analysis of QRS Complex

1. Normal
2. Anterior myocardial infarct
3. Posterodiaphragmatic myocardial infarct
4. Lateral myocardial infarct
5. Left ventricular hypertrophy
6. Right ventricular hypertrophy
7. Biventricular hypertrophy
8. Chronic obstructive lung disease
9. Left ventricular conduction defect
10. Left ventricular conduction defect with myocardial infarct
11. Right ventricular conduction defect
12. Right ventricular conduction defect with myocardial infarct
TABLE II
Abstract from Typical Printout of Electrocardiographic Computer Program*

ID 07094107   Date 1/31/73   Race Black   Age 50
Heart rate 63/min
Sinus rhythm, multiple premature ventricular contractions
First degree A-V block

QRS diagnostic probabilities
  Left ventricular hypertrophy                 87
  Compatible with biventricular hypertrophy

P wave probabilities
  Left atrial overload                         58
  Normal                                       31
  Right atrial overload                        11

Digitalis effect

* Measurements of time intervals and amplitudes have been omitted. The record was obtained from a patient known to have had hypertensive cardiovascular disease for approximately 10 years and admitted for congestive heart failure. The probability of 87 for left ventricular hypertrophy is relatively high. There is also evidence of biventricular hypertrophy, which is frequently found after episodes of congestive heart failure. P wave analysis indicated a medium probability for left atrial overload. S-T configuration suggested digitalis effect.
tion was chosen that appears to be more realistic and closer to routine clinical electrocardiographic practice. Records were transmitted daily by telephone from the Veterans Administration Hospital in West Roxbury, Mass., to the ECG Data Processing Center in Washington, D. C. A total of 1,663 tracings were transmitted, and a firm diagnosis could be established on the basis of independent, that is, nonelectrocardiographic, evidence in 1,192 cases. This report deals with the evaluation of diagnostic accuracy of the computer reports in these latter cases. To exclude bias, use of electrocardiographic information for arriving at the patients’ diagnoses was avoided. In ad-
TABLE III
Seven Sets of Prior Probabilities Chosen According to Tentative Diagnoses of Patients*

                                                     Myocardial Infarct              Ventricular
                                                                                     Hypertrophy      Chronic
                                                           Postero-                                   Obstructive
                                              N     Anterior diaphragmatic  Lateral  Left    Right    Lung Disease
1. Normal                                   0.760   0.037   0.035           0.008    0.080   0.040    0.040
2. Coronary artery disease                  0.300   0.254   0.235           0.050    0.150   0.001    0.010
3. Hypertensive cardiovascular disease      0.300   0.070   0.065           0.015    0.539   0.001    0.010
4. Valvular and/or congenital heart
   disease                                  0.300   0.012   0.011           0.002    0.420   0.250    0.005
5. Pulmonary disease                        0.400   0.023   0.022           0.001    0.050   0.202    0.302
6. Primary myocardial disease               0.250   0.070   0.065           0.015    0.594   0.005    0.001
7. Other diseases, not related to the
   cardiovascular system                    0.760   0.037   0.035           0.008    0.080   0.040    0.040

* Selection of set 2, for instance, indicates only that myocardial infarct is under consideration. For patients with pulmonary disease (set 5), on the other hand, right ventricular hypertrophy and chronic obstructive lung disease are most likely to be present, and the prior probabilities for these entities are raised accordingly. These sets may be considered problem-oriented prior probabilities.
dition, interpretations of conventional 12 lead electrocardiograms of the patients were tested for accuracy against the same independent diagnostic evidence and compared with computer results. Computer analysis of cardiac rhythm was not included in this report. Evaluation of the part of the computer program dealing with arrhythmias will be reported separately.
Materials and Methods
Computer Program

Frank lead electrocardiograms were recorded on frequency modulation (FM) tape. Methods of electrocardiographic data acquisition have been described in detail previously.29 The data were transmitted daily from the Veterans Administration Hospital in West Roxbury, Mass., to the ECG Data Processing Center at the Veterans Administration Hospital in Washington, D. C. A Bell Telephone transmission unit (model 604) was used for simultaneous transmission of the three orthogonal leads in FM form.30 A leased telephone line was available most of the time, but a regular dial telephone was also tested for a limited period. The band width of the FM transmission system extended up to 105 hertz. After the FM signals were demodulated, they were converted into digital form using a sampling rate of 500/sec for each lead. Details of the electrocardiographic computer analysis, using a Control Data Corporation 3200 computer, have been reported.19,20 The computation of posterior probabilities for the diagnostic entities under consideration was performed on each
patient record as follows: Let x denote a k-dimensional vector of electrocardiographic measurements for a patient; m, the number of possible disease categories; and i, a particular disease group from the m possibilities. (For the first seven disease categories of Table I, k = 66.) If f(x|i) denotes the conditional probability (or probability density) of x for those in disease category i, and gi the unconditional or prior probability of being in disease category i, then the posterior probability of being in category i, given the measurement vector x, is by Bayes' theorem:
P(i|x) = gi f(x|i) / Σj gj f(x|j),  the sum taken over j = 1, ..., m

Posterior probabilities were computed for all diagnostic entities listed in Table I and in addition for P wave and ST-T abnormalities. This report will be limited to those indicated in the table. The accuracy of cardiac rhythm statements will be evaluated separately. Table II shows a typical sample of the computer output resulting from this type of electrocardiographic analysis. The use of prior probabilities in the diagnostic computer program exerts a strong influence on results, as will be discussed later. Seven sets of such prior probabilities, which have been found quite efficient, are shown in Table III. In a given case, a set is chosen according to the patient's tentative diagnosis, as indicated on the request form for an electrocardiogram. For instance, if a patient is admitted with the chief complaint of chest pain, set 2 (coronary artery disease) is chosen, which indicates a prior probability for myocardial infarction of approximately 50 percent. Similarly, in a patient with increased blood pressure, selection of set 3 (hypertensive cardiovascular disease) indicates that the prior probability that the patient will have left ventricular hypertrophy is also approximately 50 percent. Similar considerations led to the development of the remaining sets, which were found simple and practical for daily use since the physician requesting an electrocardiogram had only to check one or more of the broad diagnostic categories under consideration (Table IV). When more than one category was checked, combination sets of prior probabilities were applied. The various sets listed were determined on an empirical basis with the goal of keeping misclassifications at a minimum.

Clinical Diagnostic Categories
As mentioned earlier, this report deals with 1,192 tracings selected from a larger data pool of 1,663 patient records. This selection was based on diagnostic criteria developed within the Veterans Administration Cooperative Study on Cardiovascular Data Processing. Cases were excluded only when documentation for the clinical diagnosis was incomplete or lacking. Although criteria for the various diagnostic categories have previously been described in
TABLE IV
Section of Electrocardiographic Request Form*

CARDIAC STATUS (Tentative or Final Diagnosis)
"No ECG will be taken unless this section is completed by requesting physician."

[ ] 1. Normal cardiovascular status
[ ] 2. Coronary artery disease
[ ] 3. Hypertensive cardiovascular disease
[ ] 4. Valvular or congenital heart disease
[ ] 5. Pulmonary disease
[ ] 6. Primary myocardial disease
[ ] 7. Other diseases not related to cardiovascular system

* The form is routinely filled out by the patient's physician. (More than one disease category can be checked.)

TABLE V
Clinical Diagnoses of Patients Included in the Study

                                                 Cases (no.)      %
1. Normal                                            173         15
2. Myocardial infarct                                478         40
3. Hypertensive cardiovascular disease               155         13
4. Valvular heart disease (congenital or
   acquired)                                         199         17
5. Chronic obstructive lung disease                   77          6
6. Combinations of 2, 3, 4, or 5                      37          3
7. Disease not related to the cardiovascular
   system                                             73          6
Total                                              1,192        100
more detail,21-26 the main characteristics of the various groups may be repeated here for clarity. The number of cases in each diagnostic category is given in Table V. The most frequent diagnosis was myocardial infarct (40 percent). In the great majority of cases the infarct was old. Criteria for myocardial infarct included a typical history of a prolonged episode of crushing substernal chest pain together with typically elevated serum enzyme levels.24 Patients' accounts of past "heart attacks" were accepted only when adequately documented in the medical charts of previous admissions. Patients with a diagnosis of hypertensive cardiovascular disease had a sustained blood pressure elevation of at least 150/90 mm Hg, as determined in the sitting position four times a day during a 3 day period after the day of admission.25 For this group the assumption was made that sustained blood pressure elevation leads to left ventricular hypertrophy, and when electrocardiograms were classified as manifesting left ventricular hypertrophy, the classification was considered correct. Of the 199 patients with valvular heart disease, 128 (64 percent) had undergone cardiac catheterization. In the remaining 71 patients, physical findings obtained by noninvasive means permitted the diagnosis to be established beyond reasonable doubt. Seventy-seven patients were diagnosed as having chronic obstructive lung disease. Criteria for this diagnosis were the same as those described in a previous study.23 All patients had a history of excessive cough, sputum production and dyspnea on exertion. More than 50 percent also had shortness of breath at rest. Exercise tolerance was reduced in the great majority of cases, and a few patients were incapacitated. Twenty-one percent (246) of the patients were considered free of cardiovascular disease by history, physical examination, chest X-ray examination and laboratory findings (groups 1 and 7, Table IV).
They had been admitted to the hospital for reasons other than heart disease. All patients had conventional 12 lead and orthogonal lead (Frank) electrocardiograms. The records were interpreted by two of us (D. L. and D. McC.) without knowledge of the computer interpretation. The criteria for interpretation of the 12 lead electrocardiogram were essentially those described by Littmann.31 Since the great majority of the patients had been admitted to the medical service of the hospital, their tentative or final diagnoses were known to the interpreters of the electrocardiograms. In all other
TABLE VI
Comparison of Computer Classification Results with Results of Analysis of Conventional 12 Lead Electrocardiograms in Total Series of 1,192 Patients

ECG Diagnosis                                 Correct   Partially Correct   Incorrect
Computer analysis (3 lead ECG)                  86%             5%              9%
Physicians' interpretation (12 lead ECG)        68%             4%             28%
cases, this prior knowledge was limited to the information provided on the request form for an electrocardiogram. Results of the visual electrocardiographic analysis and computer interpretation were later transferred to evaluation forms. The completed medical records were available at that time for documentation of the final diagnosis. At certain times, electrocardiographic telephone transmission suffered from various amounts of noise interference. Part of this problem was due to the use of the Bell Telephone transmission unit (model 604), which was still considered experimental by company officials at the time of the study. Since telephone noise problems are not uncommon in electrocardiographic data transmission, a digital filter with a cut-off frequency at 40 hertz was applied to all records. Computer analyses of filtered and unfiltered records were compared to determine the effect of this relatively strong filter on systems performance.
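The preprocessing just described, estimating RMS noise and applying a 40 hertz digital low-pass filter to leads sampled at 500/sec, can be sketched roughly as follows. The paper does not specify the filter's design, so the windowed-sinc FIR filter below, and the 60 hertz test signal, are purely illustrative assumptions.

```python
import math

FS = 500.0      # sampling rate, samples/sec (as in the transmission system)
CUTOFF = 40.0   # low-pass cutoff, hertz

def rms(samples):
    """Root mean square amplitude of a segment (e.g., noise in mv)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def lowpass_fir(samples, cutoff=CUTOFF, fs=FS, ntaps=101):
    """Windowed-sinc FIR low-pass (Hamming window); an assumed design."""
    fc = cutoff / fs                     # normalized cutoff, cycles/sample
    m = ntaps - 1
    taps = []
    for n in range(ntaps):
        k = n - m / 2
        h = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / m)   # Hamming window
        taps.append(h * w)
    gain = sum(taps)
    taps = [t / gain for t in taps]      # unity gain at DC
    half = m // 2
    out = []
    for i in range(len(samples)):        # centered convolution, edges clamped
        acc = 0.0
        for n, t in enumerate(taps):
            j = min(max(i + n - half, 0), len(samples) - 1)
            acc += t * samples[j]
        out.append(acc)
    return out

# A slow 2 hertz "wave" contaminated by 60 hertz telephone-line noise:
t = [i / FS for i in range(500)]
clean = [math.sin(2 * math.pi * 2 * ti) for ti in t]
noisy = [c + 0.3 * math.sin(2 * math.pi * 60 * ti) for c, ti in zip(clean, t)]
filtered = lowpass_fir(noisy)
noise_before = rms([n - c for n, c in zip(noisy, clean)])
noise_after = rms([f - c for f, c in zip(filtered, clean)][60:-60])
```

A filter this strong inevitably rounds off sharp QRS deflections, which is why the authors compare diagnostic results on filtered and unfiltered records rather than assuming the filter is harmless.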
Results

Computer Analysis vs. Standard Electrocardiographic Interpretation

Results of the computer analyses and the 12 lead electrocardiographic interpretations are summarized in Table VI. Of the total of 1,192 records, 1,026 (86 percent) were classified correctly by computer. In an additional 56 tracings (5 percent), the automated analysis was partially correct. One hundred ten (9 percent) of the records were misclassified. (Rules for classification
TABLE VII
Breakdown of Classification Results in 1,092 Cases Without Ventricular Conduction Defects (QRS duration <0.122 second)*

                                                        ECG Interpretation
                                                  Correct        Partially Correct     Incorrect
                                         Cases   3 Lead 12 Lead   3 Lead 12 Lead     3 Lead 12 Lead
Diagnostic Entity                        (no.)    (%)    (%)       (%)    (%)          (%)    (%)
Normal                                    173     98     93         1      0            1      7
Myocardial infarct                        418     85     79         3      0           12     21
Hypertensive cardiovascular disease       146     84     26         9      1            7     73
Valvular heart disease                    171     96     68         1      1            3     31
Chronic obstructive lung disease           75     81     29         3      0           16     71
Disease not related to
  cardiovascular system                    73     91     92         1      0            8      8
Combination of more than one of
  above categories                         36     22     14        75     61            4     25
Total                                   1,092     87     68         5      2            8     30

* For results in cases with QRS prolongation, see text.
TABLE VIII
Comparison of Computer Classification Results when Different Sets of Prior Probabilities Were Used*

Prior Probabilities                              Correct   Partially Correct   Incorrect
Equalized for all diagnostic categories            66%             8%             26%
Adjusted according to number of records in
  each diagnostic category                         69%             9%             22%
As shown in Table VII                              87%             5%              8%

* Poorest results were obtained when prior probabilities were equalized for all diagnostic categories. When adjusted according to sample sizes, a slight improvement was achieved. Best results were obtained when sets of problem-oriented prior probabilities, as shown in Table III, were used.
are given in the appendix to this report.) Results of the 12 lead interpretations were 68 percent, 4 percent and 28 percent for correct, partially correct and incorrect, respectively. Thus, an 18 percent increase in correct classifications could be achieved through application of a multivariate analysis program. At the same time, the number of misclassifications could be reduced from 28 to 9 percent. Results according to clinical diagnosis: A breakdown of results according to clinical diagnosis of patients is given in Table VII. When findings in normal subjects and all patients admitted for reasons other than cardiovascular disease were combined, tracings were misclassified as abnormal in only 8 (3 percent) of the 246 subjects. This corresponds to a specificity rate of 97 percent for multigroup computer analysis. Results for 12 lead electrocardiographic readings were only slightly lower, with 92 percent correct classifications.
Improvement in the recognition rate for abnormalities was least for myocardial infarcts, with 85 percent diagnosed correctly by computer compared with 79 percent by conventional analysis. Records from patients with valvular heart disease were classified correctly by computer in 96 percent of cases, as compared with 68 percent by 12 lead readings, representing an improvement of 28 percent. The most striking differences between automated and conventional analysis were found in patients with hypertensive cardiovascular disease and chronic obstructive lung disease. In both groups, computer results exceeded those obtained from 12 lead electrocardiograms by more than 50 percent. This rate of improvement in diagnostic recognition corresponds to that previously reported for the same two entities.23,25 When several disease entities were considered simultaneously, diagnostic recognition rates dropped sharply for both computer and conventional analysis. If at least one of two abnormalities was diagnosed correctly, the classification was rated "partially correct." If completely and partially correct answers are combined, the diagnostic recognition rate was 91 percent by computer and 72 percent by conventional analysis. Effect of prior probabilities: As mentioned earlier, the various sets of prior probabilities listed in Table III were used for the analysis of records listed in Tables VI and VII. In this analysis an implicit assumption is made that the correct disease categories were checked on the electrocardiographic request form, leading automatically to the proper choice of prior probabilities. Errors in selecting the correct set were relatively infrequent because the disease categories were kept intentionally very broad. Tentative diagnoses governing this choice do not need to be precise for this purpose.
Presence of a heart murmur, for instance, without any further diagnostic specification indicates that “valvular and/or congenital heart disease” has to be considered (group 4, Table IV).
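The problem-oriented use of prior probabilities can be sketched as follows. The priors are taken from Table III (set 1, normal, and set 3, hypertensive cardiovascular disease); the likelihood values f(x|i), however, are invented placeholders standing in for the multivariate density evaluations that the actual program derives from pair-wise discriminant functions.

```python
# Bayesian multigroup classification sketch with problem-oriented priors.
# Priors g_i come from Table III; the f(x|i) values below are invented.

CATEGORIES = ["normal", "anterior MI", "posterodiaphragmatic MI",
              "lateral MI", "LVH", "RVH", "COLD"]

PRIOR_SETS = {
    "normal":       [0.760, 0.037, 0.035, 0.008, 0.080, 0.040, 0.040],  # set 1
    "hypertensive": [0.300, 0.070, 0.065, 0.015, 0.539, 0.001, 0.010],  # set 3
}

def posteriors(likelihoods, prior_set):
    """P(i|x) = g_i f(x|i) / sum_j g_j f(x|j)  (Bayes' theorem)."""
    g = PRIOR_SETS[prior_set]
    joint = [gi * fi for gi, fi in zip(g, likelihoods)]
    total = sum(joint)
    return {c: j / total for c, j in zip(CATEGORIES, joint)}

# Hypothetical density values f(x|i) for one record's measurement vector x:
f = [0.5, 0.1, 0.1, 0.05, 2.0, 0.2, 0.1]

p_normal_priors = posteriors(f, "normal")
p_hcvd_priors = posteriors(f, "hypertensive")
# With set 3 chosen (patient admitted with elevated blood pressure), the
# posterior for LVH rises sharply relative to the same record under set 1.
```

The same measurement vector thus yields different diagnostic probabilities depending on which prior set the request form selects, which is exactly the behavior Table VIII quantifies.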
Since errors cannot be completely ruled out, we investigated further the effect of prior probabilities. Two experiments were performed on the 1,092 records without evidence of ventricular conduction defects; the results are listed in Table VIII. First, diagnostic classification was repeated with prior probabilities set equal for all diagnostic groups. For practical purposes, this experiment removed the effect of knowing the patient's tentative clinical diagnosis. It led to a sharp drop in correct classification from the original 87 percent to 66 percent. From a statistical standpoint, better results should be expected when prior probabilities are adjusted according to the number of patients in each category. This was accomplished by using the sample sizes given in Table V as prior probabilities. As expected, the rate of correct classifications increased, but the result of 69 percent was still substantially below the 87 percent in the first classification when more appropriate prior probabilities were used. Of the total of 1,192 records, 100 were classified as showing ventricular conduction defects, based on a QRS duration of more than 0.122 second. Although the normal limit for simultaneously recorded leads had been found to be 0.112 second,32 the limit was extended by 10 msec because ventricular hypertrophy may cause a certain amount of QRS prolongation28 without representing a true conduction defect of the bundle branch block type. Of the 100 records, 75 were interpreted as showing left ventricular conduction defects and 25 as showing right ventricular conduction defects. The number of cases with documented myocardial infarction was 47 in the former group and 13 in the latter group. By computer analysis, 32 (69 percent of the infarct cases) were correctly identified in the presence of left ventricular conduction defects. The corresponding figure for right ventricular conduction defects with myocardial infarction was 7 (52 percent).
The level of specificity was 86 and 83 percent, respectively, for the two classifications.

Effect of Digital Filtering
The level of noise interference was measured in all records in terms of root mean square (RMS) noise. In 78 percent of cases, the RMS noise level was below 0.02 mv, representing a noise band width of approximately 0.08 mv, or less than 1 mm on ordinary electrocardiograms with a calibration of 1 mv = 10 mm. In an additional 14 percent of the records, the RMS noise level was between 0.02 and 0.04 mv, which may still be considered marginally acceptable under clinical conditions. In the remaining 8 percent, the RMS noise level exceeded 0.04 mv, extending in a few cases well above 0.1 mv. Typical examples of noisy records due to telephone transmission are shown in Figures 2A and 3A. Figures 2B and 3B indicate the effect of a digital filter. In the first case, QRS complexes can hardly be recognized in leads X and Y. After application of the filter, wave forms become sufficiently distinct to determine
beginning and end of P, QRS and T waves by automatic wave recognition. Besides causing loss of accuracy in amplitude measurements, excessive noise affects mainly measurements of wave durations and time intervals. Thus, 30 noisy records with normal QRS duration were erroneously interpreted by computer before filtering as demonstrating left or right ventricular conduction defects. Eighteen records were completely rejected by the computer because of extremely high noise. After filtering, errors could be corrected, and all records were processed successfully. As expected with changes in electrocardiographic measurements after filtering, diagnostic classification also changed in some instances. Including the 30 records with misclassified conduction defects, 72 yielded a different diagnosis after filtering. In 59 records (5 percent) this change led to improvement and in 13 (1 percent) to misclassification. (The term "improvement" is used, as in all other classifications, to indicate agreement with the patient's true diagnosis obtained by independent means.)

Discussion
Classification of existing electrocardiographic computer analysis programs into first and second generation programs points up the fundamental difference between the two approaches. In first generation programs, it is tacitly assumed that conventional methods of electrocardiographic interpretation are efficient and need few changes or improvements. The main goal in designing second generation programs is improvement of diagnosis, mainly by applying multivariate classification techniques. In these programs the computer makes possible the simultaneous use of large numbers of properly weighted electrocardiographic variables to enhance differentiation between diagnostic entities. Extensive tests on large record samples provided ample evidence that separation between two diagnostic entities at a time can indeed be improved.21-26 Further studies showed that similar improvements are feasible when many or all diagnostic possibilities are considered simultaneously.19,20,26-28

The purpose of this study was to test a multigroup electrocardiographic computer program in the routine clinical setting of the medical service of a Veterans Administration hospital. Long distance telephone electrocardiographic data transmission was included in the test because data acquisition and computer facilities are frequently separated geographically. Of 1,663 records transmitted from the Veterans Administration Hospital, West Roxbury, Mass., to the Veterans Administration Hospital, Washington, D. C., only the 1,192 were used in which there was complete documentation of the clinical diagnosis. The latter was based exclusively on stringent nonelectrocardiographic criteria developed jointly by eight cardiologists within the framework of a Veterans Administration cooperative study.21-27

As pointed out in a recent editorial33 and in the introduction to this report, most previous evaluations of electrocardiographic computer systems have not been completely satisfactory because, with few exceptions, tests were performed by human observers who used the same diagnostic criteria as those contained in the computer programs. When different criteria were used, disagreements reached almost 50 percent,34 leading to the crucial question: What is a correct electrocardiographic interpretation or diagnosis? Attempting to reconcile the criteria and opinions of different interpreters in order to test electrocardiographic computer systems appears to us to have little promise or validity. Although rules for objective program evaluation are simple and straightforward, they have been disregarded even in recent studies.

FIGURE 2. Record transmitted via telephone from West Roxbury, Mass., to Washington, D. C. A, a high level of noise is superimposed on the electrocardiographic tracing. In leads X and Y, which are of relatively low amplitude, determination of the beginning and end of P, QRS and T waves is practically impossible. B, same record after application of a strong digital filter with a cut-off frequency of 40 hertz. Although some noise is still present, the record could be processed in this form.

The study by Crevasse and Ariet35 may serve to illustrate the essential difference between evaluations based strictly on electrocardiographic criteria and those based on independent nonelectrocardiographic information. The accuracy of classification of left ventricular hypertrophy was found to be 98 percent by computer and 96 percent by human observers. The main criterion for left ventricular hypertrophy used by both the computer and the human interpreters was the point system described by Estes. When this criterion was tested by Romhilt and Estes36 in 150 autopsy cases with left ventricular hypertrophy, the accuracy rate was only 60 percent. Assuming that the autopsy study indicated the "true" performance of the Estes criterion, it can be inferred from these data that the accuracy rate in the computer evaluation report was also at best close to 60 percent (98 percent of 60 percent, or 59 percent). Similar considerations apply, of course, to other abnormalities and many other evaluation studies.

FIGURE 3. Record similar to that shown in Figure 2. A, because of the large amount of noise superimposed during telephone transmission, the P and T waves are hardly discernible. B, same record after application of the digital filter. The P and T waves are easily identified.

The decision made in the Veterans Administration Cooperative Study on Cardiovascular Data Processing to use exclusively documented nonelectrocardiographic information for diagnostic patient classification was reached in order to avoid bias and the dilemmas of disagreement on criteria that are frequently based on personal preference or opinion. To our knowledge, the Veterans Administration study has been the only one so far to attempt to develop an extensive library of electrocardiograms with documentation of patients' diagnoses derived exclusively from independent information. Diagnostic decision rules were developed from this large data base. In this study, as in all previous studies from this laboratory, agreement with objective evidence for well defined clinical diagnostic entities was the sole yardstick for determining the accuracy of computer interpretations.

In addition to comparisons between computer analysis and independent diagnostic evidence, 12 lead electrocardiographic interpretations were evaluated against the same evidence. The overall rate of agreement (68 percent) appeared higher than the average rate of agreement (54 percent) reported by Simonson et al.17 in a similar study. The difference can be explained at least in part by the following: In Simonson's study the interpreters had no knowledge of the patients' clinical status, whereas in our investigation most patients were known to the two electrocardiographic readers. Differences in patient population may also have played a role. The degree of specificity in 12 lead electrocardiographic interpretations speaks highly of the expertise of the two readers. Only 8 percent of the records from patients without evidence of cardiovascular disease were misclassified. The diagnostic recognition rates in patients with myocardial infarct and valvular heart disease were also high (79 and 68 percent, respectively). As found in previous studies,23,25 12 lead electrocardiographic criteria were relatively inefficient in patients with hypertensive cardiovascular disease or chronic obstructive lung disease.

Accuracy of computer interpretations: The overall accuracy rate of computer interpretations exceeded that achieved by conventional 12 lead readings by almost 20 percent; that is, in approximately one of five records, analysis could be improved through computer application. Most gratifying was the low number of complete misclassifications. In only 9 percent of the total was the automated method incorrect, as compared with 28 percent for conventional 12 lead analysis. Improvement in electrocardiographic diagnosis was greatest in patients with hypertensive cardiovascular disease and chronic obstructive lung disease; 84 and 81 percent, respectively, of tracings in these groups were classified correctly. This result compared favorably with the recognition rate of close to 30 percent by 12 lead analysis in both groups. The low recognition rate of 26 percent for hypertensive cardiovascular disease corresponds to the low rates found by McCaughan et al.25 in a larger study on 939 hypertensive patients in which a variety of high voltage criteria of the 12 lead electrocardiogram were tested. The
tracings of only 32 percent of their cases were found to be outside normal limits, and the rate was even lower, 21 percent, in a subsample of mild cases without complications. The low rate of 26 percent in our study suggests that the majority of our patients had only a mild or moderate degree of the disease. In addition, all patients were being actively treated, resulting in most cases in a decrease of left ventricular work. This makes the high recognition rate by computer even more remarkable.

The fact that the results of conventional 12 lead electrocardiographic classification procedures, based on simple decision-tree logic, were found consistently inferior to results obtained by multivariate techniques applied to orthogonal 3 lead records does not necessarily indicate a basic difference in the information content of these two lead systems. A true comparison could be achieved only if the same classification methods were applied to the records obtained with the two systems. To our knowledge, such a test was performed only once, by Watanabe and co-workers37 on relatively small samples. In a study on the differentiation between records from patients with cor pulmonale and anterior wall myocardial infarction, they applied a multivariate analysis technique to both orthogonal and conventional 12 lead electrocardiograms. The performance score ([sensitivity + specificity]/2) was 89.5 percent for the former and 89.0 percent for the latter leads, thus essentially indicating equality in diagnostic information.

Application of prior probabilities to compute posterior probabilities: A highly significant contribution to the efficiency of electrocardiographic computer diagnosis was derived from the application of Bayes' theorem, in which prior probabilities are used to compute posterior probabilities for each diagnostic entity under consideration.19 In a formalized statistical sense, this procedure may appear new or even strange to the physician.
However, it represents a procedure that is quite familiar to every clinician because he uses this approach whenever he is faced with a diagnostic problem. A simple example may suffice to clarify the function of this method. When analyzing an electrocardiogram from an infant, myocardial infarction is a remote diagnostic possibility because of the extreme rarity of congenital abnormalities of the coronary arteries. The prior probability for the diagnosis of myocardial infarction can be set very low in this situation. However, the reverse holds true for an electrocardiogram from a middle-aged or elderly man. Here the prior probability for myocardial infarction needs to be set considerably higher even in the absence of cardiac symptoms. When chest pain is present, it should be raised even further. The various sets of prior probabilities used in this study (Table III) can serve as further illustration of this approach. They indicate only the estimated likelihood of having one of the electrocardiographic abnormalities listed. The effect of prior probabilities on accuracy of classification is best shown in Table VIII. When the
likelihood of having an electrocardiographic abnormality was set equal for all possible abnormalities, correct classifications were achieved in only 66 percent of cases. This corresponds to a situation in which nothing is known about the patient. Even in this case, the accuracy rate was close to that achieved by the interpreters of the 12 lead electrocardiogram, who had some prior knowledge about the patients. As mentioned earlier, failure to select the appropriate set of prior probabilities on the electrocardiographic request form may lead to poorer results. In this study the choice was always based on prior knowledge of the documented final clinical diagnosis. Obviously, results will be best in this situation, whereas complete lack of prior knowledge of the patient's condition could have decreased accuracy. In routine application of this approach, the latter situation has arisen relatively seldom since the diagnostic categories are broadly defined. Furthermore, preliminary results from a study now in progress indicate that when final clinical diagnoses are not known but the prior probabilities of Table III are used, the rate of correct classification remains between 85 and 90 percent.

Physicians' acceptance of computer results in the form of probabilities has been gratifying. This is probably because clinicians faced with a problem case already weigh the various possible diagnoses by likelihood, even without assigning numerical values to those likelihoods. In addition, in cases with left or right ventricular hypertrophy, the probability levels have been found a useful indicator of the degree of hypertrophy, which may increase or decrease in time, as shown in follow-up tracings.

Effect of excessive noise transmission: Excessive noise induced by telephone transmission can cause serious difficulties with electrocardiographic measurements and in some cases lead to rejection of records by the computer.
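As a rough illustration of the noise screening and smoothing described in this report, the sketch below computes RMS noise over a baseline segment, grades it against the 0.02 and 0.04 mv thresholds cited above, and applies a simple moving average. This is a hedged sketch under stated assumptions: the function names are invented for illustration, and the study's actual filter was a digital filter with a 40 hertz cut-off whose design is not given here; a moving average merely stands in for it.

```python
import math

def rms_noise_mv(segment_mv):
    """RMS noise of a presumably isoelectric segment, in millivolts."""
    mean = sum(segment_mv) / len(segment_mv)
    return math.sqrt(sum((s - mean) ** 2 for s in segment_mv) / len(segment_mv))

def noise_grade(rms_mv):
    """Grade a record against the thresholds reported in this study."""
    if rms_mv < 0.02:
        return "acceptable"   # 78 percent of records fell here
    if rms_mv <= 0.04:
        return "marginal"     # a further 14 percent
    return "excessive"        # filter strongly or reject

def moving_average(samples, window=5):
    """Crude low-pass smoothing (illustrative stand-in for the 40 Hz filter)."""
    half = window // 2
    out = []
    for i in range(len(samples)):
        lo, hi = max(0, i - half), min(len(samples), i + half + 1)
        out.append(sum(samples[lo:hi]) / (hi - lo))
    return out
```

In this scheme a record graded "excessive" would be filtered before wave recognition is attempted, mirroring the handling of the 18 initially rejected records.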
Application of a filter appears advisable in all tracings with an RMS noise level above 0.02 mv. When a strong digital filter was applied to some extremely noisy records, they became processable and, in most cases, correct diagnostic classification could be achieved. Changes in electrocardiographic configuration, which are unavoidable when strong filters are used, seemed to have little effect on the diagnostic performance of the computer program. This finding is probably due to a certain degree of redundancy contained in the program. In contrast to first generation programs, where a slight decrease in amplitude in a few electrocardiographic measurements may lead from one diagnosis to another, the multivariate procedure depends far less on changes in a few variables since relatively large numbers of variables are considered.

Clinical implications
Clinical application of an electrocardiographic computer program based on multivariate analysis proved both efficient and practical. Using large numbers of electrocardiographic measurements together
with clinical information provided on electrocardiographic request forms, it was possible to achieve an overall diagnostic accuracy rate of 86 percent, which exceeded that obtained by conventional 12 lead electrocardiographic analysis by almost 20 percent. The significance of these findings goes beyond electrocardiography. The data suggest that computer applications in medicine can be extended with advantage beyond simulation of current medical routines when appropriate techniques are used. The capability of the computer to perform complex operations such as multivariate analysis has attracted many investigators since the early days of medical
data processing. In general, however, progress has been slow because of the lack of adequate data bases, which need to be large and of good quality. Since development of such a data base was probably simpler in electrocardiography than in many other areas, it should not be too surprising that the first large scale demonstration of the superiority of multivariate analysis over conventional decision-tree approaches occurred in this field. This demonstration should stimulate others to make similar attempts in other areas in order to improve decision making not only in diagnosis but also in treatment and general medical care.
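The posterior-probability computation discussed above can be made concrete with a minimal sketch of Bayes' theorem for multigroup classification. This is illustrative only: in the actual program the likelihood terms were derived from multivariate normal densities fitted to the electrocardiographic measurement vectors (reference 19), which is not reproduced here, and the category names and numerical values below are invented.

```python
def posteriors(priors, likelihoods):
    """Bayes' theorem: P(D | x) is proportional to P(x | D) * P(D).

    priors      -- dict mapping diagnostic category to prior probability
    likelihoods -- dict mapping category to P(measurements | category)
    Returns posterior probabilities normalized to sum to 1.
    """
    unnormalized = {d: priors[d] * likelihoods[d] for d in priors}
    total = sum(unnormalized.values())
    return {d: p / total for d, p in unnormalized.items()}

# Identical measurements, different priors: with an "infant" prior the
# posterior for myocardial infarct (MI) stays low; with an "elderly man"
# prior it dominates. (Likelihood values are made up for illustration.)
likelihoods = {"normal": 1.0, "MI": 3.0}
infant = posteriors({"normal": 0.99, "MI": 0.01}, likelihoods)
elderly = posteriors({"normal": 0.40, "MI": 0.60}, likelihoods)
```

This mirrors the request-form mechanism described earlier: changing only the prior set, not the measurements, shifts the ranked diagnostic probabilities in the report.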
APPENDIX

Rules for Tabulation of Computer Results*

No cardiovascular disease:
1. 1st choice Normal → Correct
2. 2nd or 3rd choice Normal ≥30 → Partially correct

Myocardial infarct:
1. 1st choice any MI, or 1st choice sum of the probabilities† for MI → Correct
2. 2nd or 3rd choice probability for any MI ≥30, or sum of the probabilities† for MI ≥40 → Partially correct

Hypertensive cardiovascular disease:
1. 1st choice LVH or LVH-BVH → Correct
2. 2nd or 3rd choice LVH ≥30 → Partially correct

Valvular and/or congenital heart disease:
1. 1st choice LVH or LVH-BVH → Correct
2. 1st choice RVH or RVH-BVH → Correct
3. 1st choice sum of the probabilities for RVH + LVH → Correct
4. 2nd or 3rd choice LVH ≥30 → Partially correct
5. 2nd or 3rd choice RVH ≥30 → Partially correct
6. 2nd choice sum of the probabilities for RVH + LVH ≥40 → Partially correct

Chronic obstructive lung disease:
1. 1st choice PE, RVH or RVH-BVH → Correct
2. 1st choice sum of the probabilities for PE + RVH → Correct
3. 2nd or 3rd choice PE or RVH ≥30 → Partially correct
4. 2nd choice sum of the probabilities for PE + RVH ≥40 → Partially correct

Primary myocardial disease:
1. 1st choice LVH or LVH-BVH → Correct
2. 2nd or 3rd choice LVH ≥30 → Partially correct

* Probabilities of diagnostic electrocardiographic classifications add up to 100 in each case. First choice indicates the category with the highest probability, second choice that with the second highest probability, etc. Probabilities of ≥30 or sums of probabilities of ≥40 were used. Only probabilities of ≥20 were included in the sum. † Sum of probabilities of AMI, PDMI and/or LMI. AMI = anterior myocardial infarct; BVH = biventricular hypertrophy; LMI = lateral myocardial infarct; LVH = left ventricular hypertrophy; MI = myocardial infarct; PDMI = posterodiaphragmatic myocardial infarct; PE = pulmonary emphysema; RVH = right ventricular hypertrophy.
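Expressed as code, the tabulation rules for one category might look like the sketch below, which applies the myocardial infarct rules to a ranked computer report. This is a hedged reconstruction for illustration, not the original program: names are invented, and the first-choice summed-probability variant is collapsed into the simple first-choice test for brevity.

```python
def tabulate_mi(report):
    """Grade a computer report against the myocardial infarct (MI) rules.

    report: list of (category, probability) pairs sorted by descending
    probability; probabilities sum to 100.  MI subcategories are AMI,
    PDMI and LMI; per the table footnote, only probabilities >= 20 are
    included in a sum, and a sum >= 40 counts.
    """
    MI = {"AMI", "PDMI", "LMI"}
    # Rule 1: first choice is any MI category -> correct.
    if report[0][0] in MI:
        return "correct"
    # Rule 2: an MI category in second or third place with probability
    # >= 30, or summed MI probabilities >= 40 -> partially correct.
    mi_sum = sum(p for c, p in report if c in MI and p >= 20)
    if any(c in MI and p >= 30 for c, p in report[1:3]) or mi_sum >= 40:
        return "partially correct"
    return "incorrect"
```

For example, a report of AMI 55, Normal 30, LVH 15 would be tabulated as correct, while Normal 45, AMI 25, PDMI 20, LVH 10 would be partially correct because the summed MI probabilities reach 45.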
References

1. Taback L, Marden E, Mason HL, et al: Digital recording of electrocardiographic data for analysis by a digital computer. IRE Trans Med Electron ME-6:167-171, 1959
2. Pipberger HV, Freis ED, Taback L, et al: Preparation of electrocardiographic data for analysis by digital electronic computer. Circulation 21:413-416, 1960
3. Pipberger HV, Arms RJ, Stallmann FW: Automatic screening of normal and abnormal electrocardiograms by means of a digital electronic computer. Proc Soc Exp Biol Med 106:130-132, 1961
4. Cady LD Jr, Woodbury MA, Tick LJ, et al: A method for electrocardiogram wave-pattern estimation. Example: left ventricular hypertrophy. Circ Res 9:1078-1082, 1961
5. Caceres CA: Electrocardiographic analysis by a computer system. Arch Intern Med 111:196-202, 1963
6. Smith RE, Hyde CM: Computer analysis of the electrocardiogram in clinical practice. In: Electrical Activity of the Heart (Manning GW, Ahuja SP, ed). Springfield, Ill, Charles C Thomas, 1969, p 305-315
7. Pordy L, Jaffe H, Chesky K, et al: Computer diagnosis of electrocardiograms. IV. A computer program for contour analysis with clinical results of rhythm and contour interpretation. Comput Biomed Res 1:408-433, 1968
8. Arvedson O: Methods for Data Acquisition and Evaluation of Electrocardiograms and Vectorcardiograms with the Digital Computer. Umea, Sweden, Department of Clinical Physiology and Computer Center of the University of Umea, 1968
9. Pryor TA, Russell R, Budkin A, et al: Electrocardiographic interpretation by computer. Comput Biomed Res 2:537-548, 1969
10. Kimura E, Mibukura Y, Miura S: Statistical diagnosis of electrocardiogram by theorem of Bayes: a preliminary report. Jap Heart J 4:469-488, 1963
11. Blomqvist G: The Frank lead exercise electrocardiogram. A quantitative study based on averaging technic and digital computer analysis. Acta Med Scand 178:Suppl 440:1-98, 1965
12. Pipberger HV, Dunn RA, Cornfield J: First and second generation computer programs for diagnostic ECG and VCG classification. In: Proceedings of the Satellite Symposium of the XXVth International Congress of Physiological Sciences on the Electrical Field of the Heart and of the XIIth International Colloquium Vectorcardiographicum (Rijlant P, Ruttkay-Nedecky I, Schubert E, ed). Brussels, Presses Academiques Europeennes, 1972, p 431-439
13. Bonner RE, Crevasse L, Ferrer MI, et al: A new computer program for analysis of scalar electrocardiograms. Comput Biomed Res 5:629-653, 1972
14. Dreifus LS, Watanabe Y, Reich M, et al: Total systems approach for computer electrocardiography (abstr). Am J Cardiol 31:129, 1973
15. Neufeld HN, Sive PH, Riss E, et al: The use of a computerized ECG interpretation system in an epidemiologic study. Methods Inf Med 10:85-90, 1971
16.
Prinzmetal M, Goldman A, Massumi RA, et al: Clinical implication of errors in electrocardiographic interpretation. Heart disease of electrocardiographic origin. JAMA 161:138-143, 1956
17. Simonson E, Tuna N, Okamoto N, et al: Diagnostic accuracy of the vectorcardiogram and electrocardiogram. A cooperative study. Am J Cardiol 17:829-878, 1966
18. Cornfield J: Statistical classification methods. In: Computer Diagnosis and Diagnostic Methods (Jacquez JA, ed). Springfield, Ill, Charles C Thomas, 1972, p 108-130
19. Cornfield J, Dunn RA, Batchlor CD, et al: Multigroup diagnosis of electrocardiograms. Comput Biomed Res 6:97-120, 1973
20. Pipberger HV, Cornfield J, Dunn RA: Diagnosis of the electrocardiogram. In Ref 18, p 355-373
21. Gamboa R, Klingeman JD, Pipberger HV: Computer diagnosis of biventricular hypertrophy from the orthogonal electrocardiogram. Circulation 39:72-82, 1969
22. Goldman MJ, Pipberger HV: Analysis of the orthogonal electrocardiogram and vectorcardiogram in ventricular conduction defects with and without myocardial infarction. Circulation 39:243-250, 1969
23. Kerr A Jr, Adicoff A, Klingeman JD, et al: Computer analysis of the orthogonal electrocardiogram in pulmonary emphysema. Am J Cardiol 25:34-45, 1970
24. Eddleman EE Jr, Pipberger HV: Computer analysis of the orthogonal electrocardiogram and vectorcardiogram in 1,002 patients with myocardial infarction. Am Heart J 81:608-621, 1971
25. McCaughan D, Littmann D, Pipberger HV: Computer analysis of the orthogonal electrocardiogram and vectorcardiogram in 939 cases with hypertensive cardiovascular disease. Am Heart J 85:467-482, 1973
26. Ishikawa K, Kini PM, Pipberger HV: P wave analysis on 2,464 orthogonal electrocardiograms from normal subjects and patients with atrial overload. Circulation 48:565-574, 1973
27. Kini PM, Willems JL, Batchlor C, et al: ST-T changes induced by digitalis and ventricular hypertrophy: differentiation by quantitative analysis. J Electrocardiol 5:101-110, 1972
28. Dunn RA, Pipberger HV, Cornfield J: The U.S. Veterans Administration ECG analysis program. In: Computer Application on ECG and VCG Analysis (Zywietz C, Schneider B, ed). Amsterdam, North-Holland Publishing, 1973, p 142-153
29. Pipberger HV: Computer analysis of the electrocardiogram. In: Computers in Biomedical Research, Vol 1 (Stacy RW, Waxman BD, ed). New York, Academic Press, 1965, p 377-407
30. Berson AS: Telephone transmission of electrocardiograms. In Ref 28, p 83-97
31. Littmann D: Textbook of Electrocardiography. New York, Harper & Row, 1972
32. Draper HW, Peffer CJ, Stallmann FW, et al: The corrected orthogonal electrocardiogram and vectorcardiogram in 510 normal men (Frank lead system). Circulation 30:853-864, 1964
33. Pipberger HV, Cornfield J: What ECG computer program to choose for clinical application. The need for consumer protection. Circulation 47:918-920, 1973
34. Caceres CA, Hochberg HM: Performance of the computer and physician in the analysis of the electrocardiogram. Am Heart J 79:439-443, 1970
35. Crevasse L, Ariet M: A new scalar electrocardiographic computer program. Clinical evaluation. JAMA 226:1089-1093, 1973
36. Romhilt DW, Estes EH Jr: A point-score system for the ECG diagnosis of left ventricular hypertrophy. Am Heart J 75:752-758, 1968
37. Watanabe Y, Nishijima K, Richman H, et al: Vectorcardiographic and electrocardiographic differentiation between cor pulmonale and anterior wall myocardial infarction. Am Heart J 84:302-309, 1972