Expert Systems With Applications, Vol. 6, pp. 433-440, 1993
0957-4174/93 $6.00 + .00 © 1993 Pergamon Press Ltd
Printed in the USA
The Medwise Diagnostic Module as a Consultant
P. L. M. KERKHOF, M. P. VAN DIEIJEN-VISSER, R. KOENEN, J. J. A. SCHREUDER, H. G. DE BRUIN AND K. GILL
Medwise Working Group, Maarssen, The Netherlands
Abstract--Medwise is an interactive computer-assisted medical diagnosis program running on a personal computer. The knowledge base (KB) of our Medwise system is founded on a matrix structure representation of disease profiles. This study evaluates some unique features of Medwise, namely the matrix structure, the automatic assignment of weight factors during expansion of the KB, the auxiliary KB with equivalent terms, and the invariance during expansion of the KB. For 104 patients described in 1986 and 1987, the diagnostic outcome was compared to the conclusions of the clinicians. With the use of the matrix and all other options active, Medwise yields the correct diagnosis in 93% of the cases. The performance decreases to 79% if the characteristics of the matrix are eliminated. After the KB was expanded from 2,400 to 3,600 disease profiles, the correct diagnosis was established in 91% while analyzing the same cases. This implies that the score of the diagnostic program is not expected to decrease appreciably as the KB expands towards the final goal of 8,500 disease profiles. Furthermore, the lack of standardized medical language limits the potential use of computers in medicine. Incorporation of a separate knowledge base containing equivalent expressions proved to contribute to a better score for the Medwise program.
Requests for reprints should be sent to P. L. M. Kerkhof, Coordinator, Medwise Working Group, P.O. Box 1621, 3600 BP Maarssen, The Netherlands.
1. INTRODUCTION
Knowledge-base systems are used for interpretation of data about a specific problem, in the light of knowledge represented in the knowledge base (KB), to develop a problem-specific model and then to construct plans for problem solution. The KB contains the descriptive or factual knowledge pertaining to the domain of interest. Also, there is a mechanism by which problem data probes the KB in order to derive candidate hypotheses therefrom through some pattern matching system. Then there is a reasoning "engine" (also termed inference machine) that carries out the manipulation specified to reach a decision (Williams, 1982). Currently, electronic storage and processing of medical information is routinely applied. Relatively new is the application to computer-assisted diagnosis. It has been argued that significant advances within this field require a better understanding of how physicians reason (Ledley & Lusted, 1991). Although the potential of diagnostic programs is still limited (Kerkhof, 1987), the area appears fascinating and promising. Various diagnostic systems have been developed and evaluated (Barnett, Cimino, Hupp, & Hoffer, 1987; Kerkhof, Koops, van Dyke, De Sera, & Camphuisen, 1983; Miller, Masarie, & Myers, 1986; Miller, Pople, & Myers, 1982), with a few commercially available applications (Miller et al., 1986; Warner et al., 1988; Waxman & Worley, 1990). In a literature survey (Kerkhof, 1987), 16 programs developed worldwide were characterized. Their KBs ranged from 20 diseases (for nephrology) to 5,820 disorders (for veterinary medicine). Clearly, some of them aim at a single specialty, whereas others focus on the complete spectrum of internal medicine. While the majority of these earlier projects still exist, it is remarkable that since 1987 only two new major programs were introduced, namely Meditel (Waxman & Worley, 1990) and Iliad (Warner et al., 1988). Over the years, most programs were modified so as to run on personal computers (PCs) rather than on mainframes; occasionally such a transition was accompanied by a change of name (e.g., the PC version of Internist-I (Miller et al., 1982) appeared as QMR (Miller et al., 1986)). Some projects address the complete scope of internal medicine (Barnett et al., 1987; Kerkhof et al., 1983; Miller et al., 1986; Miller et al., 1982; Warner et al., 1988; Waxman & Worley, 1990). The vaster their KB, the more such programs are confronted with confusion regarding terminology. Problems of linguistics, language translation, and text interpretation have indeed been a central issue in artificial intelligence. This paper addresses several problems inherent to medical terminology which have
direct effects on the communication trajectory from patient to computer system, and in particular on the utility of KB systems in medicine. An estimated 1,000,000 facts and associations constitute the body of internal medicine (Pauker et al., 1976). Basic information on over 3,200 disease entities has been stored in CMIT (Finkel, 1981). The usefulness of these data depends on their accessibility and proper interpretation when applied to a specific patient case. Computer applications can be helpful to disseminate and process such knowledge. In this study we report an evaluation of our Medwise system, with particular attention to some specific features of our approach. We investigated in detail the following aspects:
1. the importance of applying a matrix structure,
2. the impact of inclusion of negative findings,
3. the use of an auxiliary KB containing equivalent expressions,
4. the diagnostic sensitivity of the Medwise diagnostic program, in dependence on automatically assigned weight factors in relation to the extent of the underlying KB.
2. CHARACTERISTICS OF MEDWISE
Since 1981 our Medwise Working Group has been involved in the development of a medical KB, which permits the formulation of a differential diagnosis (DD) for any given patient on the basis of clinical findings and laboratory data (Kerkhof et al., 1983). Medwise basically consists of two components: a medical KB and a menu-driven application program that intelligently (i.e., by association, combination, assignment of individual weight factors to input terms, and application of specific exclusion rules) matches patient input data with information on all stored disease profiles. The KB covers all medical specialties and now contains some 3,600 disease profiles, coded on the basis of ICD-9-CM (ICD, 1978), and in addition 100 newly described disorders (e.g., Kearns-Sayre syndrome = oculocraniosomatic neuromuscular disease with mitochondrial myopathy) not yet included in this ICD system. The subdivisions of ICD-9-CM also determine the depth of the level of the generated diagnoses. It can be estimated that the Medwise KB now covers about 40% of the total spectrum of disease profiles. The process of extracting domain knowledge from human experts is called knowledge engineering. In the case of the Medwise KB we preferred to compile such information from major textbooks and scientific journals. This route was followed for two reasons: (a) material published in established books and journals is supposed to reflect consensus in each individual area;
CMIT is a prototype example of such a survey of medical information (Finkel, 1981); (b) since Medwise covers the complete spectrum of the medical sciences, it would have been an immense task, if not a lifetime project, to find sufficient human experts and reach consensus in each area. This point of view is justified by appreciating that at some level the detailed knowledge obtainable from human experts may have little relevance for the average clinician who is not aware of boundary conditions regarding new developments and possible limitations inherent to novel techniques. The basic question here is: how many details do nonspecialists actually need? The issue has been extensively documented in an editorial elsewhere (Kerkhof, 1987). Information on diseases is stored in 10 × 20 matrices, where each ICD-code corresponds with one such matrix. Thus, the ten columns of the matrix refer to the standard decimal subdivisions of the ICD-code, which typically represent existing variants of a main disease group. The rows of the matrix correspond with 20 itemized fields such as physical examination, histology, or blood chemistry. Consequently, any data in a single matrix element is characterized by two fundamental properties: (a) reference to a subtype of a disease; (b) pertinent information concerning 1 of 20 predefined fields. Currently, there are 27,000 search terms (descriptors) in the Medwise system. An auxiliary KB permits selection of equivalent expressions when entering patient data (Kerkhof et al., 1990). This separate KB includes almost 300 expressions, each of which refers on average to three equivalent terms (e.g., the triad extremity, leg, and limb). Equivalents are automatically generated to assist the user during the process of data entry.
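The storage scheme described above can be illustrated with a minimal Python sketch. It is not the Medwise implementation: the class name `DiseaseMatrix`, the helper `equivalents_of`, and the contents of `EQUIVALENTS` are hypothetical, and the sketch assumes that columns are indexed 0-9 (the decimal subdivision) and rows 1-20 (the field number).

```python
from dataclasses import dataclass, field

N_SUBTYPES = 10  # columns: decimal subdivisions of one ICD code (assumed 0-9)
N_FIELDS = 20    # rows: itemized fields (assumed numbered 1-20)


@dataclass
class DiseaseMatrix:
    """One 10 x 20 matrix per ICD code; each cell holds a set of descriptors."""
    icd_code: str
    cells: dict = field(default_factory=dict)  # (subtype, field_no) -> set of terms

    def add(self, subtype: int, field_no: int, term: str) -> None:
        """Store a descriptor for one disease subtype in one field."""
        if not (0 <= subtype < N_SUBTYPES and 1 <= field_no <= N_FIELDS):
            raise ValueError("subtype must be 0-9 and field_no must be 1-20")
        self.cells.setdefault((subtype, field_no), set()).add(term)

    def terms(self, subtype: int, field_no: int) -> set:
        """Return the descriptors stored in one matrix element."""
        return self.cells.get((subtype, field_no), set())


# Hypothetical excerpt of the auxiliary KB: groups of equivalent expressions.
EQUIVALENTS = [
    {"extremity", "leg", "limb"},
    {"palpebra", "eyelid"},
    {"convulsion", "seizure"},
]


def equivalents_of(term: str) -> set:
    """Return all expressions treated as equivalent to `term` (including itself)."""
    for group in EQUIVALENTS:
        if term in group:
            return set(group)
    return {term}
```

Under these assumptions, a profile coded 057.8 would be stored in the matrix for ICD code 057 at column 8, for example `DiseaseMatrix("057").add(8, 2, "high fever")` for a finding in field 2 (symptoms and signs).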
3. DATA ENTRY FOR CASE ANALYSIS
Interactively entered search terms are derived from an individual case history, and may include a set as given by the following example:
• age 2 years (field 10: epidemiology)
• female/girl (field 10: epidemiology)
• - recent trauma/injury (field 4: etiology)
• vomiting history (field 3: recent events in history)
• high fever (field 2: symptoms and signs)
• convulsion/seizure (field 2: symptoms and signs)
• pharyngitis (field 2: symptoms and signs)
• palpebra/eyelid edema (field 2: symptoms and signs)
• - diarrhea (field 2: symptoms and signs)
• malaise (field 2: symptoms and signs)
• - leukocytosis/wbc increased (field 7: blood chemistry)
• - casts in urine (field 8: urinalysis etc.).
Equivalent terms (separated by the slash) are automatically generated by the program and contained
within the auxiliary KB. The minus sign preceding a term means that such a finding was absent. The fields mentioned are selected by the user. Weight factors are automatically assigned to each input term, while the program incorporates the higher importance of laboratory findings relative to subjective complaints. The sequence of entering terms does not influence the outcome, apart from terms purposely assigned within field 3 concerning "history and natural course." As such, the program exhibits no complete time-dependent trajectory; the sequence of fever and subsequent rash cannot be distinguished from a case beginning with rash and followed by fever, for example. Considering the fact that an average case contains more than 70 input terms, this does not seem to be a major obstacle in our approach. Again, we felt that other problems (e.g., regarding terminology) are more crucial and should be solved first. Full details on the logic of the program are provided in the next section. The final output of the exercise of entering a full set of patient data is a DD-list (Table 1), where alternatives with a score higher than 50 points usually are clinically relevant. As a next step, each candidate diagnosis is documented by three surveys:
1. Terms explained (in the example, first candidate in Table 1: age 2 years; high fever; convulsion/seizure; pharyngitis; palpebra/eyelid edema; malaise).
2. Terms possibly not explained (in the example: female/girl; -recent trauma; vomiting history; -diarrhea; -casts in urine). Nonexplained negative findings imply that they do not violate the hypothesis.
3. Terms that contradict the candidate diagnosis selected from the DD-list (in the example: -leukocytosis/wbc increased).
4. CALCULATION OF THE SCORE FOR MATCHING
The mathematical formalism employed considers the relative importance of each term entered by calculating two factors:
1. A(n, f), representing the frequency of the nth term in the entire KB and within the selected field (f); it reflects an inverse measure of specificity.
2. W(f), being the intrinsic weight factor for the field selected; for example, the finding of "thrombocytopenia" (field 7: blood chemistry) is more informative than the subjective complaint "easy bruising" (field 2: symptoms & signs).
Let Sn(f) be a particular term entered; then each term Sn(f) with running number n is specified either as being present (then d(n) = 1) or as being absent (then d(n) = -1), in addition to the selection of the appropriate field f. The latter procedure is comparable to a type of "filter setting" to enhance interpretation of information. Mx(i, j) is a matrix element referring to ICD-code number x, where 0 < x < 10000, with column i and row j. Any combination of x and i represents a particular column and thus a single disease profile. The index j ranges from 1 to 20, and corresponds with a selected field as defined above. Matching between individual patient data and standard disease profiles as stored in the KB is expressed by calculation of:

\[
\mathrm{score}(x, i) = \sum_{n,\,j} K \, \frac{W(j)}{A(n, j)},
\]

where the Kronecker delta (K) is defined by the following conditions:

\[
K =
\begin{cases}
d(n) & \text{if } S_n(f) \text{ in } M_x(i, j), \\
1 & \text{if } S_n(f) \text{ not in } M_x(i, j) \text{ for } d = -1, \\
0 & \text{if } S_n(f) \text{ not in } M_x(i, j) \text{ for } d = 1.
\end{cases}
\]
Note that A(n, f) is automatically adjusted as the size of the KB increases. Each score value is normalized to a maximum of 100 points. In terms of hardware, the programs require a personal computer running MS-DOS with 640 kB of internal memory, and occupy 10 MB of hard disk space. A 286 or enhanced processor, color screen, and printer are recommended.
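The scoring formalism of this section can be sketched in Python as follows. This is an illustration under stated assumptions rather than the Medwise code: the KB is modeled as a mapping from a profile identifier to a dictionary of field number to term set; a term entered in field f is compared only with row j = f of the profile; terms that occur nowhere in the KB for the chosen field are skipped to avoid division by zero; and normalization is assumed to scale the best candidate to 100 points. The function names, the toy KB, and the weights are all hypothetical.

```python
from collections import Counter


def frequency_table(kb):
    """A(n, f): number of disease profiles that list term n in field f."""
    counts = Counter()
    for profile in kb.values():
        for field_no, terms in profile.items():
            for term in terms:
                counts[(term, field_no)] += 1
    return counts


def kronecker(term, field_no, d, profile):
    """K as defined in the text: d(n) on a match, 1 for an unmatched
    absent finding, 0 for an unmatched present finding."""
    if term in profile.get(field_no, set()):
        return d
    return 1 if d == -1 else 0


def raw_score(case, profile, A, W):
    """Sum of K * W(f) / A(n, f) over all entered terms (term, field, d)."""
    total = 0.0
    for term, field_no, d in case:
        freq = A.get((term, field_no), 0)
        if freq == 0:
            continue  # assumption: a term unknown to the KB contributes nothing
        total += kronecker(term, field_no, d, profile) * W[field_no] / freq
    return total


def rank(case, kb, W):
    """Return (profile id, score) pairs, best first, scaled so the top is 100."""
    A = frequency_table(kb)  # recomputed from the KB, so it adapts as the KB grows
    raw = {pid: raw_score(case, profile, A, W) for pid, profile in kb.items()}
    best = max(raw.values(), default=0.0)
    if best <= 0:
        return sorted((pid, 0.0) for pid in raw)
    scaled = {pid: 100.0 * s / best for pid, s in raw.items()}
    return sorted(scaled.items(), key=lambda item: -item[1])


# Toy data, invented for illustration only.
TOY_KB = {
    "A": {2: {"high fever", "rash"}, 7: {"leukocytosis"}},
    "B": {2: {"high fever"}, 7: {"leukopenia"}},
}
TOY_W = {2: 1.0, 7: 2.0}  # laboratory field weighted above symptoms (assumed values)
TOY_CASE = [("high fever", 2, 1), ("rash", 2, 1), ("leukopenia", 7, -1)]
```

In the toy case, the absent finding "leukopenia" counts against profile B, which lists it, and in favor of profile A, which does not; this is the behavior the three conditions on K prescribe. Because `frequency_table` is derived from the KB itself, no weight needs to be reassigned by hand when profiles are added.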
TABLE 1 Differential Diagnosis List for the Example Case in the Text
Present Case: /temp/nov87.1gf

#   Score   ICD-Code   Disease Name
1   100     057.8      exanthema subitum - sixth disease - Zahorsky's dis.
2   76      049.0      lymphocytic choriomeningitis
3   61      013.0      tuberculous meningitis
4   61      021.8      oropharyngeal tularemia
5   61      074.1      Bornholm's disease
6   53      036.0      meningococcal meningitis
7   53      047.0      meningitis by Coxsackie virus
8   53      047.1      meningitis by Echo virus
5. IMPLICATIONS OF THE MATRIX DESIGN
It is important to emphasize that Medwise features a unique characteristic which significantly contributes to its diagnostic power: we employ a special technique to assist the interpretation of medical information, namely the previously mentioned matrix representation of medical knowledge. Our application programs have been designed to facilitate future additions to the KB (Kerkhof et al., 1990). Various characteristics of Medwise illustrate the feasibility of continuously updating and expanding the KB:
• A generalized formalism to assign relative weight factors to individual input data is applied (Kerkhof et al., 1990). Thus, there is no need to reconsider previous weight factors as the KB expands.
• Information stored in matrix elements may be linked to each other. This creative property is obviously enhanced as the actual KB expands over the coming years, without the need to explicitly formulate new links. For example, since "pernicious anemia" as such (Morbus Biermer) is covered by an ICD-code, all information mentioned under that heading is available to explain features of other diseases that exhibit this type of anemia (e.g., gastric carcinoma or the hypokalemia syndrome).
However, by intuition it may be envisioned that any substantial expansion of the KB may theoretically result in less precise answers generated by the diagnostic module, simply because the number of candidates to choose from has increased since the time of the preceding evaluation. To evaluate this possibility, we decided to perform a study comparing the diagnostic outcome at two stages of the development of the KB. To assure identical boundary conditions for this comparative analysis, we employed the same patient data sets as inputs.
6. PROBLEM AREAS OF MEDICAL TERMINOLOGY
Medical language forms one of the greatest obstacles for the practical use of any type of KB designed for application in the field of medicine (Schiffman, 1989). Natural language often has remote roots (e.g., palpebra and blephar both refer to the eyelid). Adrenaline and epinephrine are the same chemical substance. Bicarbonate is identical to carbonic acid. Abdominal typhus is the same as typhoid, typhoid fever (to be distinguished from typhus fever, however), and typhogastric fever. Similarly, (pontine) angle tumor, acoustic (nerve) neurinoma, and acoustic neurilemmoma all mean the same. These examples are unequivocal (Kerkhof, 1992a). And what is in a phrase? How to interpret the following sentence: "The CT scan showed accentuation of the peripheral margins of the bilateral parieto-occipital forceps major, and splenium low-absorptive abnormalities." (What does it mean, anyway?) (Case #45, 1988). This example illustrates the problem of translating medical phrases into concise "computer-stored language." Besides such common problems inherent to the understanding of natural language, additional difficulties pertaining to medical terminology are manifest:
• American versus British spelling. Two standard differences are evident, namely the use of the diphthong in British spelling (e.g., anaemia versus anemia, and humour versus humor), and the preference for using c (e.g., in leucocyte) rather than k (as in the American spelling leukocyte).
• Preferred terminology. In radiology, "air" means: gas within the body, regardless of its composition or site. However, the term should be reserved for respired atmospheric gas. With reference to pneumothorax, subcutaneous emphysema, or the contents of the gastrointestinal tract, the preferred term is "gas." On other occasions the preferred terminology pertains to technical vocabulary which permits high precision and resolution descriptions if the available information is extremely exact. In these circumstances a valuable tool is blunted if carelessness creeps in. For example, the word "clumsiness" describes defective coordination of movement, whereas the term "dysdiadochokinesis" refers to the well-defined phenomenon of a defect in the ability to perform rapid movements of both hands in unison (Murphy, 1976).
• Which expressions are different and what do they exactly mean? A straightforward example is: tympanism, tympanites, tympanitis, and tympany, particularly in relation to: tympanal, tympanic, tympanous, and tympanitic (Kerkhof, 1992a). We also present one annotated example: heterogenous means: not originating in the body; heterogeneous means: not of uniform composition, quality, or structure; heterogenetic means: pertaining to asexual generation; heterogenic, finally, has the same meaning as heterogeneous.
• Jargon within a subspecialty. The terms "show, engagement, lightening, and station," for example, have a particular meaning within the field of obstetrics (Kerkhof, 1992b). The expression "streaking" has a meaning which differs for the microbiologist and the radiologist.
• Implicit information. If urinalysis is found to be normal, then such examination implies (at least) the absence of the quadruplet proteinuria, hematuria, leukocyturia, and casts.
• Mirror-terms may apply: "dry cough" implies "no productive cough" (which can be entered as a negative finding). Likewise, leukopenia in particular implies "no leukocytosis." This mutual exclusion applies to all antonyms, notably to the almost trivial
examples of all terms with hyper- that "by definition" form mirror images with similar terms beginning with hypo-.
• Imprecise terminology (Yu, 1983). Some terms may carry a vague meaning (e.g., tumor, swelling, mass, and lump). To a large extent, however, the use of such terms reflects the uncertainty around an observation. In that respect it refers to a justifiable "law of preservation of uncertainty." To put it the other way around: it would be incorrect to specify an observation in greater detail than the facts permit.
• Limited scope of a thesaurus. No agreement exists regarding a directory for coding diseases. Major sources are organized in different ways (e.g., in ICD-9-CM (ICD, 1978) one finds "Bladder, see condition (e.g., Leukoplakia)", whereas CMIT (Finkel, 1981) reads: "leukoplakia (of bladder), see bladder"). Notably, "leukoplakia of the bladder" as such is not listed in MeSH (1987).
• Synonymes. For some reason "icterus" is identical to "jaundice"; this is not all that difficult, but the major problem is that one has to recall it every time either term is used. Leukopenia means the same as leukocytopenia. Thrombocytosis and thrombocythemia are other words used to indicate that the number of platelets in the peripheral circulation is in excess of 350,000 per microliter.
• Eponymes. Many disease names refer to the first author who described the particular disorder (e.g., Boeck's disease for sarcoidosis), to the first patient analyzed in detail (e.g., Mortimer's disease, again for sarcoidosis), or to the area where the illness was first detected (e.g., Lyme disease). In other cases the logic is less obvious: an epidemiologist once attempted to replace the term Bornholm disease just to honor a friend (Sylvest), and Behcet's disease was previously clearly described by Bluthe. Geographical variations also occur: the Plummer-Vinson syndrome (referring to sideropenic dysphagia), as it is known in the USA and Australia, is termed Patterson-Kelly syndrome in the UK, but Waldenstrom-Kjellberg syndrome in Scandinavia (Firkin & Whitworth, 1987).
• The frequency of occurrence. The meaning of indicators like always, often, etc., is not transparent; the intuitive interpretation of such quasi-numerical determinants is summarized in Table 2 (Kong, Barnett, Mosteller, & Youtz, 1986).
• Noise terms in the description of a patient (Kerkhof et al., 1990). When analyzing 104 case histories, we found that the input consisted on average of 75 terms, whereas only 15 terms were required for establishing the primary diagnosis. This implies that 80% of the input data consisted of "noise terms," which do not directly contribute to the confirmation of the correct diagnosis. Rather, such an overwhelming portion of information may blur the process of hypothesis formation, in particular when the diagnostic task is solely left to the human being, who is prone to confusion by this type of "overdose."
7. HOW TO SOLVE THE LINGUISTIC DISCREPANCIES?
Various scientists have described techniques to help solve language problems in the field of medicine. An on-line medical dictionary can be an important tool for research and application in natural language processing. Recently, Dorland's Illustrated Medical Dictionary has been converted to an on-line interactive computer-based version (McCray & Srinivasan, 1990). To assure exact description of definitions, a coding system has been advocated because of: ease of handling, redundancy to avoid errors, specificity, indication of relationships (e.g., by using common code groups for
TABLE 2 Translation to Percentage Values for Indicators of Frequency and Probability (Kong et al., 1986)

Indicator            %
Always               99
Invariably           99
Certain              95
Pathognomonic        94
Classic              91
Almost certain       90
Almost always        89
Very likely          85
Normally             81
Odds on              74
Expected             73
Usually              71
Commonly             69
Probable             65
Consistent with      65
Likely               63
Common               61
Often                59
Compatible with      57
Frequent             56
Moderate risk        54
Sometimes            33
Not certain          33
Not unreasonable     32
Possible             27
Occasionally         21
Periodically         20
Doubtful             20
Low probability      17
Unlikely             14
Improbable           13
On occasion          12
Infrequently         12
Rarely               5
Exceptionally        5
Never                3
Almost never         2
similar terms), equating equivalent terms and linking them in different languages, and the feasibility to cross-link terms on the basis of common portions of their codes (Bishop & Dombrowski, 1990). Attempts have also been made to automatically translate existing medical terminology systems (Cimino & Barnett, 1990). For example, using a semantic network for mapping, the closest match for the MeSH term "portography" was "portal contrast phlebogram" in the ICD-9 procedures directory. In 1986 the National Library of Medicine (Bethesda, MD) started a project called UMLS (Unified Medical Language System). The project aims to address the fundamental information access problem caused by the variety of independently constructed vocabularies and classifications used in different sources of machine-readable biomedical information. The UMLS approach will be to compensate for differences in the terminologies or coding schemes used in different systems, as well as for differences in the language employed by system users, rather than to impose a single standard vocabulary on the biomedical community (Humphreys, 1989).
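Two of the terminology problems listed in Section 6, implicit information and mirror-terms, lend themselves to a small rule-based sketch in Python. Nothing here is taken from Medwise: the table `IMPLIED_ABSENT` and the functions `mirror_term` and `expand_findings` are hypothetical, and the hyper-/hypo- rule is deliberately naive (it would also rewrite unrelated words such as "hypothesis").

```python
# Hypothetical rules: a present finding that implies the absence of others.
IMPLIED_ABSENT = {
    "normal urinalysis": ["proteinuria", "hematuria", "leukocyturia", "casts in urine"],
    "dry cough": ["productive cough"],
    "leukopenia": ["leukocytosis"],
}


def mirror_term(term):
    """Return the hyper-/hypo- mirror image of a term, or None if it has neither prefix."""
    for prefix, opposite in (("hyper", "hypo"), ("hypo", "hyper")):
        if term.startswith(prefix):
            return opposite + term[len(prefix):]
    return None


def expand_findings(present_terms):
    """Map each term to d = 1 (present) or d = -1 (absent), adding implied negatives.

    An explicitly entered present finding is never overwritten by an implied negative.
    """
    findings = {term: 1 for term in present_terms}
    for term in present_terms:
        implied = list(IMPLIED_ABSENT.get(term, []))
        mirrored = mirror_term(term)
        if mirrored is not None:
            implied.append(mirrored)
        for absent in implied:
            findings.setdefault(absent, -1)
    return findings
```

For instance, entering "hyperkalemia" and "dry cough" would add "hypokalemia" and "productive cough" as negative findings, which matters because negative findings enter the score with d(n) = -1.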
8. PARAMETERS ANALYZED IN THIS STUDY
The practical value of computer-assisted diagnosis programs may be evaluated by analyzing actual patient cases. Similar to other investigators (Miller et al., 1982), we analyzed clinicopathological exercises as published weekly in The New England Journal of Medicine. We ran 104 consecutive cases described in 1986 and 1987 according to the following protocols:
type 1--applying all available options (cf. formula above);
type 2--simulating a run without matrix features (i.e., f = j for any value of f and j, W(f) = 1 for any f, A(n, f) = A(n, 1) + ... + A(n, 20));
type 3--without inclusion of negative findings (i.e., Sn(f) is omitted if d(n) = -1);
type 4--without the use of equivalent expressions (i.e., neglecting the auxiliary KB);
type 5--expansion of the KB by 50%.
All data entered for each patient in protocol type 1 were subsequently stored in a log-file. Next, protocols 2 through 5 were run as batch-files after appropriate modification of the software. The various outcomes are compared with the final patho-anatomical diagnosis reported in the journal.
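The modifications behind protocol types 2 and 3 can be written out as small Python transformations. This is an interpretive sketch, not the batch software used in the study: it assumes a case is a list of (term, field, d) triples, a profile is a dictionary of field number to term set, and A is a dictionary keyed by (term, field). Reading "f = j for any value of f and j" as "a term matches regardless of the field in which it is stored" is an assumption, and all function names are hypothetical.

```python
N_FIELDS = 20


def protocol2_frequency(A, n_fields=N_FIELDS):
    """A(n, f) = A(n, 1) + ... + A(n, 20): the same pooled count for every field."""
    totals = {}
    for (term, _field_no), count in A.items():
        totals[term] = totals.get(term, 0) + count
    return {
        (term, f): total
        for term, total in totals.items()
        for f in range(1, n_fields + 1)
    }


def protocol2_weights(n_fields=N_FIELDS):
    """W(f) = 1 for any f: no field is more informative than another."""
    return {f: 1.0 for f in range(1, n_fields + 1)}


def protocol2_profile(profile, n_fields=N_FIELDS):
    """Pool all terms of a profile into every field, so the field no longer filters."""
    pooled = set().union(*profile.values())
    return {f: set(pooled) for f in range(1, n_fields + 1)}


def protocol3_case(case):
    """Drop negative findings: Sn(f) is omitted if d(n) = -1."""
    return [(term, field_no, d) for term, field_no, d in case if d != -1]
```

Applying the three `protocol2_*` helpers before scoring removes exactly the information the matrix contributes (field-specific frequencies, field weights, and field-specific matching), which is why this protocol serves as the comparison for the full run.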
9. ANALYSIS OF PATIENT CASE STUDIES
We analyzed more than 100 consecutive clinicopathological exercises. The distribution of the primary diagnoses is given in Table 3. Results of this evaluation have been published previously (Kerkhof et al., 1990). During the first evaluation the size of the KB was 2,400 diseases, while the score was 93%. The performance was again evaluated for the KB after expansion to 3,600 diseases (protocol 5). The same 104 log-files were entered, and the program yielded the same primary diagnoses, except for two cases. This implies that the performance is minimally affected by the increased size of the KB. We also analyzed the relative ranking of alternatives in the DD-lists. In 27 cases out of the 104 patients studied, the difference of the score between the first
TABLE 3 Distribution of the 104 Diagnoses Over Clustered Disease Groups

Disease Group                 Number of Diagnoses   ICD-9-CM
Infectious diseases           18 (21)               001-139
Neoplasms                     23                    140-239
Endocrine or metabolic        7 (10)                240-279
Blood (forming organs)        4                     280-289
Psychiatric disorders         1                     290-319
Nervous system                5                     320-389
Cardiovascular diseases       20 (21)               390-459
Respiratory tract             6 (13)                460-519
Digestive tract               5 (7)                 520-579
Urogenital tract              4 (6)                 580-629
Gynecology                    4                     630-676
Dermatology                   0 (1)                 680-709
Locomotion                    5                     710-739
Congenital disorders          2                     740-779
Iatrogenic or intoxication    0 (1)                 800-999
Total:                        104

Numbers within parentheses refer to double counts due to possible overlap of these categories.
and the second candidate increased, in 16 cases it decreased, while in the other cases it remained unaltered. Changes that occurred regarding the separation between the first two candidates are related to the automatic assignment of weight factors to the individual search terms; as mentioned before, this process indeed automatically continues when the KB is expanded. We conclude that it is very unlikely that the performance of the diagnostic program will appreciably decrease as the KB is further enlarged towards its final goal of including some 8,500 disease profiles.
10. RESULTS
The standard Medwise diagnosis (protocol type 1) was established using on average 74.9 input terms (SD = 16.7), of which 44.67 (SD = 12.32) referred to positive findings. This implies that 40% of the total information embodied in every case consisted of negative findings. The latter were all neglected in protocol type 3. The number of explained terms was on average 14.9 (SD = 4.75), while on average 1.73 terms violated the primary diagnosis. Evaluation of a single case takes on average two to four minutes once all data are entered, depending on the hardware specifications. The clinicians disagreed with the patho-anatomical diagnosis in 15 of the 104 cases. In 9 of these 15 cases the Medwise diagnostic program came up with the correct answer while running protocol type 1. The remaining six cases were also missed by Medwise, in addition to one other diagnosis which the program failed to identify correctly. After addition of specific patho-anatomical findings (regarding histology, etc.), the program yielded the correct diagnoses in all but one case. As expected, the performance of the Medwise diagnostic program decreased if certain features were not activated (protocol types 2, 3, and 4). The results are summarized in Table 4. Running 104 cases with use of the auxiliary KB yields a correct diagnosis in 93%.
Disregarding the equivalent expressions, however, lowers the score to 82%, implying an absolute decrease of 11%. As expected, inclusion of the auxiliary KB also enhances user-friendliness of the system, and accelerates the process of data entry (Kerkhof et al., 1990).
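The percentages quoted in the text follow from the reported counts, as the short Python check below shows. Every number is taken from the text; the only assumption is that the two cases whose primary diagnosis changed after the KB expansion (protocol type 5) both count as misses. The helper name `pct` is ours.

```python
def pct(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


N_CASES = 104

# Clinicians disagreed with the patho-anatomical diagnosis in 15 cases.
clinicians = pct(N_CASES - 15, N_CASES)

# Medwise (protocol type 1) missed 6 of those 15 cases plus 1 other case.
medwise_correct = N_CASES - (6 + 1)
medwise = pct(medwise_correct, N_CASES)

# Protocol type 5: two primary diagnoses changed (assumed here to become misses).
expanded = pct(medwise_correct - 2, N_CASES)

# Share of negative findings: 74.9 terms on average, of which 44.67 positive.
negative_share = pct(74.9 - 44.67, 74.9)

# Noise terms: about 75 terms entered, about 15 needed for the primary diagnosis.
noise_share = pct(75 - 15, 75)

print(clinicians, medwise, expanded, negative_share, noise_share)
```

The resulting values (86, 93, 91, 40, and 80) agree with the figures reported for the clinicians, protocol types 1 and 5, the share of negative findings, and the share of noise terms.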
11. DISCUSSION
This study documents that computer-supported diagnosis of complete and carefully described patient cases yields satisfactory results. Inclusion of negative findings is mandatory, since the performance decreases substantially if confirmed observations regarding the absence of certain findings are omitted. For a typical case from the journal analyzed, it is not uncommon that more than one third of all search terms entered pertain to negative findings. The matrix structure permits selective assignment of weight factors W(f) depending on the field that applies for a given input term. Therefore, it is not surprising that practical rules of thumb derived from clinical experience can be translated into different and independent weight factors for all fields employed in Medwise. One advantage of this strategy is illustrated by the enhanced performance with the use of the matrix (protocol type 1 versus type 2). Another benefit concerns the more precise identification of the context within which a word is used (e.g., diabetes as disease name (field 1), diabetes as a causative factor (field 4), or diabetes as a complication (field 6)). Inclusion of an auxiliary KB with equivalent expressions contributes, although to a lesser degree, to the overall performance. Other factors (as scrutinized in protocol types 2 and 4) primarily depend on the design of the KB, and in particular on the matrix structure representation of disease profiles. To our knowledge, Medwise is the only computer-assisted diagnosis program which exhibits the latter feature. An important remark addresses the usefulness of computer-supported diagnosis. Obviously, the computer can never replace the primary functions of a physician; observation, communication, and guidance of the patient are elementary activities to be carried out by the human being. However, integration of numerous findings (in our evaluation on average 75 items per patient case!) is a task for which computers are ideally equipped.
In fact, the score obtained by the program under the present boundary conditions is better than the results obtained by the clinicians. This implies that Medwise and comparable programs may very well serve as a consultant to formulate a second opinion.
TABLE 4
Summary of Performance Evaluation of 104 Actual Patient Cases

Type of Analysis           Score   Conditions of Study
Clinicians (in Journal)    86%     any answer accepted
Medwise protocol type 1    93%     all options active
Medwise protocol type 2    79%     matrix structure absent
Medwise protocol type 3    74%     negative terms excluded
Medwise protocol type 4    82%     equivalents neglected
Medwise protocol type 5    91%     expansion of KB by 50%
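The relative contribution of each option can be read off Table 4 as the drop in score against protocol type 1. The short sketch below only restates the published percentages; the variable names are illustrative, and no per-case counts are derived because the table reports rounded percentages only.

```python
# Scores (percent correct) as reported in Table 4.
CLINICIANS = 86
SCORES = {
    "type 1 (all options active)": 93,
    "type 2 (matrix structure absent)": 79,
    "type 3 (negative terms excluded)": 74,
    "type 4 (equivalents neglected)": 82,
    "type 5 (KB expanded by 50%)": 91,
}

baseline = SCORES["type 1 (all options active)"]

# Percentage-point drop of each variant relative to the full configuration.
drops = {
    name: baseline - value
    for name, value in SCORES.items()
    if not name.startswith("type 1")
}

# Rank variants from largest to smallest loss.
ranked = sorted(drops.items(), key=lambda item: item[1], reverse=True)

for name, drop in ranked:
    print(f"{name}: -{drop} points")
print(f"type 1 versus clinicians: +{baseline - CLINICIANS} points")
```

Excluding negative terms costs the most (19 points), followed by removing the matrix structure (14) and neglecting equivalents (11), while expanding the KB costs only 2 points.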
It has been anticipated that in the future computers may replace the traditional role of the physician qua diagnostician (Mazoue, 1990). Economical as well as ethical reasons were presented to support the new paradigm. However, such movement would imply a training of clinicians with more emphasis on the art to observe the patient.
REFERENCES

Barnett, G. O., Cimino, J. J., Hupp, J. A., & Hoffer, E. P. (1987). DXplain, an evolving diagnostic decision-support system. Journal of the American Medical Association, 258, 67-74.
Bishop, C. W., & Dombrowski, T. (1990). Coding, why and how. M.D. Computing, 7, 210-215.
Case #45-1988 (1988). New England Journal of Medicine, 319, 1268-1270.
Cimino, J. J., & Barnett, G. O. (1990). Automated translation between medical terminologies using semantic definitions. M.D. Computing, 7, 104-109.
Finkel, A. J. (Ed.) (1981). Current medical information and terminology (5th ed.). Chicago: American Medical Association.
Firkin, B. G., & Whitworth, J. A. (1987). Eponymes. Park Ridge, NJ: Parthenon Publ. Group Ltd.
Humphreys, B. L. (1989). Unified Medical Language System: Progress report. NLM News, 44, 6-7.
International Classification of Diseases, Ninth Clinical Modification (1978). Commission on Professional and Hospital Activities, Michigan.
Kerkhof, P. L. M. (1987). Dreams and realities of computer-assisted diagnosis systems in medicine (editorial). Automedica, 8, 123-134.
Kerkhof, P. L. M. (1992a). Woordenboek der geneeskunde N/E & E/N. Bohn Stafleu Van Loghum, Houten, The Netherlands.
Kerkhof, P. L. M. (1992b). Medical language and knowledge base systems. Automedica, 14, 47-54.
Kerkhof, P. L. M., Helder, J., van Dieijen-Visser, M. P., Schreuder, J. J. A., de Bruin, H. G., & Gill, K. (1990). Evaluation of clinicopathological conferences using a computer-supported diagnosis program with matrix structure. Automedica, 13, 45-53.
Kerkhof, P. L. M., Koops, H., van Dyke, C. P. H., De Sera, J. P., & Camphuisen, C. (1983). Medwise: A systematic medical data base for general use as a vade-mecum. In J. H. van Bemmel, M. J. Ball, & O. Wigertz (Eds.), Medinfo 83 (pp. 604-607). Amsterdam: North-Holland.
Kong, A., Barnett, G. O., Mosteller, F., & Youtz, C. (1986). How medical professionals evaluate expressions of probability. New England Journal of Medicine, 315, 740-744.
Ledley, R. S., & Lusted, L. B. (1991). Reasoning foundations of medical diagnosis. M.D. Computing, 8, 300-315.
Mazoue, J. G. (1990). Diagnosis without doctors. Journal of Medicine and Philosophy, 15, 559-579.
McCray, A. T., & Srinivasan, S. (1990). Automated access to a large medical dictionary: On-line assistance for research and application in natural language processing. Computers in Biomedical Research, 23, 179-198.
MeSH, Medical Subject Headings (1987). National Library of Medicine, Bethesda, USA.
Miller, R., Masarie, F. E., & Myers, J. D. (1986). Quick medical reference (QMR) for diagnostic assistance. M.D. Computing, 3, 34-48.
Miller, R. A., Pople, H. E., & Myers, J. D. (1982). INTERNIST-I, an experimental computer-based diagnostic consultant for general internal medicine. New England Journal of Medicine, 307, 468-476.
Murphy, E. A. (1976). The logic of medicine. Baltimore, MD: Johns Hopkins University Press.
Pauker, S. G., et al. (1976). Towards the simulation of clinical cognition. American Journal of Medicine, 60, 981-996.
Schiffman, D. O. (1989). Medical informatics at the AMA, computer oriented biomedical nomenclature. IEEE EMBS 11th International Conference (pp. 1839-1840), Seattle, WA.
Warner, H. R., Haug, P., Bouhaddou, O., Lincoln, M., Warner, H. R. Jr., Sorenson, D., Williamson, J. W., & Fan, C. (1988). ILIAD as an expert consultant to teach differential diagnosis. SCAMC, 473-480, Washington, DC.
Waxman, H. S., & Worley, W. E. (1990). Computer-assisted adult medical diagnosis: Subject review and evaluation of a new microcomputer-based system. Medicine, 69, 125-136.
Williams, B. T. (Ed.) (1982). Computer aids to clinical decisions. Boca Raton, FL: CRC Press Inc.
Yu, V. L. (1983). Conceptual obstacles in computerized medical diagnosis. Journal of Medicine and Philosophy, 8, 67-83.