HepatoConsult: a knowledge-based second opinion and documentation system


Artificial Intelligence in Medicine 24 (2002) 205–216

H.P. Buscher a,*, Ch. Engler a, A. Führer a, S. Kirschke a, F. Puppe b

a Clinic for Internal Medicine II, DRK-Kliniken Berlin-Köpenick, Salvador Allende Strasse 2-8, 12559 Berlin, Germany
b Department of Artificial Intelligence and Applied Informatics, Würzburg University, Germany

Received 20 June 2001; received in revised form 7 October 2001; accepted 10 October 2001

Abstract

HepatoConsult is a publicly available knowledge-based second opinion and documentation system aiding in the diagnosis of liver diseases. The positive results of a prospective diagnostic evaluation study encouraged its use in clinical routine, although the available hardware infrastructure was not optimal. The comments of the physicians who used the system confirmed the results of the study and showed that the time needed for data entry is acceptable and that the implicit standardization of terminology and documentation is welcome. Suggestions for improvement concerned the interface (easier data entry), the scope (usability for more patients) and the additional capability to generate medical reports from the data. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Knowledge-based system; Diagnosis; Second opinion; Clinical use; Evaluation; Hepatology; Gastroenterology; HepatoConsult

* Corresponding author. Tel.: +49-30-3035-3319; fax: +49-30-3035-3321. E-mail address: [email protected] (H.P. Buscher).
0933-3657/02/$ – see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0933-3657(01)00104-X

1. Introduction

The explosion of medical knowledge leads to increasing specialization, which in turn creates more demand for consultation on diagnostic and therapeutic problems. Knowledge gained from general textbooks is often not easily adapted to individual cases, so specialists must be consulted. However, they are often not available. This gap can be partially filled by knowledge-based systems [20]. In contrast to textbooks, they offer consultation based on specific patient data. Although the basic technology is well understood, several barriers must be overcome before such systems are useful in clinical routine:

1. The costs of developing and maintaining the system must be low.
2. The quality of the consultation must be high, and the type of consultation should complement the capabilities of the physician.
3. The time for entering patient data must be low, which can be achieved by various strategies:
   • Computer-administered data, in particular basic patient and lab data, should be transferred to the knowledge-based system automatically.
   • Appropriate hardware in the clinic must be easily accessible to physicians.
   • The questionnaires of the knowledge-based system should be as short as possible, i.e. not necessarily systematically structured, but rather goal-directed.
   • The system should allow the user to volunteer data and be able to reason competently with whatever data is available.
   • The data should be incorporated in the patient record and utilized to generate other documents, e.g. medical reports.
4. The data should be usable for statistical and scientific purposes.

In the following, we discuss HepatoConsult, a knowledge-based system designed to aid in the diagnosis of liver and biliary tract disease, with respect to the requirements above and first experiences in clinical use. The system was built by the first author of this paper without a knowledge engineer, who is usually needed for coding the medical knowledge. This economical process was enabled by the diagnostic shell kit D3 [15,17], which offers a comfortable, completely graphical knowledge acquisition interface in which experts enter their knowledge directly by filling in various kinds of forms, tables and graphs. The knowledge is then translated automatically into the internal formal representation.
While graphical knowledge acquisition is no substitute for a cognitive model of how to represent knowledge, it facilitates (together with a concise explanation component) the technical part of entering, testing and modifying knowledge to such a degree that a knowledge engineer is not necessary as a mediator between expert and computer [7]. However, a knowledge engineer participated in creating and revising the cognitive model and, more importantly, discussed with the expert how to match the cognitive model to the mechanisms D3 offers (even extending the capabilities of D3 where necessary). The cognitive model combines Clancey's staged model of heuristic classification [6] for diagnostic interpretation, in which raw data is aggregated into data abstractions that are used for inferring diagnostic categories and final diagnoses, with a mixture of hypothetico-deductive and establish-refine strategies for guiding the data gathering process. To deal with uncertainty, scoring schemes for data abstractions and diagnoses, with thresholds for uncertainty categories such as probable, possible, unclear and improbable, are used instead of a probabilistic or fuzzy approach. Since the evidence values estimated by experts are quite coarse in nature, a simple but easily comprehensible reasoning scheme seemed sufficient. It is designed to be extensible to more sophisticated uncertainty approaches, because it uses abstract evidence categories instead of numbers. The effort for building a knowledge base depends strongly on its size, because building and maintaining larger high-quality knowledge bases is disproportionately more expensive.


On the other hand, small knowledge bases cannot cover sufficient clinical breadth to be really useful. HepatoConsult takes a middle course: it is much more restricted than systems covering the whole of internal medicine such as QMR, DXplain, Iliad or Meditel (for a thorough evaluation study, see [3]), but significantly more general than very specialized systems such as lab data interpretation systems, which are now often in routine clinical use, e.g. HEPAXPERT [1] or systems built with the Pro.M.D. shell [22]. We envision the construction of a competent system for internal medicine by a multiagent approach [25], in which agents with different areas of competence consult each other when solving a case. The concept and experiences from first experiments involving HepatoConsult and similar medical diagnosis systems in other areas of internal medicine (e.g. CardioConsult [14] or in rheumatology [19]) are reported in [2]. For the moment, the "medium size approach" of HepatoConsult has the advantage that its competence spectrum is large enough to be useful in clinical routine, but small enough to achieve a high diagnostic quality (see Section 3) with reasonable effort. As far as we know, other consultation systems for liver diseases are less detailed, e.g. the HEPAR System [9,10], which contains about 80 disorders as opposed to 234 in HepatoConsult.

In Section 2, we report on the structure and function of HepatoConsult, in Section 3 on a formal evaluation of the diagnostic quality of the consultation, and in Section 4 on issues of clinical use. Section 5 presents a discussion and an outlook on future work and Section 6 concludes with a summary.

2. Structure and function

HepatoConsult is a knowledge-based system aiding in the diagnosis of liver diseases. It is publicly available on a CD in the general book trade [4] and runs on standard PCs (Windows 95 and higher).
It integrates informal and formalized knowledge and has been built with D3, which contains different modules with different knowledge types for the main diagnostic tasks, i.e.

• guiding data gathering: asking the right questions in the right sequence,
• data abstraction: summarizing and abstracting low-level primary data to so-called "data abstractions" or "symptom interpretations",
• diagnostic interpretation: inferring diagnostic conclusions from the data (abstractions).

Data gathering is done with questionnaires corresponding to examinations and tests. Each consists of a list of one-choice, multiple-choice or numerical questions. The current questionnaire is automatically selected and presented by the program, but can also be chosen by the user. Answers can be specified further by follow-up questions. The standard pathway leads from initial data to diagnosis via history, physical examination, lab data and sonography of the abdomen, but the system offers various shortcuts. Alternatively, the physician can enter suspected diagnoses and will then be asked for the examinations necessary to validate or exclude them. HepatoConsult contains 87 questionnaires with 366 questions with an average number of


3.1 answer alternatives (i.e. about 1000 medical findings). Knowledge on how to gather data comes in two forms:

1. Hypothetico-deductive: if a hypothesis is suspected, tests useful for its clarification are selected, taking their cost/benefit ratio into account. To this end, diagnoses may have a list of tests weighted according to their costs and benefits, and the best test for the strongest current hypothesis is selected. HepatoConsult uses a total of 234 such diagnosis–test relations.
2. Compiled, as in decision trees or other categorical rules: if certain states, e.g. observations or intermediate results, are detected, then certain data gathering procedures should be carried out routinely. HepatoConsult uses 212 rules for inferring follow-up questions and 62 relations for indicating tests based on intermediate states, e.g. to differentiate between the specializations of high-level diagnoses according to the establish-refine strategy.

Data gathering terminates when all established diagnostic categories have been investigated and all suspected diagnoses have been either confirmed or devalued, or when no further tests are available.

Data abstraction is a proven technique for reducing the complexity of the knowledge base structure by preprocessing the raw data into diagnostically meaningful abstractions, e.g. by summarizing the contribution of different examinations (such as history and status) or tests (such as lab data) to a state such as hemochromatosis (e.g. lab-chemistry evidence for hemochromatosis is computed from the measurements of serum iron, ferritin and transferrin saturation and classified as probable, possible or improbable). HepatoConsult contains 101 data abstractions, usually with three values. Usually one rule with a more or less complex condition suffices for inferring a value (e.g. "lab-chemistry evidence for hemochromatosis" is "probable" if "ferritin" is between 500 and 1000 ng/ml and "transferrin saturation" is > 75% and "serum iron" is "high").
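The hemochromatosis example can be written down as a small function. This is only a minimal sketch of the single rule quoted in the text, not the D3 representation: the function and parameter names are hypothetical, the ferritin bounds are assumed to be inclusive, and the rules for the values "possible" and "improbable" are not given in the paper, so the sketch returns None whenever the quoted rule does not apply.

```python
from typing import Optional


def lab_evidence_for_hemochromatosis(ferritin_ng_ml: float,
                                     transferrin_saturation_pct: float,
                                     serum_iron: str) -> Optional[str]:
    """Data abstraction sketch: map raw lab values to an evidence category.

    Only the rule for "probable" is stated in the paper. Inclusive ferritin
    bounds are an assumption of this sketch.
    """
    if (500 <= ferritin_ng_ml <= 1000
            and transferrin_saturation_pct > 75
            and serum_iron == "high"):
        return "probable"
    # The rules for "possible" and "improbable" are not given in the text,
    # so no value is inferred here.
    return None
```

A diagnostic rule would then refer to the returned category rather than to the three raw measurements, which is how data abstraction keeps the diagnostic rules short.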
Diagnostic interpretation is greatly simplified by data abstraction, but is still quite complex. HepatoConsult uses a scoring system that combines the different sources of evidence for or against each diagnosis. This is done by various more or less complex rules, each adding its evidence value to the score of the diagnosis. The evidence values are similar to the evoking strength and frequency values used in INTERNIST/QMR [12], i.e. there are six positive categories (p1–p6) and six negative categories (n1–n6). They are interpreted as numbers, so that the diagnostic score is the sum of the negative and positive values; it is rated by three thresholds into one of four categories: probable (established), possible (suggested), unclear and improbable (excluded). Scoring schemes with thresholds are in widespread use in medicine because of the simplicity of the underlying cognitive model. Among other things, they allow the experts to calculate the effects of entering and changing evidence values quite easily. This is an important pragmatic advantage over probabilistic or pseudoprobabilistic interpretations, because experts find it more difficult to estimate probabilities and to get a feeling for combining them. The main advantage of probabilistic approaches, that the necessary probabilities can be computed from case statistics, could not be realized, because we did not have enough cases, especially for rare diagnoses. Due to individual thresholds for the diagnoses, the recognition of multiple diagnoses is possible (see Section 3). All conclusions are explainable by the


knowledge used for inferring them as well as by the reasons why rules were not applicable for inferring them. The diagnoses are represented in a multiple hierarchy (i.e. heterarchy) with the main category "liver diseases" and diagnostic categories from other medical subjects, which are needed as differential diagnoses. They are structured according to nosologic and pathophysiologic criteria. A total of 234 diagnoses organized in up to four hierarchical levels is used (e.g. Hepatitis – acute Hepatitis – Hepatitis A – presumably infectious Hepatitis A). They are rated with 1304 rules. The complexity of the rules varies:

• from simple one-feature, one-diagnosis rules (494), which are mainly used for negative evidence values (e.g. if the data abstraction "sonographic evidence for liver cirrhosis" is "not likely", then "liver cirrhosis" gets negative points "n3"; see Table 1);
• over rules combining several features with "and", "or" or "n from m criteria" (563 rules with an average of 3.5 features, e.g. if the data abstraction "clinical evidence for liver cirrhosis" is "likely" and "sonographic evidence for liver cirrhosis" is "likely" and "duplexsonographic evidence for liver cirrhosis" is "possible" or "likely", then "liver cirrhosis" gets positive points "p6"; see Table 1);
• to nested rule conditions, e.g. (A and B and (C or D) and (2 of 3 from E, F, G)). In HepatoConsult, 247 such nested rules are used for inferring diagnoses.

Currently, the diagnostic interpretation uses only the heuristic problem solver of D3 (categorical, heuristic, set-covering, case-based and statistical problem solvers are available), but it is intended to add the case-based component when enough cases are available. For most diagnoses, a description including synonyms, definition, subtypes, frequency, clinical characteristics, tests, differential diagnoses and therapies is available via an integrated standard HTML browser.
The descriptions are strongly linked to the expert system and to each other and have been designed specifically for use on computer screens. Therefore, they are quite condensed in comparison to textbooks. A total of 151 diagnostic descriptions is available. At the end of a diagnostic session, the system presents a structured survey of the confirmed and suspected diagnoses, for which the user can browse further information. For each suspected diagnosis, it also indicates necessary tests that have not yet been performed. In addition, a structured summary of all answered questions can be shown on the screen or printed out for a paper-based patient record. Cases are stored in an integrated case base that allows basic statistical functions. For more elaborate analysis, the case base can be exported to an XML representation and from there into a relational database.

The degree of specialization, the size and the complexity of the system can be compared with CADIAG-2/Rheuma [8], which uses 170 diagnoses and 1126 symptoms with 16 040 simple (one symptom, one diagnosis) and 60 complex relationships. However, the internal structure of the two systems seems quite different: CADIAG-2/Rheuma uses mostly simple rules and a fuzzy model to smooth the effects of thresholds. HepatoConsult has many strict thresholds, but makes extensive use of complex rules and intermediate concepts to express the intuition that the combined effect of several findings for a diagnosis is often not proportional to the sum of the individual effects of each finding (see Table 1). Although a comparison of the accuracy of both systems is difficult because of the different


Table 1
Example of the knowledge for inferring "liver cirrhosis"^a

Conditions (table rows; the per-column '+'/'−' marks of the original layout are not reproduced here):
 1. Suspected liver disease = liver cirrhosis
 2. SI-liver cirrhosis, from complaints = possible
 3. SI-liver cirrhosis, from clinical findings = likely
 4. SI-liver cirrhosis, from clinical findings = possible
 5. SI-liver cirrhosis, clinically and by technical investigations = not deducible
 6. SI-encephalopathy = stage 1 OR 2 OR 3 OR 4
 7. SI-liver cirrhosis, by laboratory investigations = possible OR likely
 8. Ammonium = slightly elevated (55–80 µmol/l) OR moderately elevated (81–120 µmol/l) OR strongly elevated (>121 µmol/l)
 9. SI-liver cirrhosis, by sonography = likely
10. SI-liver cirrhosis, by sonography = not likely
11. SI-liver cirrhosis, by sonography = possible
12. SI-portal hypertension, by sonography = likely
13. SI-liver cirrhosis, by duplexsonography = possible OR likely
14. SI-liver cirrhosis, by duplexsonography = not detectable
15. SI-portal hypertension, by endoscopy = possible OR likely
16. SI-liver cirrhosis, by laparoscopy = possible
17. SI-liver cirrhosis, by laparoscopy = likely
18. SI-liver cirrhosis, by laparoscopy = not detectable
19. Liver histology = fibrotic transformation
20. Liver fibrosis, by histology = strong fibrotic transformation as typical for cirrhosis
21. Liver punction = specimen broken into small pieces
22. Budd–Chiari syndrom = likely

Rule (column)   | 1  | 2   | 3   | 4   | 5   | 6   | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 | 15 | 16
Connection type | v  | 3 3 | 2 2 | 1 1 | 3 4 | 1 2 | &  | &  | &  | v  | &  | &  | &  | &  | v  | v
Evidence value  | p6 | p6  | p5  | p4  | p4  | p3  | p4 | p4 | p4 | p3 | p3 | n3 | n4 | n5 | n5 | n6

Connection type of the column: 'v' = or, '&' = and, 'a b' = min a and max b of the items.
Evidence value for liver cirrhosis: p = positive, n = negative; higher number → stronger evidence.

Additional rule with nested condition:
If Liver palpation = enlarged
and (SI-hemochromatosis, by laboratory investigations = possible OR likely
     or SI-Wilson disease, by laboratory investigations = possible OR likely OR very likely)
then liver cirrhosis gets evidence value p3

^a One column corresponds to one rule, where '+' resp. '−' means that the condition in the text field on the left side is present, resp. absent (e.g. the first column means the rule: If "SI-liver cirrhosis, by laparoscopy = likely" or "Liver fibrosis, by histology = strong fibrotic transformation as typical for cirrhosis" then "liver cirrhosis" gets the evidence value "p6"). Seventeen rules are used to infer liver cirrhosis: five simple rules, eleven combination rules and one nested rule at the bottom. The second, third and fourth rule have the same three preconditions, but they differ in the evidence value depending on whether all three, exactly two or just one precondition are/is present. If the sum of the score for liver cirrhosis is more than p5, it is probable; if it is between p3 and p5, it is possible; if it is less than n5, it is improbable; otherwise it is unclear. Combining evidence values for the rules is done via symbolic arithmetic: p3 + p3 = p4, p4 + p4 = p5, etc. and p3 + n3 = 0, etc. All symptoms beginning with "SI" are data abstractions (symptom interpretations) inferred from other data by other rules.
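The rule columns and the symbolic arithmetic of Table 1 can be illustrated with a short sketch. It is not the D3 implementation. The numeric weights are an assumption: the paper states only that the categories are interpreted as numbers and gives the examples p3 + p3 = p4, p4 + p4 = p5 and p3 + n3 = 0, which a doubling scheme (p1 = 1, p2 = 2, p3 = 4, ...) happens to reproduce. Treating "between p3 and p5" as inclusive is likewise an assumption, and all function names are hypothetical.

```python
from typing import List, Tuple, Union

Connection = Union[str, Tuple[int, int]]


def evidence_value(category: str) -> int:
    """Map 'p1'..'p6' / 'n1'..'n6' to a number.

    ASSUMPTION: each level is worth twice the previous one; the paper does
    not state the actual weights.
    """
    sign = 1 if category[0] == "p" else -1
    level = int(category[1:])
    return sign * 2 ** (level - 1)


def rule_fires(present: List[bool], connection: Connection) -> bool:
    """Evaluate one rule column of Table 1.

    connection is 'v' (or), '&' (and), or a (min, max) pair bounding the
    number of conditions that hold ("n from m criteria").
    """
    count = sum(present)
    if connection == "v":
        return count >= 1
    if connection == "&":
        return count == len(present)
    lowest, highest = connection
    return lowest <= count <= highest


def rate(fired_values: List[str]) -> str:
    """Sum the evidence values of all fired rules and apply the thresholds
    quoted in the footnote of Table 1 for liver cirrhosis."""
    score = sum(evidence_value(c) for c in fired_values)
    if score > evidence_value("p5"):
        return "probable"
    if score >= evidence_value("p3"):  # inclusive bound is an assumption
        return "possible"
    if score < evidence_value("n5"):
        return "improbable"
    return "unclear"
```

For instance, a column with connection type "2 2" over three preconditions would be checked with `rule_fires([True, True, False], (2, 2))`, and the values of all fired rules would then be passed to `rate`.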


nature of the domain, both systems show a high degree of competence in their field (see next section).

3. Diagnostic evaluation

HepatoConsult has been evaluated in a prospective pilot study with 106 patient cases from two clinics: 57 cases from the outpatient clinic of Freiburg University Clinic and 49 cases from the inpatient ward of DRK-Kliniken Berlin-Köpenick [5]. Cases with non-hepatological main diagnoses were excluded, but not those with non-hepatological supplementary diagnoses. The spectrum of final diagnoses in the cases included: liver cirrhosis with various Child classifications (32), alcohol-related liver dysfunction (30), hepatitis C (14), fatty liver (13), portal hypertension (12), cholelithiasis (8), autoimmune hepatitis (6), liver involvement with bacterial or viral infection (6), hepatic encephalopathy with various characteristics (4) and many further diagnoses with fewer than four patient cases each. In 18 cases, the final diagnosis remained unclear, which was summarized under the term "unclear hepatopathy". The remaining 88 cases contained 162 diagnoses, i.e. an average of nearly two diseases per patient. The system evaluation took into account not only the final result, but also two intermediate steps: the current diagnoses after history and physical examination, and the current diagnoses after adding basic lab data and sonography. The final diagnoses were based on the results of case-specific further technical investigations and tests, which are often quite specific for the diagnoses in question. The intermediate steps are important for the evaluation, because a good hypothesis is needed to guide the further data gathering process (a bad score in the intermediate steps would also violate the requirement that the system should be able to behave reasonably with whatever data is available).
The two results (see Tables 2 and 3) use two gold standards, indicating (1) how often the true final clinical diagnoses were contained in the system's diagnoses, and (2) the plausibility of the system's diagnoses as judged by four expert hepatologists with respect to the data available. The results clearly show a high diagnostic quality of HepatoConsult, which is also emphasized by the fact that in 56% of the cases HepatoConsult offered a more detailed differentiation of the diagnoses than the responsible physicians (a more thorough discussion of the results is given in [5]).

Table 2
Frequency of correct diagnostic system conclusions as proven by gold standard investigations and the final diagnostic opinion of hepatologically trained physicians^a

Gold standard: final clinical diagnoses | Number | Contained in the system diagnoses after history and physical examination | Contained in the system diagnoses after adding lab and sonography data | Contained in the final system diagnoses
(a) Main diagnosis | 88  | 53 (60%) | 75 (85%)  | 82 (93%)
(b) All diagnoses  | 162 | 83 (51%) | 127 (78%) | 135 (85%)

^a The table shows how often the system found the true final clinical diagnoses with respect (a) to the main diagnosis only and (b) to all diagnoses.

Table 3
Percentages of adequate diagnostic proposals of the system at different stages of the diagnostic process as judged by four expert hepatologists using four categories of relevance^a

Gold standard: judgement of four expert hepatologists | System diagnoses plausible after history and physical examination (%) | System diagnoses plausible after adding lab data and sonography (%) | Final system diagnoses plausible (%)
Essentially correct | 93 (144 from 154) | 88 (192 from 219) | 86 (212 from 247)
Reasonable          | 4 (6 from 154)    | 8 (17 from 219)   | 7 (17 from 247)
Questionable        | 3 (4 from 154)    | 5 (10 from 219)   | 7 (18 from 247)
Severely wrong      | 0                 | 0                 | 0

^a The judgement is based on the actual data input.

The percentages in Tables 2 and 3 should be compared with the results of an accompanying preliminary test of 18 physicians (2/3 with training in gastroenterology, 1/3 without) on 30 hepatological cases (1/2 easy, 1/2 difficult). On average, 80% of the easy cases were solved correctly, but less than 50% of the difficult ones, and the trained gastroenterologists performed significantly better than the other physicians. These figures indicate a need for diagnostic competence and encourage the tentative use of HepatoConsult in clinical routine. In 59 cases, the final diagnoses of the system contained diagnoses not previously mentioned by the clinicians, demonstrating the system's ability to protect against overlooking symptoms, diagnostic deductions or plausible diagnoses.

4. Clinical use

HepatoConsult has been used at the hospital of Berlin-Köpenick since the end of 1998. Initially, it was evaluated mainly by four physicians on hepatological cases. For this paper, they were asked to comment on the system and its usefulness. We first report their opinion on the effort of data entry, then mention benefits and finally some suggestions for improvement and further development.

As pointed out, the time for data entry can severely restrict the practical usefulness of the system. Therefore, the developer put considerable effort into a workable data gathering strategy. This proved successful: direct data entry takes 3–5 min for a new patient, which is comparable to the time needed for filling in a paper record. However, transferring data from a paper record to the computer questionnaire took on average 10–15 min, because additional time was needed to search the paper record for the data asked for by the computer. A prerequisite for the short direct data entry time is that the user is accustomed to the program.
This was achieved after entering at most 20 cases. With respect to the content of the questions, the physicians stated that the program asked for all information necessary for the diagnoses currently in scope, while avoiding unnecessary questions. On average, about 50 questions with about three one-choice or multiple-choice


answer alternatives are asked. The program generates a table-based documentation form, which is added to the patient record. The benefits of using HepatoConsult mentioned by the physicians were:

• patient-specific consultation;
• second opinion as an additional guard against mistakes, especially for difficult cases;
• comprehensible recommendations;
• standardized terminology and a standardized scheme for gathering findings;
• standardized documentation (printout) to insert in the patient record;
• availability of short textbook entries on diagnoses.

In the eyes of the physicians who evaluated HepatoConsult, a special advantage is the system's ability to explain its diagnostic reasoning in an understandable form, in particular by showing the diagnostic rules leading to the diagnoses presented. This was regarded as essential for the necessary confidence in the system's diagnoses. They all stated that causal explanations inspire more confidence than statistical argumentation, so that they would prefer a system like this one to a system based solely on statistical associations. However, diagnostic reasoning by comparing a current case with the cases of a case collection was also mentioned. This diagnostic method can be of interest in cases with rare diagnoses or with difficult differential diagnoses. It needs large databases, which we are now building up with SonoConsult, a successor program of HepatoConsult, so that we can soon gain first experience with it.

The physicians made various suggestions for system improvement, which concerned mainly four areas. First, they wished the program to generate medical reports from the data and to make the data available in an electronic patient record. Second, they wanted the data entry screen to be more compact, so that they can enter information more easily, independently of the dynamic program sequence (currently, selecting questionnaires of one's own choice requires various clicks and scrolling operations, which is perceived as too long-winded). Third, the scope of the system should be widened to include at least the whole of gastroenterology. Fourth, further development should consider the automatic transfer of lab data and of the results of other diagnostic systems such as SonoConsult. These suggestions are now being considered in the further development of the program. Initially, a serious problem was the restricted hardware availability, with only one computer in an inconvenient location, which made the evaluation difficult.
However, the first results of the evaluation of HepatoConsult were necessary for the further development strategy and for the cooperation of the clinic's computer network specialists. In the meantime, HepatoConsult has been extended to MediConsult, a system for the general acquisition of information from history and clinical investigation with diagnostic inference capabilities, covering most diagnoses of gastroenterology and hepatology and many diagnoses of general internal medicine. It is currently running on all computers located in the physicians' meeting rooms, which are now linked to the clinic's computer network, and is being evaluated for practicability, usefulness and acceptance. The sonographic part of HepatoConsult has been developed into SonoConsult, a special tool for the complete acquisition of the results of sonographic investigations, coupled to an automatic text editor so that they can be presented in a readable form and integrated into medical records.


The experience gained with these successor systems confirms that of the first evaluation of HepatoConsult. The main advantages concern

• a better and more complete investigation and documentation of results, and
• a more complete presentation of further diagnoses (the systems are able to present multiple diagnoses).

Therefore, HepatoConsult and especially its successor programs may be used for quality control. Quality control, the prospect of statistical analysis of patient data and the facilitation of work by automatic text generation greatly enhanced acceptance among the evaluating physicians. These positive aspects and perspectives make further development worthwhile.

5. Critiquing system option

Currently, the system is mainly used for patient cases in which the diagnosis is not obvious but probably involves liver disease. Many advantages, including statistical evaluations and acting as a safeguard against overlooking symptoms or diagnoses, can only be achieved if the clinical data of all patients become available to the system. Besides widening the scope of the system as already mentioned, the system must be able to make reasonable comments with whatever data is available. In particular, if given basic data, main complaints and lab data on the data side and suggested tests, diagnoses and therapies on the solution side, the system should be able to check whether the proposed solutions are reasonable with respect to the data. This is the domain of critiquing systems [21], introduced by Perry Miller more than 10 years ago [11]. A critiquing system has been defined as a "decision support system that allows the user to make the decision first; the system then gives its advice when the user requests it or when the user's decision is out of the system's permissible range" [23].
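The quoted definition can be made concrete with a small sketch. It is purely illustrative and not taken from HepatoConsult or D3: the function name, the message texts and the notion of "permissible range" used here (a proposed diagnosis the system rates improbable, or a diagnosis the system rates probable that the user did not propose) are all assumptions.

```python
from typing import Dict, List


def critique(user_diagnoses: List[str],
             system_ratings: Dict[str, str],
             advice_requested: bool = False) -> List[str]:
    """Return comments on the user's proposal (hypothetical sketch).

    The user decides first; the system speaks up only if the proposal lies
    outside its (assumed) permissible range or if advice is requested.
    """
    comments = []
    for diagnosis in user_diagnoses:
        # Diagnoses unknown to the system are treated as "unclear".
        if system_ratings.get(diagnosis, "unclear") == "improbable":
            comments.append(f"{diagnosis}: rated improbable from the data entered")
    for diagnosis, rating in system_ratings.items():
        if rating == "probable" and diagnosis not in user_diagnoses:
            comments.append(f"{diagnosis}: rated probable but not proposed")
    if not comments and advice_requested:
        comments.append("no objections to the proposed diagnoses")
    return comments
```

With no objections and no request the function stays silent, which mirrors the idea that the physician makes the decision first.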
Flexible critiquing systems use diagnostic knowledge to compare their own derived actions with those of the physician and need only little additional knowledge, in particular

• categorical knowledge about guidelines and typical user errors;
• knowledge about the relevance of diagnoses (e.g. treatability, danger, urgency) and possible actions (e.g. costs, risks and benefits);
• knowledge for analysing the reliability of a diagnostic conclusion for a given case, taking into account the reliability and completeness of the data, the reliability of the diagnostic knowledge and the explanation of the diagnostic conclusion.

The concept for extending a diagnostic consultation system to a critiquing system is presented in more detail in [16]. A similar type of system is the case-based training system, in which the system presents a patient case to the user, who has to decide on further actions, i.e. additional tests, diagnoses and therapies. The main difference from a critiquing system is that the solutions to the cases of a training system are known in advance, which does not hold for critiquing systems. Therefore, the latter must be much more cautious in their conclusions. While critiquing systems are preferred by experts, training systems are most useful in the last phases of medical education, in particular in the context of the new paradigm of

problem-based learning [13,24]. D3 allows a knowledge base to be reused for generating training systems. Field experiences with hepatological and other medical training systems are reported in [18] and show that this is also an attractive mode of use of formalized knowledge.

6. Lessons learnt

The following statements summarize our experiences with knowledge-based systems in clinical use against the background of previous approaches, which mostly had limited success:
• Consultation programs need not only high-quality recommendations but also a field of competence that is not too narrow. HepatoConsult is therefore now being extended to cover the whole of gastroenterology, with graceful degradation at its borders.
• The concept of integrating textbook knowledge in HepatoConsult, with many links between the symptoms, tests and diagnoses of the formalized and the informal knowledge, has proven useful and will be maintained.
• In clinical use, consultation programs must make the gathered data available for documentation purposes and vice versa. Automatic text generation and tight integration with an electronic patient record are the key to broad acceptance. Unfortunately, progress can usually be achieved only in small steps, and there is still a long way to go.
• The use of patient data for generating medical reports has been perceived as very useful. Besides a standard report, a tailored report usable for generating detailed physicians' letters is already available for 70% of the information, and its extension is a primary task. Because this task is quite time-consuming for the experts, a special editor for entering and testing the generated letters is under development to speed up this process.
• Quality control with respect to the case-adequate, complete collection of general and disease-related information from history, clinical investigation and technical tests, as well as the choice of adequate technical investigations and the presentation of all possible diagnoses, will be an excellent argument for using such systems.
• The possibility of differentiated statistical analysis of large case bases produced by the use of HepatoConsult, its successor programs or similar diagnostic tools will be of scientific interest.

References

[1] Adlassnig KP, Horak W. Development and retrospective evaluation of Hepaxpert-I: a routinely-used expert system for interpretive analysis of hepatitis A and B findings. Art Intell Med 1995;7(1):1–24.
[2] Bamberger S, Puppe F. Kooperierende Diagnoseagenten. Informatik Forsch Entw 1999;14(3):135–44.
[3] Berner E et al. Performance of four computer based diagnostic systems. N Eng J Med 1994;333:1792–6.
[4] Buscher HP. HepatoConsult – Hepatologisches Second-Opinion-Programm. Wiesbaden: Ullstein Medical; 1998.
[5] Buscher HP, Führer A, Kirschke S, Galland D, Spangenberg H, Blum H. Evaluation von HepatoConsult, einem hepatologischen Expertensystem – erste Ergebnisse und Trends. Dtsch med Wschr 1999;12:989–92.
[6] Clancey W. Heuristic classification. Artif Intell 1985;20:215–51.
[7] Gappa U, Puppe F, Schewe S. Graphical knowledge acquisition for medical diagnostic expert systems. Art Intell Med 1993;5:185–212.
[8] Leitich H, Kiener H, Kolarz G, Schuh C, Graninger W, Adlassnig KP. A prospective evaluation of the medical consultation system CADIAG-II/RHEUMA in a rheumatological outpatient clinic. Methods Inf Med 2001;40:213–20.
[9] Lucas P. Refinement of the HEPAR expert system: tools and techniques. Art Intell Med 1994;6:175–88.
[10] Lucas P, Janssens A. HEPAR: an expert system for the diagnosis of disorders of the liver and biliary tract. Liver 1989;9:266–75.
[11] Miller P. Expert critiquing systems. Heidelberg: Springer; 1986.
[12] Miller R, Pople H, Myers J. INTERNIST-1, an experimental computer-based diagnostic consultant for general internal medicine. N Eng J Med 1982;307(8):468–76.
[13] Moust J, Bouhuijs P, Schmidt H. Problemorientiertes Lernen. Wiesbaden: Ullstein Medical; 1997.
[14] Puppe B, Riecker G. CardioConsult: Kardiologisches Second-Opinion-Programm. Wiesbaden: Ullstein Medical; 1998.
[15] Puppe F. Knowledge reuse among diagnostic problem-solving methods in the shell-kit D3. Int J Hum-Comput Stud 1998;49:627–49.
[16] Puppe F. Meta knowledge for extending diagnostic consultation to critiquing systems. In: Proceedings of EKAW-99: The 11th European Workshop on Knowledge Acquisition, Modeling, and Management (Dagstuhl Castle, Germany, May 26–29). Heidelberg: Springer; 1999. p. 367–72.
[17] Puppe F, Gappa U, Poeck K, Bamberger S. Wissensbasierte Diagnose- und Informationssysteme. Heidelberg: Springer; 1996.
[18] Puppe F, Puppe B, Reinhardt B, Schewe S, Buscher HP. Evaluation medizinischer Diagnostik-Expertensysteme zur Wissensvermittlung. Informatik, Biometrie und Epidemiologie in Medizin und Biologie 1998;29(1):48–59.
[19] Schewe S, Schreiber M. Stepwise development of a clinical expert system in rheumatology. Clin Invest 1993;71:139–44.
[20] Schwartz W. Medicine and the computer: the promise and problems of change. N Eng J Med 1970;283:1257–64.
[21] Silverman B. Survey of expert critiquing systems; practical and theoretical frontiers. Commun ACM 1992;35(4):106–27.
[22] Trendelenburg C, Colhoun O, Wormek A, Massey K. Knowledge-based test result interpretation in laboratory medicine. Clin Chim Acta 1998;278:229–42.
[23] van Bemmel J, Musen M. Handbook of medical informatics. Heidelberg: Springer; 1997.
[24] Wetzel MS. Problem based learning: an update on problem based learning at Harvard Medical School. Ann Community-Oriented Edu 1994;7:237–47.
[25] Wooldridge M, Jennings NR. Intelligent agents: theory and practice. Knowledge Eng Rev 1995;10(2):115–52.