10 Expert systems in obstetrics and gynaecology


T. CHARD
A. SCHREINER

The collection, storage and retrieval of information are the major applications of computers in medicine. However, the machine is also capable of being used to draw conclusions from complex data. Once a computer system goes beyond being a simple passive receptacle for information it acquires characteristics which, because they emulate some aspects of human thought, are often described as 'expert systems' or 'artificial intelligence'.

EXPERT SYSTEMS

Expert systems have been defined as the 'embodiment within a computer of knowledge from an expert skill in such a form that the system can offer intelligent advice or take an intelligent decision' (modified from a formal definition by a specialist subgroup of the British Computer Society). However, this simple definition begs the question of where expertise begins and ends. Consequently, it has been proposed that expert systems can be classified into three different types.

Type 1 expert systems

A type 1 expert system involves skills that might be expected of an intelligent human being who has no specific professional training. This includes quite simple computer operations such as the error checks that can be applied to data entry. For example, the message '123456 CANNOT BE A NAME. PLEASE RE-ENTER' might seem trivial by the standards of what is often regarded as an arcane subject, but nevertheless provides a perfectly sound emulation by the machine of a human response to incorrect information.
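A data-entry check of this kind can be sketched in a few lines of Python. The function name `check_name` and the rule itself (a name must contain at least one letter and no digits) are assumptions made for illustration; only the wording of the rejection message comes from the text.

```python
def check_name(entry: str) -> str:
    """Return 'OK' or a rejection message for a name typed at data entry."""
    text = entry.strip()
    # Assumed rule: a name needs at least one letter and must contain no digits.
    has_digit = any(ch.isdigit() for ch in text)
    has_letter = any(ch.isalpha() for ch in text)
    if has_digit or not has_letter:
        return f"{text} CANNOT BE A NAME. PLEASE RE-ENTER"
    return "OK"


print(check_name("123456"))
print(check_name("Smith"))
```

A real system would need a more tolerant rule, but the principle is the same: the machine reacts to implausible input as an untrained but attentive human would.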

Type 2 expert systems

A type 2 system involves skills that would be expected of a professional with specific training in the given topic. This is the type that most people would recognize as an 'expert system' and in clinical medicine would include information or decisions that would not be immediately obvious to a layperson.

Baillière's Clinical Obstetrics and Gynaecology, Vol. 4, No. 4, December 1990. Copyright © 1990, by Baillière Tindall. ISBN 0-7020-1479-6. All rights of reproduction in any form reserved.


Type 3 expert systems

A type 3 expert system involves simple but repetitive operations which can be carried out by a human but only at the cost of considerable time and effort. An example is on-line data reduction from a cardiotocograph. In theory a human could sit down with pencil, paper and calculator and work out the cumulative total of the various events in several hours. In practice this would not be possible and it is only the availability of on-line data reduction (a type 3 expert system) which makes the approach feasible. Another example is the evaluation of acid-base balance in neonatal intensive care: this presents a series of superficially simple calculations which are much better performed by machine (Schreck et al, 1986).

There are many situations in clinical medicine in which the information may be fairly clear-cut but the conclusions that may be drawn from it (i.e. the diagnosis) are not. Under these circumstances the physician's decision is based on probability rather than certainty. Estimating probability in traditional clinical medicine is the ill-defined process often referred to as intuition. However, probability is usually defined in mathematical terms; type 3 systems can provide the often quite complex mathematical solutions to this sort of problem, the most familiar example being the use of Bayes' theorem.

ARTIFICIAL INTELLIGENCE

Artificial intelligence (AI) can be defined as 'behaviour by a machine which would be regarded as intelligent if it were performed by a human'. This could involve any of the expert systems noted above. However, whereas an expert system is defined by its output (i.e. intelligent advice), AI is defined by its mode of reasoning (i.e. handling of symbolic logic in a manner similar to the human brain). The characteristics of an AI system may also often include the abilities of self-learning and of explaining the reasons for a given conclusion. It may also be defined as an expert system which is capable of developing itself from simple inputs. Indeed, the so-called expert system shells are capable of assimilating large numbers of simple rules and then deducing complex relationships between them.

The term artificial intelligence is often broadened to include functions such as vision and hearing, but these are best placed in the separate class of 'artificial perception'. These functions are perfectly well developed in lower species and would not be emulated in a medical diagnostic system.

ARTIFICIAL PERCEPTION

Hearing and vision are universal human functions, almost regardless of the level of intelligence, but are formidably difficult to emulate with a computer. There is little chance that in the foreseeable future (i.e. in this century) there will be any routine, practical system of two-way verbal exchange between patient and machine. Similarly, although machine analysis of complex images is a powerful tool in investigative medicine, it does not extend to the sort of pattern recognition by which any human being can tell that the patient is in pain.

CLINICAL DIAGNOSIS BY AN EXPERT SYSTEM

What is a diagnosis?

A diagnosis is a word or set of words which describes the meaning of a group of clinical features; these may be symptoms, signs, results of special investigations and pathological examination, or some combination of these. In effect a diagnosis is a brief summary of a case which classifies it with other cases having a similar basic pathology. In practice, the diagnosis serves as a convenient label on which all subsequent management is based.

In some cases diagnoses may be very simple and the presenting feature may be the same as the final conclusion. Thus a sensible patient who states 'I have a retained coil' may well have exactly that: the presenting feature and the diagnosis are identical and all a machine can do is to record the facts of a situation. Usually, however, a diagnosis is a conclusion based on a multiplicity of features. In addition, much of the information collected in the early evaluation of a case is non-definitive; for example, the complaint of 'intermenstrual bleeding' might be compatible with a retained coil or with several alternative diagnoses. Almost all clinical diagnoses represent a conclusion of more or less certainty reached from a multiplicity of clinical features, none of which, on its own, is definitive.

'Absolute' diagnosis versus operational diagnosis

It is obvious that the aim of clinical medicine is to achieve the best possible diagnosis in every case. Usually this means the diagnosis with the highest degree of likelihood after exclusion of all other possibilities. In reality most diagnoses fall short of this optimum. The patient reporting to a general practitioner with a missed period will be asked simple questions and offered a pregnancy test. The fact that this could be the first sign of a pituitary tumour is unlikely to be pursued at this stage. The operational diagnosis is 'pregnancy'.

In clinical practice most conclusions are operational diagnoses of the type described above. But the designers of medical expert systems (often a committee of designers) are often tempted towards the absolute, and the result may be procedures that are theoretically sound but impractical. Computer professionals are often unaware that a medical diagnosis may exist at a variety of different levels. A clinical expert will know not just what facts to obtain but also when to stop acquiring information.

It must also be recognized that most clinical information is non-definitive, i.e. the findings may be associated with a number of different diagnoses. It is unusual to encounter findings which are pathognomonic (providing absolute confirmation of diagnosis) or exclusionary (absolutely ruling out a diagnosis) or obligatory (absence of the finding rules out the diagnosis).

Synthetic versus analytical reasoning in medical diagnosis

A highly important aspect of the emulation of human performance by a computer is the route by which a conclusion is reached. The two main approaches can be defined as 'analytical' and 'synthetic' (see also Elstein et al, 1978; Macartney, 1987). With the analytical approach a series of facts is collected and then assembled to draw a conclusion. With the synthetic or 'cue-hypothesis' approach a conclusion is reached at an early stage and facts are then sought which might confirm or refute that conclusion*. In the analytical approach a patient with amenorrhoea would be asked all relevant questions and subjected to all relevant examinations, before a diagnosis was made. The synthetic approach is well illustrated by a real-life physician faced with a patient with amenorrhoea: the doctor may immediately have a 'hunch' that this is a pregnancy and will then confine the initial questioning and investigation to that possibility. In practice, clinical diagnosis usually follows the synthetic approach, although it is the analytical approach that is taught in medical schools.

The important point about this distinction is that computer diagnosis almost always follows an analytical approach: the meticulous accumulation of facts in the absence of preconceived ideas or interim hypothesis. The great advantage of a machine is that it can follow complex but predetermined paths of data collection, and then draw accurate conclusions. But designers of computer diagnosis systems often attempt to emulate the synthetic intuitive approach. It is soon found that this requires immensely sophisticated equipment capable (like the human brain) of parallel operations; such equipment, if available at all, is impractically expensive. The problems of trying to emulate humans with computers have been well illustrated by the analogy of flight: a real bird may be better designed than a fixed wing aircraft, but the latter provides a highly successful means of transport.

CLINICAL EXPERT SYSTEMS: METHODS FOR COMPUTER-ASSISTED MEDICAL DECISION-MAKING

The different approaches to computer-assisted medical decision-making (CMD) systems have been reviewed in detail by Reggia and Tuhrim (1985). The classification suggested by them will be followed here. The basic components of any CMD system are shown in Figure 1. The same structure can be used to classify the humans associated with a CMD system. The physicians who provide the knowledge base are the 'medical experts' or 'knowledge base authors'; the computer scientists who design the inference engine are 'knowledge engineers'; and the physicians who apply the system are the 'users'. The various methods used have also been divided into a number of groups (Table 1).

* Human memory restricts consideration of hypotheses to four ± one at any time (Elstein et al, 1978).

[Figure 1: block diagram with the labels 'Knowledge base', 'Facts about a patient', 'Inference engine' and 'Conclusions about a patient'.]

Figure 1. Diagram of a computer-assisted medical decision-making (CMD) system (modified from Reggia and Tuhrim, 1985). The input is a description of the clinical features of a specific patient; the output is some sort of conclusion, which may be a diagnosis and/or suggestions for additional tests and treatment. The knowledge-base is a collection of predetermined 'facts' needed to solve problems in a given subject. The inference engine is a program which assembles the information about the patient together with that in the knowledge-base and draws a conclusion about the patient (modified from Chard, 1988).

Table 1. A classification of methods used in CMD systems (from Reggia and Tuhrim, 1985).

Algorithmic methods
Statistical pattern classification
Production rule systems
Cognitive models

Algorithmic methods

An algorithm is simply a sequence of instructions on how to perform a task. Most successful current CMD systems are based on this approach. The obvious advantages are that it is simple to design and easily implemented using standard computer languages (FORTRAN, BASIC, Pascal). For example, in BASIC the instructions are a series of statements on successive lines, with conditional branches to other lines where appropriate. The conclusions (diagnostic or therapeutic) are reached by one or more 'if ... then' statements: 'if' the patient has clinical features X and Y, but not Z, 'then' the diagnosis is probably condition A and the treatment is drug 1 (Figure 2). In systems of this type the inference engine is, in effect, the program language.

[Figure 2: flow chart; the labels recoverable from the original are 'Feature Y' and 'Diagnosis A'.]

Figure 2. A diagram to illustrate the algorithmic approach to a computer-assisted medical decision-making (CMD) system. Information is obtained on two clinical features, X and Y. If both are present (positive) then the conclusion of diagnosis A can be reached. If either or both features are negative the program requests another input (feature Z). In real life, flow charts of this type are usually of much greater size and complexity. Nevertheless, they are easy to implement using standard languages and equipment (modified from Chard, 1988).
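The flow chart described in the caption of Figure 2 can be written directly as a conditional branch. The following Python sketch is illustrative only: the function name `next_step` and the returned strings are assumptions, and what happens after feature Z is requested is left open because the caption does not say.

```python
def next_step(feature_x: bool, feature_y: bool) -> str:
    """Follow the fixed path described for Figure 2 for two binary features."""
    if feature_x and feature_y:
        # Both features positive: the conclusion can be reached.
        return "diagnosis A"
    # Either or both features negative: the program asks for another input.
    return "request feature Z"


print(next_step(True, True))
print(next_step(True, False))
```

Here the branching logic and the medical knowledge are the same lines of code, which is the lack of separation between knowledge base and inference engine that critics of the approach point to.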

The algorithmic systems are simple and attractive but are not especially popular amongst researchers in CMD. Among the quoted disadvantages are the lack of precise quantitative formulae for many medical problems, and the absence of a clear distinction between the knowledge base and the inference engine. However, the general popularity and success of algorithmic systems will continue until a system of equal practicality and superior abilities is available. An example of a decision-support system in obstetrics using an algorithmic approach has been reported (Lilford and Chard, 1984).

Statistical pattern classification

The best known of these procedures are Bayes' theorem, linear discriminant functions and database comparisons.


Bayes' theorem

The mathematical basis of this very well known approach to CMD was first shown by an eighteenth-century clergyman, Thomas Bayes. Bayes' theorem presents information as probabilities, which explains the great value of the theorem in clinical medicine, in which most probabilities are less than one (i.e. there is hardly ever total certainty). The theorem combines the prior probabilities of outcomes together with the conditional probabilities of various input features in order to reach a posterior probability or conclusion. In practical terms, Bayes' theorem can be used to estimate the probability of various diagnoses given the clinical features of an individual patient. This process of drawing probabilistic conclusions from multiple pieces of information, none of them definitive on their own, describes very well the process of clinical diagnosis. It has even been called the machine equivalent of human intuition (Chard, 1987). The equation of Bayes' theorem commonly appears in the following form:

$$P(D : CF) = \frac{P(CF : D)\,P(D)}{P(CF : D)\,P(D) + P(CF : \text{not } D)\,P(\text{not } D)} \tag{1}$$

where P is probability, D is diagnosis, and CF is a set of clinical features. Thus the expression on the left, P(D : CF), is equivalent, in words, to 'the probability P of diagnosis D given the set of features CF'. On the right, P(D) and P(not D) represent the prior probabilities of the diagnosis and its alternatives. In a typical clinical case there would be a number of possible conditions and a number of clinical features. A probability calculation is made for each condition; the 'diagnosis' is the condition with the highest probability. This equation is the 'inference engine' (see Figure 1) of a Bayesian CMD system. The knowledge-base for a given clinical situation consists of a matrix of all possible conditions and all clinical findings in these conditions. Each cell in the matrix contains the incidence of a particular finding in the given disease (P(CF : D)).

The concept of Bayes' theorem is not easy for a non-mathematician to grasp at a brief glance, but it can be illustrated by a simple worked example. Table 2 shows the incidence of two clinical features, a and b, in two diseases, X and Y, which have an equal incidence (i.e. the prior probability, P(D), of each is 0.5).

Table 2. The incidence of clinical features, a and b, in two diseases, X and Y.

Disease    Incidence of feature (%)
           a      b
X          10     90
Y          90     10

The two diseases have an equal incidence; for the purposes of the example shown in the text it is assumed that X and Y are mutually exclusive and that a and b are independent.

Take a case in which both features, a and b, are positive and work out the posterior probability, P(D : CF), of each diagnosis. For diagnosis X this would appear as follows:

$$P(X : CF) = \frac{P(a : X)\,P(b : X)\,P(X)}{P(a : X)\,P(b : X)\,P(X) + P(a : Y)\,P(b : Y)\,P(Y)} \tag{2}$$

Note that the expression on the right-hand side of the bottom line, giving the probabilities in respect of diagnosis Y, is in this case the equivalent of 'not D' in equation 1. Substitution of the actual numbers in equation 2 gives:

$$P(X : CF) = \frac{0.1 \times 0.9 \times 0.5}{(0.1 \times 0.9 \times 0.5) + (0.9 \times 0.1 \times 0.5)} = 0.5$$

It is obvious that the same conclusions would be reached in respect of P(Y : CF).

The solution to this particular example might be thought so obvious that it would not require recondite mathematics. However, the complexity increases in situations in which there are multiple possible diagnoses with different frequencies and multiple clinical features. The answer then becomes less obvious and the mathematical approach correspondingly more valuable. As a final example take a case in which a is positive but b is not. Then:

$$P(X : CF) = \frac{P(a : X)\,P(\text{not } b : X)\,P(X)}{[P(a : X)\,P(\text{not } b : X)\,P(X)] + [P(a : Y)\,P(\text{not } b : Y)\,P(Y)]} \tag{3}$$

$$= \frac{0.1 \times 0.1 \times 0.5}{[0.1 \times 0.1 \times 0.5] + [0.9 \times 0.9 \times 0.5]} = 0.012 \tag{4}$$

This answer, though factually correct, is not so obvious as that of the previous example.
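Both worked examples can be checked with a short Python sketch. The incidences and priors are those of Table 2; the names `INCIDENCE`, `PRIOR` and `posterior` are illustrative, and the code relies on the same assumptions as the text (mutually exclusive diseases, independent features).

```python
from math import prod

# Knowledge-base taken from Table 2: P(feature present : disease).
INCIDENCE = {
    "X": {"a": 0.10, "b": 0.90},
    "Y": {"a": 0.90, "b": 0.10},
}
# Equal incidence of the two diseases.
PRIOR = {"X": 0.5, "Y": 0.5}


def posterior(findings):
    """Posterior probability of each disease given present/absent findings.

    `findings` maps a feature name to True (present) or False (absent).
    """
    joint = {}
    for disease, prior in PRIOR.items():
        likelihoods = [
            INCIDENCE[disease][name] if present else 1.0 - INCIDENCE[disease][name]
            for name, present in findings.items()
        ]
        joint[disease] = prior * prod(likelihoods)
    total = sum(joint.values())
    # Dividing by the total is the denominator of equations 1 to 3.
    return {disease: value / total for disease, value in joint.items()}


print(posterior({"a": True, "b": True}))   # both features positive
print(posterior({"a": True, "b": False}))  # a positive, b negative
```

The dictionary of incidences plays the part of the knowledge-base matrix and the function plays the part of the inference engine, so further diseases or features can be added to the data without touching the calculation.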

Practical aspects of the application of Bayes' theorem

Certain rules guide the use of Bayes' theorem in clinical medicine:

1. Bayes' theorem cannot give a definitive and absolute diagnosis (or if it does, it is being used incorrectly). All it does is to assign a probability to each of the possible conditions.

2. The possible conditions must be mutually exclusive. For example, 'vaginal discharge' and 'candida' in the same list would not be acceptable, whereas 'candida' and 'trichomonas' would be acceptable. An individual patient may of course have multiple conditions; Bayes' theorem cannot easily help with this, other than by the fact that it never excludes a diagnosis.

3. The clinical features should be independent of each other. For example, the urinary symptoms of 'urgency' and 'frequency' are so closely linked that it would not be appropriate to consider them as separate items. However, it is often difficult to establish that given features are truly independent and even more so to quantitate the degree of dependence. Fortunately the results of Bayesian calculations are not drastically altered by some measure of dependence. Adjusted weightings (derived from logistic regression) can be used to take into account test interdependence.

4. The clinical features must be binary, i.e. present or absent. This can be achieved even for numerical information, simply by applying a cut-off point, for example 'diastolic blood pressure above 90 mm Hg: yes or no'. Similarly the outcomes may refer to a single condition with some subclassification, for example 'good', 'fair' or 'poor' results from an operation for stress incontinence.

5. As with all CMD systems, the efficiency of Bayes' theorem depends closely on the accuracy of the knowledge-base.

6. Bayes' theorem often gives the best results with a subset of the total diagnostic information. In a study on a CMD system for jaundice, for example, it was found that only 22 of a total of 107 variables originally collected were required to achieve optimum results (Malchow-Moller et al, 1986). Incorporation of too many variables may actually decrease the efficiency of the procedure.
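The cut-off described in rule 4 amounts to a single comparison. As a minimal Python sketch (the helper name `above_cutoff` is an assumption; the 90 mm Hg threshold is the example given in the text):

```python
def above_cutoff(value: float, cutoff: float) -> bool:
    """Turn a numerical measurement into a present/absent feature."""
    return value > cutoff


# 'Diastolic blood pressure above 90 mm Hg: yes or no'
print(above_cutoff(95, 90))
print(above_cutoff(90, 90))
```

Note that a reading exactly at the cut-off counts as 'no' here, because the rule is worded as 'above 90'.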

Linear discriminant functions

The mathematical basis of linear discriminant functions is complex but the principle can be illustrated by a simple example. The probability of the diagnosis 'thyrotoxicosis' varies with the presence or absence of clinical features such as heart rate (tachycardia), heat intolerance, diarrhoea, weight loss etc. Considering only tachycardia: the quicker the heart rate, the more likely is thyrotoxicosis. This can be expressed as:

$$y = a_1 + b_1 x_1 \tag{5}$$

where y is a number related to the probability of thyrotoxicosis, $x_1$ is the heart rate per minute and $a_1$ and $b_1$ are constants which can be calculated from a cohort of thyrotoxic patients. ($b_1$ can be positive or negative; it is positive here, but would be negative in hypothyroidism.) Next, consider the presence or absence of diarrhoea. This can be described by the equation:

$$y = a_2 + b_2 x_2 \tag{6}$$

where $x_2$ is 1 if diarrhoea is present and 0 otherwise, and $a_2$ and $b_2$ are constants. The risk of thyrotoxicosis for both features can be described by:

$$y = a_{1+2} + b_1 x_1 + b_2 x_2 \tag{7}$$

or more generally as:

$$y = a + b_1 x_1 + b_2 x_2 + \ldots + b_n x_n \tag{8}$$

where a is a constant and $b_i x_i$ describe the linear or binary influence of the various clinical features. y is a number, the 'discriminant function', which can be used to divide a population into two or more groups with one or more cut-off values. Each group has a specific risk of disease or probability of a diagnosis. Because a and $b_i$ are calculated from real-life data they must be corrected for chance variation in the sample. The final formula is:

$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n \tag{9}$$

$\beta_i$ is usually referred to as the 'β-weight' of the individual features.

Unlike Bayes' theorem, linear discriminant functions do not require independent variables. However, if interdependent variables are present, their β-weights are highly unstable. Dawes and Corrigan (1974) examined the significance of this by replacing the β-weights first with randomly generated weights, and then with unit weights. The linear models with unit weights worked best. In a similar vein, Elstein et al (1978) succinctly stated: 'Just identify the "big variables" and add'. In actual use, therefore, linear discriminant functions are very easy to use and have much the same features as Bayes' theorem. Like Bayes' theorem, they cannot give an absolute diagnosis, but can only assign probabilities to the conditions under consideration.
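The 'identify the big variables and add' rule is simple enough to sketch in Python. Everything numerical below is hypothetical: the features are coded as present (1) or absent (0), all weights are set to one, and no cut-off value is implied, since the text does not supply one. The names `discriminant` and `unit_weights` are likewise illustrative.

```python
def discriminant(features, weights, constant=0.0):
    """y = constant + sum of weight * value over the clinical features."""
    return constant + sum(weights[name] * value for name, value in features.items())


def unit_weights(names):
    """Give every feature the same weight of one."""
    return {name: 1.0 for name in names}


# Hypothetical patient: 1 = feature present, 0 = feature absent.
patient = {"tachycardia": 1, "heat intolerance": 0, "diarrhoea": 1, "weight loss": 1}
y = discriminant(patient, unit_weights(patient))
print(y)
```

With unit weights the discriminant function reduces to a count of the positive features; fitted weights from a real cohort would replace `unit_weights` without any change to `discriminant`.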

Database comparisons

In this approach a new patient is compared with similar previous patients in a clinical database (Feinstein et al, 1972; Fries et al, 1974; Okada et al, 1977; Haberman et al, 1985). The principle is illustrated in Figure 3. When the new patient has been identified with a subset of patients in the database, the characteristics of the subset (diagnosis, treatment) become the conclusions of the new case. This method is classified as statistical rather than absolute because the match of new patient to database is often imperfect and must therefore be based on the nearest rather than a perfect match. For this purpose a 'distance' is estimated between the patient and the database. The inability to find an exact match between an individual patient and a knowledge-base is dealt with in a branch of mathematics known as 'fuzzy set theory' (Zadeh, 1968).

[Figure 3: matrix of clinical features a, b, c and d (columns) against diagnoses X, Y and Z (rows), with '+' marking the features present; the individual cell values are not recoverable.]

Figure 3. Diagram to illustrate the principle of clinical diagnosis by database comparison. Four clinical features (a-d) define the presence or absence of three diagnoses (X, Y and Z). A patient positive for all four features would therefore have diagnosis X. In practice the situation is far more complex because 'presence or absence' is usually a probability rather than a certainty. The database would contain numerous examples of X, Y and Z, and the result would be presented as that diagnosis which most frequently presented the same features as the current case (modified from Chard, 1988).

CMD systems based on database comparisons have the advantage that they make no assumption as to the independence or otherwise of the clinical features. They also provide hard information, as opposed to opinions, on the frequency estimates. The disadvantage is the requirement for a very large database, especially if it is to contain representative examples of rare conditions. Furthermore, the system cannot be used until the database has been accumulated at considerable effort and expense. The depth and extent of the database search place heavy demands on the computer hardware.

Production rule systems

Production rule systems come into the domain which many professionals would regard as 'artificial intelligence'. In such systems each production rule has the form 'IF antecedents THEN consequents': if the antecedent conditions are true then the consequents are also true (Figure 4). The inference engine is an interpreter which examines sets of rules in the knowledge-base and matches them to the features of a particular case.

[Figure 4: an 'Input' box ('Age 35') beside a 'Rules' box listing:
If parity more than 5 then ...
If weight more than 80 kg then ...
If age less than 16 then ...
If age 35 or 36 then ...
If Negro race then ...
Etc]

Figure 4. Example of a production rule from a computer-assisted medical decision-making (CMD) system providing advice on screening tests in pregnancy. Note that rules of this type are not branching points in a program but are simple statements of fact (modified from Chard, 1988).

It is not always immediately obvious how this differs from a simple algorithmic system. The answer resides in the terms 'procedural' and 'non-procedural'. An algorithmic system is procedural and follows predetermined paths. In the example shown in Figure 5 an algorithmic system would call for an input of 'age?'; the next step would be a conditional branch 'If age > 35 then recommendation' or 'If age ≤ 35 then next step'. A production rule system, by contrast, is non-procedural. Following an input of age (say 36) the whole of the knowledge-base is searched to ascertain if there are any rules triggered by this information. If such rules are found, the consequents are displayed; if they are not, the program continues to the next step (Figure 6).
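The contrast between procedural and non-procedural control can be sketched in Python. This is an illustration under stated assumptions, not a description of any published system: the rule consequents are placeholders because the source gives them only as '...', and the names `RULES`, `triggered` and `algorithmic` are hypothetical.

```python
# Production rules held as data: (antecedent test, consequent).
# The consequent strings are placeholders; the source does not state them.
RULES = [
    (lambda facts: facts.get("age", 0) > 35, "consequent of the age rule"),
    (lambda facts: facts.get("parity", 0) > 5, "consequent of the parity rule"),
]


def triggered(facts):
    """Non-procedural: search every rule and collect whichever ones fire."""
    return [consequent for test, consequent in RULES if test(facts)]


def algorithmic(entry_name, value):
    """Procedural: this decision node accepts 'age' and nothing else."""
    if entry_name != "age":
        raise ValueError(f"expected 'age', got {entry_name!r}")
    if value > 35:
        return ["consequent of the age rule"]
    return []


print(triggered({"age": 36}))      # the age rule fires
print(triggered({"hair": "red"}))  # no rule fires; the search simply ends
print(algorithmic("age", 36))      # same recommendation as the rule search
```

An entry of age 36 yields the same result from both functions, whereas an entry the procedural version does not expect raises an error instead of being passed over.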

[Figure 5: 'FACTS' enter a sequence of 'RULES/DECISION NODES' ('If age > 35 THEN ...', 'If parity > 5 THEN ...', 'If weight > 30 THEN ...', 'If Negro race THEN ...') leading to 'CONCLUSIONS'.]

Figure 5. Algorithmic system. If the system is at a decision node that requires the input 'age', it can continue only after the 'age' has been provided. If other information is given the system will come to a halt because there is a predetermined sequence which must be followed.

[Figure 6: the same 'FACTS', 'RULES/DECISION NODES' and 'CONCLUSIONS' as in Figure 5, with alternative routes between them.]

Figure 6. Production rule system. Of all the possible routes through the system, the inference engine chooses the one most appropriate to a given input. If no relevant input is available, the system will simply continue with the next step.

The difference between algorithmic and production rule systems can be further illustrated using the example of prenatal counselling of the type shown in Figures 5 and 6. For the user, entry of age will yield the same recommendation from both systems. However, entry of 'red hair' would bring an algorithmic system to a halt because there would be no procedural step of the 'If red hair then ...' type. By contrast, a production rule system would search its knowledge-base, find that there were no rules concerning red hair, and simply pass on to the next step.

The advantage of a production rule CMD system is that rules of the type shown in Figure 4 can be added more or less indefinitely. The major disadvantage is that, in a typical clinical system, large numbers of rules will have to be searched after each and every entry. Large numbers of rules are essential to allow for the context of the information, for example the rule in Figure 4 would not be true if the patient were seeking termination of pregnancy on social grounds, or were more than 20 weeks' pregnant. In reality it can be much easier and quicker for the human expert to determine the context, or that 'red hair' is not relevant, and therefore exclude it from inputs, rather than to go on a long and complex search which may prove fruitless. A further disadvantage is that medical knowledge tends to be organized in a descriptive manner: reformulation into a full set of discrete rules can be a formidable task.

There are several special features of production rule CMD systems:

1. Forward chaining versus backward chaining: the systems described so far are all forward chaining (also referred to as antecedent driven, bottom up or data driven). They start with the features of a case and work forward to conclusions (Figure 7). This can just as easily work in reverse and the system is then described as backward chaining (consequent driven, top down, goal directed). A conclusion is selected and the data that might confirm or disprove that conclusion are sought (Figure 8). This is very similar to 'synthetic' reasoning in the human.

[Figure 7: diagram with the recoverable labels 'RULES' and 'CONCLUSIONS'.]

Figure 7. Forward chaining. From the facts available, all possible conclusions are searched for and derived.

[Figure 8: diagram with the recoverable labels 'FACTS', 'RULES' and 'CONCLUSION'.]

Figure 8. Backward chaining. The system searches only for the data it needs to confirm or disprove the one conclusion under consideration.

2. Rules are usually assembled as a 'chain' or decision-tree in order to make multi-step deductions (for example, in a system for leukaemia diagnosis; Alvey et al, 1987a).

3. Rules may be absolute, or may include an element of uncertainty in the form of probabilities or, as in the case of MYCIN (see below), confidence values.

4. Rules, once triggered, can easily be made to document themselves, thus providing a 'self-explanatory' function.
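The two directions of chaining, and the multi-step deduction of a rule 'chain', can be sketched in Python. The rule set is entirely hypothetical (symbolic facts f1 to f4 and conclusions c1 to c3), the rules are assumed to be absolute rather than probabilistic, and the set is assumed to contain no circular rules, which the recursive backward search relies on.

```python
# Each rule: (set of antecedents, consequent). Symbols are placeholders.
RULES = [
    ({"f1", "f2"}, "c1"),
    ({"c1", "f3"}, "c2"),  # chains on the conclusion of the first rule
    ({"f4"}, "c3"),
]


def forward_chain(facts):
    """Data driven: derive every conclusion reachable from the facts."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in RULES:
            if antecedents <= known and consequent not in known:
                known.add(consequent)
                changed = True
    return known - set(facts)


def backward_chain(goal, facts):
    """Goal directed: seek only the data needed to confirm one conclusion."""
    if goal in facts:
        return True
    return any(
        all(backward_chain(item, facts) for item in antecedents)
        for antecedents, consequent in RULES
        if consequent == goal
    )


case = {"f1", "f2", "f3"}
print(sorted(forward_chain(case)))
print(backward_chain("c2", case))
print(backward_chain("c3", case))
```

Forward chaining examines every rule repeatedly until nothing new is derived, whereas backward chaining touches only the rules that bear on the selected conclusion, which is why it resembles the 'synthetic' reasoning described earlier.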

Some of the most familiar of the earlier CMD systems were of the production rule type, for example MYCIN for the choice of antibiotic therapy (Shortliffe, 1976; Davis et al, 1977), ONCOCIN for cancer therapy, and PUFF (Aikins et al, 1983) for the interpretation of pulmonary function tests.

A feature of the knowledge-base of some production rule systems is causal reasoning. In this approach the knowledge is in the form of a network of diseases and clinical findings, together with the causal relationships among these which may include cause-of, caused-by, develops-into and complication-of. Any disease can be described as a pattern within this network. In addition the causal knowledge is arranged in different hierarchies, for example one hierarchical level would describe the involvement of individual organs, another would be at the tissue level. Causal representation is said to enable a basic understanding of a disease process. An example of such a system is CASNET/glaucoma (Weiss et al, 1978). However, it is extremely difficult to understand, to construct and to program.

It is sometimes thought that production rule systems emulate human intelligence by virtue of the fact that they have a large store of knowledge and the ability to search through it rapidly. But in reality this 'brute force' approach is not a good imitation of the way humans process information. The speed of the physical events underlying the function of the human brain is much slower than that of computer circuits, yet a human can solve many everyday problems in a fraction of the time taken by a machine.

Cognitive models

Cognitive models are an attempt to model, as closely as possible, the reasoning of human diagnosticians. In order to construct such a model a clinical interview is recorded and the clinician is then debriefed as to the thought process involved. This process always confirms that human diagnostic reasoning consists of a series of hypothesize-and-test steps during the course of which the physician constructs a conceptual model of the patient (Figure 9). At an early stage of the clinical process there are usually multiple competing hypotheses (the differential diagnoses of the case). The aim is to reduce the number of diagnoses to the minimum, often one, which will best explain the patient's problems (this reduction process is sometimes referred to as Occam's razor). At all stages in the process the physician attempts to grade a piece of information or a hypothesis according to its usefulness or likelihood. Interestingly, the human mind usually employs a three-point system for ranking: in effect, 'yes', 'no' or 'perhaps' (Elstein et al, 1978). So a diagnosis may be highly probable (say 0.95-1.0 on a probability scale), highly improbable (say 0.00-0.05) or somewhere in between; the probability of the in-between category could be as broad as from 0.05 to 0.95 without any indication of a strong weighting one way or the other.

In a cognitive model the knowledge-base consists of 'frames' of associated information; this is thought to be a reflection of the organization of memory in the human. A single frame might consist of the name of a test, the diseases in which it is positive or negative, the hazards and the costs. The inference engine of a cognitive model is based on a hypothesize-and-test approach, i.e. cycling through sets of frames until a conclusion is reached.

[Figure 9: 'Emulation of human reasoning'. Boxes labelled Initial Facts, (More) Information, System Knowledge, Test Hypothesis and Accepted Hypothesis.]

Figure 9. Diagram to illustrate the principle of the cognitive model approach to a computer-assisted medical decision-making (CMD) system. As the result of input of information the physician forms a hypothesis about a patient (the hypothesis is a sort of intermediate, working diagnosis selected from a number of competing explanations of the patient's problems). Further information is then sought in order to strengthen or repudiate the hypothesis. At some, usually rather indeterminate point, the process is concluded and the final hypothesis becomes the actual diagnosis. It has been estimated that this occurs when the clinician has assembled about 70% of the potentially useful data (Feightner, 1975) (modified from Chard, 1988).

An approach that fits into the cognitive model group is the 'problem knowledge coupler' (Weed and Hertzberg, 1984; Pollock, 1986). In this system the clinician seeks a principal finding problem ('pivotal sign') and then considers all the causes of that problem, ranking them according to how well they account for other patient findings and attributes.

The disadvantage of cognitive models is that they can become very complex if they emulate human ability to deal with multiple simultaneous hypotheses. This type of parallel reasoning has led some to conclude that similar structures will be needed in medical computer hardware, at a cost proportional to the nth power of the number of simultaneous processes. Another problem is the very obvious differences in thought processes encountered among human experts of equivalent ability. This is most unlikely to be built into a CMD system.

Neural networks

An area of AI research which has been the subject of much recent publicity is the use of neural networks or neurocomputers (Reggia, 1988; Stubbs, 1988). These are systems composed of a network of neurone-like processing devices. Parallel computations are performed by each element, so that collectively the network provides enormous computational power. Feedback between elements ensures that the system is capable of learning. This is achieved by altering the strength of the connections between processors according to the results obtained. In addition, the network acts as both memory and processor, unlike conventional AI systems (see Figure 1) in which these functions are distinct.
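The idea of learning by 'altering the strength of the connections according to the results obtained' can be shown with a single neurone-like unit. The sketch below uses the classical perceptron update as a stand-in; the choice of rule, the learning rate and the toy task (a logical AND of two inputs) are all assumptions made for illustration and are not drawn from the systems cited above.

```python
def train_unit(samples, epochs=10, rate=0.5):
    """Train one threshold unit; returns its connection weights and bias."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            output = 1 if activation > 0 else 0
            error = target - output          # feedback: the result obtained
            # Strengthen or weaken each connection in proportion to the error.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


def predict(weights, bias, inputs):
    """Output of the trained unit for one input pattern."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0


# Toy task: the unit should respond only when both inputs are active.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_unit(AND_SAMPLES)
print([predict(weights, bias, x) for x, _ in AND_SAMPLES])
```

Note that the 'knowledge' of the unit resides entirely in its weights, which is the sense in which the network acts as both memory and processor.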


The equipment for a neural network already exists in the form of devices such as the Connection Machine, which has many thousands of simple processors, each with its own memory, embedded in a communication network. Software is the major current limitation: most programs for biomedical work are at the research stage. Pattern recognition for imaging and signal processing are thought to be important potential uses. A very topical research area is the use of these machines to model neurophysiological events in the brain, especially the architecture of the visual cortex (Linsker, 1986). But construction of a computer equivalent to the whole human brain is likely to prove more difficult since the brain is a network of 5 × 10¹⁰ neurones (processors) with 10¹⁴ synapses (communication channels).

THE PRESENT PLACE OF ARTIFICIAL INTELLIGENCE

AI has not had a good press in the recent past. This is partly because of the over-optimism of some exponents of the topic. However, there is little doubt that some of the concepts of AI can make a major practical contribution to clinical medicine (Szolovits et al, 1988). There is equally no doubt that for the foreseeable future AI will not replace the human professional (Schwartz et al, 1987). A major criticism of AI is that its concepts are often wrapped in obscure terminology. Other problems include:

1. Opinions on AI are often coloured by experience with earlier systems: MYCIN and others were brilliant prototypes but for practical purposes are now of historical interest only.

2. The transition from a prototype demonstrator system to a practical high-performance system can present many problems, especially that of making gross errors in complex and difficult cases. This has been well illustrated in an EMYCIN system for leukaemia diagnosis (Alvey et al, 1987b).

DEFINING THE KNOWLEDGE-BASE FOR CMD SYSTEMS

An essential part of setting up a CMD system is to define and produce the knowledge-base. The perfect knowledge-base would contain all the known and possible features of a clinical situation, together with the quantitative relationships amongst those features*. For example, in the case of a 30-year-old woman with vaginal discharge the knowledge-base would contain a list of possible diagnoses and the frequency of each, together with the frequency of each clinical feature in each diagnosis. But complete information of this type is only rarely available in any field of clinical medicine. Indeed for most situations only a small percentage of the information would be available in usable numerical form.

* Knowledge can often be divided into two, three, or more levels. Examples include the division into declarative and procedural knowledge (declarative knowledge describes objects and events; procedural knowledge describes actions and relationships) or the division into terminological knowledge (knowledge about definitions and language), domain-descriptive knowledge (medical textbook knowledge), and problem-solving knowledge (knowledge about problem solving, diagnosis or therapy) (Swartout and Smoliar, 1987).

There are four principal approaches to the construction of a knowledge-base:

1. Use of an existing database. This might be information derived from published data in a textbook or journal. The problem is that the information is often incomplete, particularly in respect of the frequency of various clinical features; furthermore, the information may not apply to the local population for which the system is designed.

2. Development of a new database based on local practice. This is the best answer if it is available. Thus, if complete information is available on the previous 1000 30-year-olds with vaginal discharge then it is fairly simple for the computer to compare the pattern of features in the current subject with that in previous subjects with a known diagnosis. Unfortunately, the creation of such a database takes much time and resources before yielding a useful product. This situation will improve with the increasing trend towards electronic collection of clinical data. The vast databases collected from some existing expert systems, such as INTERNIST, are of great value in their own right.

3. Development of a database as part of the actual use of an expert system. The machine may develop its own knowledge-base from data acquired as part of its use in routine clinical practice. But again the knowledge-base will not become effective until substantial numbers of cases have accumulated, and it may be difficult to assess when this point has been reached (Chard, 1987).

4. The 'Delphic' system. In this approach the necessary knowledge is generated by seeking the opinions of 'experts'. It is fairly rapid and simple, but often reconfirms the well-known fact that experts vary widely in their opinion of numerical information that superficially would appear to be very straightforward and familiar (Dolan et al, 1986). In addition, physicians are often unfamiliar with the concept of probability estimates (Kahneman and Tversky, 1972; Tversky and Kahneman, 1974); a program has been described to train physicians in this technique (Fryback, 1986).
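As an illustration of the second approach, the frequency of each diagnosis and of each clinical feature within each diagnosis can be tabulated directly from a file of previous cases. The sketch below assumes that each case is stored as a diagnosis plus a set of observed features; the function name, the record layout and the four example cases are hypothetical and carry no clinical weight.

```python
from collections import Counter, defaultdict


def build_knowledge_base(cases):
    """Tabulate frequencies from (diagnosis, set_of_features) records.

    Returns two tables:
      prior[d]        - proportion of all cases with diagnosis d
      frequency[d][f] - proportion of cases of d in which feature f was present
    """
    diagnosis_counts = Counter()
    feature_counts = defaultdict(Counter)
    for diagnosis, features in cases:
        diagnosis_counts[diagnosis] += 1
        for feature in features:
            feature_counts[diagnosis][feature] += 1

    total = sum(diagnosis_counts.values())
    prior = {d: n / total for d, n in diagnosis_counts.items()}
    frequency = {
        d: {f: c / diagnosis_counts[d] for f, c in counts.items()}
        for d, counts in feature_counts.items()
    }
    return prior, frequency


# Invented records, for illustration only.
CASES = [
    ("candidiasis", {"itching", "thick_discharge"}),
    ("candidiasis", {"itching"}),
    ("bacterial_vaginosis", {"odour"}),
    ("bacterial_vaginosis", {"odour", "thin_discharge"}),
]

prior, frequency = build_knowledge_base(CASES)
print(prior)
print(frequency["candidiasis"])
```

With only a handful of cases the proportions are obviously unstable, which is the practical reason why such a database takes so long to become useful.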

In practice, construction of the knowledge-base for a CMD system is usually a combination of all these approaches. The knowledge engineer dealing with the condition of vaginal discharge would first examine textbooks and list all possible diagnoses and the clinical features associated with each of these diagnoses. He or she would also ascertain the numerical facts that are readily available in the literature. The information would then be assembled into a table which would, of course, have numerous blank spaces. At this point the engineer would confer with a human expert (a gynaecologist) and ask for estimates for all the missing information. If the expert cannot give a single figure (e.g. 40%) then they are asked to suggest a range (e.g. 30-50%).


Whatever range is suggested, the median is taken as the actual value. The validity of the process may be enhanced if several experts can be interviewed so that the end-result is a consensus rather than an individual view (Schreiner and Chard, unpublished data).

The knowledge-base as a diagnostic support system

An expert system knowledge-base can be of great value as a diagnostic aide-memoire without itself drawing any conclusions. Examples of this include QUICK and ASK*MED (Bernstein et al, 1980). ASK*MED consists of an extensive and detailed clinical text. On the basis of a query this text is scanned for all associated items and paragraphs, ranked according to their relevance to the query. This type of system does not aim to provide a model of clinical judgement but rather to provide a compact, up-to-date representation of medical knowledge. The design of systems of this type has become much simpler with the introduction of automatic text processors such as Apple 'Hypertext'.

DOMAIN-INDEPENDENT CMD SYSTEMS (EXPERT SYSTEM SHELLS)

Most well-known CMD systems were originally developed for a specific application, e.g. MYCIN for antibiotic therapy. Some were further developed into 'system shells' or domain-independent systems which provide all the relevant software tools and then allow the user to construct an application by simply adding the knowledge-base, e.g. EMYCIN (Essential MYCIN). Systems of this type are often limited because they allow only one approach. It is desirable to allow for a number of different approaches within the same application, for example a series of categorical production rules followed by a Bayesian probability analysis of a set of data. Several of these 'shells' are commercially available and could be used for medical application. Some of the more recent ones even allow different approaches to problem-solving (e.g. Bayes' and production rules within the same shell). Choosing between these products is very confusing.

Furthermore, some high-level languages have been advocated for use in expert systems, notably PROLOG and LISP. It is interesting to note that current implementations of two classical LISP-based systems, PUFF and INTERNIST, have been rewritten; INTERNIST has become the Pascal-based QMR, and PUFF has been rewritten in BASIC.

It has been suggested that the use of 'shells' can create problems: in particular, the inaccessibility of the inference engine makes it impossible to influence the reasoning process (Alvey et al, 1987a; Schreiner and Chard, 1990). Alvey and colleagues (1987a) switched from EMYCIN to PROLOG during the construction of an expert system for the classification of leukaemia by cell surface markers, because they found it easier to analyse the reasoning process in PROLOG. The great advantage of the shells is that they provide a user-interface, a database, and an inference engine. This is very helpful to those without skills or experience in programming.


SOME GENERAL RULES FOR THE DESIGN OF MEDICAL EXPERT SYSTEMS

There are certain pragmatic rules which should guide the development of any expert diagnostic system:

1. The system should attempt to find a single diagnosis which accounts for all the data.

2. The system should be capable of partitioning if it becomes apparent that there is more than one diagnosis.

3. The system should proceed from the general to the specific, i.e. having identified the broad area of a problem it should focus on smaller subsets.

4. The system should be capable of eliminating a diagnosis; this is often simpler than making a diagnosis and can be just as helpful.

5. The items of information that have the greatest differential diagnostic value should be sought first.

6. The system should recognize that common events are common, and that an encyclopaedic list of differential diagnoses can be counterproductive.

7. The system should never miss a treatable disease.

8. The system should never leave important factors unexplained.

PRESENTING THE FINDINGS OF A CMD SYSTEM

The format for presentation of the conclusions of a CMD system varies widely. The simplest system presents only the most likely diagnosis. At the other extreme the machine can give a full list of rank-ordered diagnoses, together with the exact probabilities of each and a formal statement of the factors that were taken into account in reaching these conclusions. Between these extremes are systems that present a diagnosis or diagnoses classified simply as probable or possible. This more closely emulates the normal human approach and thus is likely to be the most favoured (Chard et al, 1989).

PERFORMANCE ASSESSMENT OF A CMD SYSTEM

Each of the approaches to medical expert systems described above has its own band of dedicated and enthusiastic supporters. However, studies that attempt to compare the performance of different CMD systems generally agree that there is little to choose between them. Reggia and Tuhrim (1985) have advised that methods should be chosen according to the specific problem, in particular to accord with the format in which the knowledge is already encoded. There are, for example, some well-documented sets of clinical rules that lend themselves to a production rule system and which can therefore be used with very little 'translation'. Similarly, there are some problems that demand a probabilistic solution of the Bayesian type (for example, prognosis of strokes or heart attacks) while others involve categorical inferences (for example, staging of cancer).


The criteria by which any CMD system should be assessed are accuracy, usefulness, transferability and acceptability.

Accuracy

The accuracy (or efficiency) of any diagnostic system is assessed by determining the rates of true and false positive results, and true and false negatives. From these the sensitivity, specificity and predictive value are calculated and can be compared with the figures derived from some 'gold standard' of correctness. Often this would be the conclusion reached by an expert given the same information. This leads to the problem that the CMD system may reach a conclusion based on exact probability while the physician's conclusion will be qualitative. The computer might decide that diagnosis A is correct because it has a probability of 0.75 while the next most likely diagnosis, B, has a probability of 0.5. A human might reach the same conclusion, but it would be in the form 'A is the most likely diagnosis and B is the next most likely', i.e. with no estimate of the closeness of the probabilities. Humans are usually more confident about the accuracy of their guesses than is justified by the real probabilities (Kleinmuntz and Elstein, 1987). Humans also tend to ignore negative information in favour of positive facts which confirm their initial hypotheses.

It is also possible to compare CMD systems by examining the probabilities reached in respect of conditions for which there is some absolute measure of outcome. Take, for example, ovulation. The final arbiter of this diagnosis will be the progesterone level. However, all the information obtained prior to blood sampling can be combined in a CMD system. If one system gave a probability of 0.8 in the actual presence of ovulation while another system gave 0.6, then clearly the first system is more 'correct' than the second. This is not the end of the story, of course, because the first system might give large numbers of false positive results.

Several studies have compared the accuracy of machine diagnosis with that of a human faced with the same data. The machine usually proves to be equivalent or somewhat superior to the human. The best known of such comparisons, that of de Dombal and colleagues on abdominal pain, is described in more detail below. Other examples include jaundice (Malchow-Moller et al, 1986), genital infections (Chard, 1987), acid-base balance (Schreck et al, 1986), head injury (the Glasgow Coma Scale; Barlow et al, 1987) and myocardial infarction (Goldman et al, 1988). The machine is especially strong in type 3 expert systems, those involving simple but multiple calculations. This has been particularly well documented in the evaluation of fetal cardiotocographic tracings (Kariniemi, 1978; Dawes et al, 1985).
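The measures of accuracy named at the start of this section follow directly from the four cells of a two-by-two table of system output against the gold standard. A minimal sketch is given below; the function name and the counts in the example are invented for illustration.

```python
def diagnostic_accuracy(tp, fp, tn, fn):
    """Standard accuracy measures from true/false positive and negative counts."""
    return {
        # Proportion of subjects with the condition whom the system detects.
        "sensitivity": tp / (tp + fn),
        # Proportion of subjects without the condition whom the system clears.
        "specificity": tn / (tn + fp),
        # Proportion of positive calls that are correct.
        "positive_predictive_value": tp / (tp + fp),
        # Proportion of negative calls that are correct.
        "negative_predictive_value": tn / (tn + fn),
    }


# Hypothetical evaluation: 100 subjects with the condition, 200 without.
measures = diagnostic_accuracy(tp=80, fp=30, tn=170, fn=20)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

A system can score well on sensitivity while generating many false positives, which is why the predictive values need to be examined alongside it.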

Usefulness

A medical expert system must offer some advantage over humans. Although machines and the human experts are equivalent in terms of accuracy, the real strength of the machine is that it may be available when the human expert is not; under these circumstances there can be no doubt as to usefulness. A well-known example is the use of the de Dombal abdominal pain system (de Dombal et al, 1972) in US navy submarines. While on a mission these submarines can receive but not send messages. The use of a CMD system provides powerful reassurance to isolated medical corpsmen and obviates the danger and cost of surfacing unnecessarily. Other practical features of a CMD system include:

1. Speed. A system is of little value if it takes several minutes to digest or present a piece of information.

2. Authority. The system must reflect widely accepted opinions on a topic, not the idiosyncrasies of a particular designer. This is not a problem with prototype systems but becomes a problem when the subject matures and multiple competing systems become available. An example of a mature application is the computer-based interpretation of psychological tests, a topic which has been active since the 1950s (Hofer and Green, 1985; Moreland, 1985).

3. Cost. CMD systems will almost always be an addition to other medical expenses, but can eventually generate savings by greater efficiency. For example, giving physicians the predicted probabilities of test abnormalities for commonly ordered laboratory procedures yields a significant reduction in patient charges (Tierney et al, 1988). A reduction in the need for hospital admission has been shown with the use of CMD for abdominal pain (Adams et al, 1986) and myocardial infarction (Goldman et al, 1988).

Transferability

A medical expert system may not necessarily work on a site other than that where it originated. A system for the diagnosis of cases of jaundice developed at a specialized liver unit in London (Knill-Jones et al, 1973) was shown to be inadequate for use in a Swedish hospital (Lindberg, 1982). A subsequent study showed that another jaundice algorithm, this time developed in Copenhagen, worked well in a Swedish setting (Lindberg et al, 1987). In the UK, it has been clearly demonstrated that the de Dombal system for abdominal pain (de Dombal et al, 1972) works well on a variety of different sites.

Acceptability

Some health care personnel may see the introduction of a CMD system as an intrusion upon their professional domain and standing. There has even been the suggestion that computerized diagnostic aids are less welcome when services are reimbursed on a fee-per-item basis than in free or prepaid health care plans (Kleinmuntz and Elstein, 1987). However, professional objections are usually temporary. It is often found that the image of the physician may be enhanced by the use of sophisticated technology. At the stage of introduction CMD systems should always be advocated for decision support rather than decision making.


The use of a CMD system may also add steps (and therefore time) to the diagnostic process. This is true for many current systems, including those of the de Dombal group, which are separate from the data collection process, i.e. data are transferred to the expert system after the patient is seen. Double entry is a disadvantage and CMD systems are likely to succeed only when they become an integral part of the data collection process.

SOME EXAMPLES OF CMD SYSTEMS IN ACTION

INTERNIST (Caduceus)

INTERNIST is the most familiar of the expert systems said to be based on a cognitive model (Pople, 1977; Miller et al, 1982). Currently the knowledge-base includes about 575 diseases and more than 4000 individual manifestations of disease (demographics, history, symptoms, physical signs and laboratory data). The knowledge-base consists of profiles of diseases. The manifestations of the disease are listed: three numbers are given to each manifestation, on a scale from 1 to 5. The first number is the 'evoking strength' and addresses the question: 'On the basis of this finding alone, how strongly does the physician consider the patient has this disease compared with any other in internal medicine?' The second number is the 'frequency' and states that, given the disease, the manifestation occurs with a certain frequency (1 = rare or minimal; 2 = a significant minority of cases; 3 = about half of the cases; 4 = a significant majority of cases; and 5 = essentially all). The third number is for 'importance' and indicates whether it is necessary to explain the manifestation in presenting the diagnosis.

The inference engine of INTERNIST-1 considers every disease which is compatible with a positive manifestation on the basis of the evoking strength numbers. Positive manifestations will therefore cluster in the most probable diagnoses. The values for each of the three numbers allow the diagnoses to be ranked. If the main contender is adequately supported, this conclusion becomes the diagnosis. If it is not, the system asks for additional observations.

A subsequent development from INTERNIST-1 is the QUICK MEDICAL REFERENCE program (QMR) (Miller et al, 1986). This uses the same knowledge-base but, instead of acting as a diagnostic consultant, QMR is an information tool with which users can review the knowledge-base. This type of 'electronic textbook' is one of the most important practical contributions of many expert systems.
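The three-number disease profiles described above for INTERNIST-1 can be pictured as a small data structure with a ranking function on top. The sketch below is a toy: the disease and finding names are placeholders, importance is held in a separate table for brevity, and the scoring rule (add evoking strengths, subtract the importance of unexplained findings and the frequency of expected-but-absent findings) is an assumption made for illustration, not the published INTERNIST-1 scoring scheme.

```python
# Disease profiles: finding -> (evoking strength, frequency), each on a 1-5 scale.
PROFILES = {
    "disease_A": {"finding_1": (4, 5), "finding_2": (2, 3)},
    "disease_B": {"finding_2": (3, 4), "finding_3": (1, 2)},
}

# Importance of each finding: how badly it needs to be explained (1-5).
IMPORTANCE = {"finding_1": 4, "finding_2": 2, "finding_3": 1}


def rank(present, absent=()):
    """Rank diseases by an assumed, simplified scoring rule."""
    scores = {}
    for disease, profile in PROFILES.items():
        # Support: evoking strength of every present finding the disease explains.
        support = sum(profile[f][0] for f in present if f in profile)
        # Penalty: present findings the disease leaves unexplained.
        unexplained = sum(IMPORTANCE[f] for f in present if f not in profile)
        # Penalty: findings the disease would predict but which are absent.
        missing = sum(profile[f][1] for f in absent if f in profile)
        scores[disease] = support - unexplained - missing
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank(present={"finding_1", "finding_2"}, absent={"finding_3"})
print(ranking)
```

If the leading score did not stand clear of the rest, a fuller system would go on to request further observations, as the text describes.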

The de Dombal system for abdominal pain

De Dombal and colleagues in Leeds pioneered the practical application of expert systems in medicine. In 1972 they described a system for the diagnosis of acute abdominal pain (de Dombal et al, 1972). Since then they have published regular improvements and updates. The most recent of these (Adams et al, 1986) provides a model of how such systems should be designed and assessed.


The program is aimed at the evaluation of cases of abdominal pain of less than 1 week's duration. The data from each patient are subjected to a Bayesian probability analysis using a knowledge-base derived from 6000 patients in 13 countries. The latest publication describes an evaluation in eight centres with more than 250 participating doctors and 16737 patients. The performance in respect of diagnosis and decision-making was compared between a base-line period without computer assistance and a test period with the machine system. Initial diagnostic accuracy rose from 45.6% to 65.3%. The rate of negative laparotomies fell by half, as did the rate of perforation in patients with appendicitis. Serious management errors fell from 0.9% to 0.2% and the mortality rate fell by 22%. It was postulated that savings were made by the avoidance of 278 laparotomies and 8516 bed-nights during the trial period. This type of study leaves no doubt that CMD systems must play an increasing role in the practice of clinical medicine.
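The principle of a Bayesian probability analysis of this kind can be sketched as follows. The code assumes that findings are conditionally independent given the diagnosis (a common simplification, adopted here as an assumption), and every number in the example is invented; none is taken from the de Dombal knowledge-base.

```python
def bayes_posteriors(prior, likelihood, findings):
    """Posterior probability of each diagnosis given the observed findings.

    prior[d]          - probability of diagnosis d before any findings
    likelihood[d][f]  - probability of finding f in patients with diagnosis d
    Findings are treated as independent given the diagnosis (an assumption).
    """
    unnormalised = {}
    for diagnosis, p in prior.items():
        for finding in findings:
            p *= likelihood[diagnosis][finding]
        unnormalised[diagnosis] = p
    total = sum(unnormalised.values())
    return {d: value / total for d, value in unnormalised.items()}


# Invented figures for two competing diagnoses.
PRIOR = {"appendicitis": 0.25, "non_specific_pain": 0.75}
LIKELIHOOD = {
    "appendicitis": {"right_lower_quadrant_pain": 0.8, "rebound_tenderness": 0.6},
    "non_specific_pain": {"right_lower_quadrant_pain": 0.2, "rebound_tenderness": 0.1},
}

posterior = bayes_posteriors(
    PRIOR, LIKELIHOOD, ["right_lower_quadrant_pain", "rebound_tenderness"]
)
print({d: round(p, 3) for d, p in posterior.items()})
```

In this example two findings are enough to turn a prior of 0.25 into a posterior of roughly 0.89, which shows how quickly the frequencies in the knowledge-base dominate the result.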

DXplain

DXplain is one of the most recent and practical of diagnostic decision-support systems. It was developed in Boston in collaboration with the American Medical Association (Barnett et al, 1987). The system is available through AMA/NET and thus can be accessed by anyone with a terminal and modem. The knowledge-base includes descriptions of approximately 2000 diseases, about 4700 terms (signs, symptoms etc), and 65 000 relationships amongst these. The items are linked by relatively straightforward algorithms and selection rules based on a scoring system similar to a Bayesian analysis. The program can explain and justify its interpretations. Unlike INTERNIST, it can contradict as well as support a diagnosis. An important feature is ongoing collaboration with users. Comments and suggestions are continuously reviewed and the knowledge-base modified and enhanced accordingly.

The multi-centre Chest Pain Study

The latest publication from the multi-centre Chest Pain Study described 4770 patients presenting with acute chest pain (Goldman et al, 1988). The computer had greater specificity (ability to predict absence of infarction) than physicians and a similar sensitivity (ability to detect presence of infarction). Decisions based on computer protocols alone would have reduced admission of patients without infarction by 11.5% while not affecting admission of patients who required intensive care. These findings show that the computer can correct the tendency for a physician to overestimate risk in cases with a low probability of acute ischaemia (McNutt and Selker, 1988).

CONCLUSION

Computer assistance with medical diagnosis (expert systems) is certain to grow, in particular as part of computerized clinical data collection systems. The reasons for this growth include the increasing size of the medical knowledge-base, the increasing specialization of medical practice, the availability of even more powerful computer technology, and a greater readiness of physicians to make use of computer technology. Expert diagnostic systems must be readily accessible, easy to use, and authoritative. Many prototype systems fail to meet these simple criteria, explaining why CMD systems are not already an integral part of routine medical care. The new generations of systems, such as DXplain, which meet these criteria, will have much more impact.

REFERENCES

Adams ID, Chan M, Clifford PC et al (1986) Computer aided diagnosis of acute abdominal pain: a multicentre study. British Medical Journal 293: 800-804.
Aikins JS, Kunz JC, Shortliffe EH et al (1983) PUFF: an expert system for interpretation of pulmonary function data. Computers and Biomedical Research 16: 199-208.
Alvey PL, Myers CD & Greaves MF (1987a) High performance for expert systems: I. Escaping from the demonstrator class. Medical Informatics 12(2): 85-89.
Alvey PL, Preston NJ & Greaves MF (1987b) High performance for expert systems: II. A system for leukaemia diagnosis. Medical Informatics 12(2): 97-114.
Barlow P, Murray GD & Teasdale G (1987) Outcome after severe head injury: the Glasgow model. In Corbett WA (ed.) Medical Applications of Microcomputers, pp 105-126. New York: Wiley.
Barnett GO, Cimino JJ, Hupp JA et al (1987) DXplain: an evolving diagnostic decision-support system. Journal of the American Medical Association 258: 67-74.
Bernstein L, Siegel ER & Goldstein CM (1980) The hepatitis knowledge base: a prototype information transfer system. Archives of Internal Medicine 93: 169-175.
Chard T (1987) Human versus machine: a comparison of a computer 'expert system' with human experts in the diagnosis of vaginal discharge. International Journal of Biomedical Computing 20: 71-78.
Chard T (1988) Computing for clinicians. London: Elmore-Chard.
Chard T, Schreiner A & Rubenstein EM (1991) Should the findings of a medical expert system be presented as the most likely diagnosis or as the exact probability of that diagnosis? (in preparation).
Davis R, Buchanan B & Shortliffe E (1977) Production rules as a representation for a knowledge-based consultation program. Artificial Intelligence 8: 15-45.
Dawes GS, Redman CWG & Smith JH (1985) Improvements in registration and analysis of fetal heart rate records at the bed side. British Journal of Obstetrics and Gynaecology 92: 317-325.
Dawes RM & Corrigan B (1974) Linear models in decision making. Psychological Bulletin 81: 95-106.
Dolan JG, Bordley DR & Mushlin AI (1986) An evaluation of clinician's subjective prior probability estimates. Medical Decision Making 6: 216-223.
de Dombal FT, Leaper DJ, Staniland JR et al (1972) Computer aided diagnosis of acute abdominal pain. British Medical Journal ii: 9-13.
Ellis E (1987) The artificial expert. In Medical Computing and Applications, pp 81-117. Chichester: Ellis Horwood.
Elstein A, Shulman L & Sprafka S (1978) Medical Problem Solving: An Analysis of Clinical Reasoning. Harvard: Harvard University Press.
Feightner M (1975) A comparison of the clinical methods of primary and secondary care physicians. Proceedings of the Annual Conference on Research in Medical Education, pp 482-496.
Feinstein A, Rubenstein J & Ramshaw W (1972) Estimating prognosis with the aid of a conversational-mode computer program. Annals of Internal Medicine 76: 911-921.


Fries J, Weyl S & Holman H (1974) Estimating prognosis of systemic lupus erythematosus. American Journal of Medicine 57: 561-565. Fryback D G (1986) A program for training and feedback about probability estimation for physicians. Computer Methods and Programming in Biomedicine 22: 27-33. Goldman L, Cook ER, Brand D A et al (1988) A computer protocol to predict myocardial infarction in emergency department patients with chest pain. New England Journal of Medicine 318: 797-803. Haberman HF, Norwich KH & Diehl DL (1985) DIAG: a computer-assisted dermatologic diagnosis system: clinical experience and insight. Journal of the American Academy of Dermatology 12: 132-143. Hofer PJ & Green BF (1985) The challenge of competence and creativity in computerized psychological testing. Journal of Consulting and Clinical Psychology 53: 826-838. Kahneman D & Tversky A (1972) Subjective probability: a judgement of representativeness. Cognitive Psychology 3: 430-454. Kariniemi V (1978) Evaluation of fetal heart rate variability by a visual semi quantitative method and by a quantitative statistical method with the use of a microcomputer. American Journal of Obstetrics and Gynecology 130: 588-590. Kleinmuntz B & Elstein AS (1987) Computer modelling of clinical judgement. CRC Critical Review in Medical Informatics 1(3): 209-228. Knill-Jones RP, Stern RB, Grimes DH et al (1973) Use of a sequential Bayesian model in the diagnosis of jaundice. British Medical Journal i: 530-533. Lilford RJ & Chard T (1984) The use of a small computer to provide action suggestions in the booking clinic. Acta Obstetrica et Gynaecologica Japonica 36: 119-125. Lindberg G (1982) Studies on diagnostic decision making in jaundice, pp 1-60. Thesis, Karolinska Institute, Stockholm. Lindberg G, Thomsen C, Malchow-Moller A et al (1987) Differential diagnosis of jaundice: applicability of the Copenhagen pocket chart proved in Stockholm patients. Liver 7: 43-49. 
Linsker R (1986) From basic network principles to neural architecture: emergence of spatial-opponent cells. Proceedings of the National Academy of Sciences 83: 7508-7512.
Macartney FJ (1987) Diagnostic logic. British Medical Journal 295: 1325-1331.
McNutt RA & Selker HP (1988) How did the acute ischemic heart disease predictive instrument reduce unnecessary coronary care unit admissions? Medical Decision Making 8: 90-94.
Malchow-Moller A, Thomsen C, Matzen P et al (1986) Computer diagnosis in jaundice: Bayes' rule founded on 1002 consecutive cases. Journal of Hepatology 3: 154-163.
Miller RA, Pople HE & Myers D (1982) INTERNIST-1, an experimental computer-based diagnostic consultant for general internal medicine. New England Journal of Medicine 307: 468-476.
Miller RA, McNeil MA, Challinor SM et al (1986) The INTERNIST-1/QUICK MEDICAL REFERENCE project: status report. Western Journal of Medicine 145: 816-822.
Moreland KL (1985) Validation of computer-based test interpretations: problems and prospects. Journal of Consulting and Clinical Psychology 53: 816-825.
Okada M, Maruyama M, Kanda T et al (1977) Medical data base systems with an ability of automated diagnosis. Computer Programs in Biomedicine 7: 163-170.
Pollock RVH (1986) Computers as medical management tools: computer-assisted diagnosis and medical decisions support. Veterinary Clinics of North America 16: 669-684.
Pople HE (1977) The formation of composite hypothesis in diagnostic problem solving, an exercise in synthetic reasoning. Proceedings of 5th Joint Conference on Artificial Intelligence, vol. 2, pp 1030-1037. Los Altos, California: M. Kaufmann.
Reggia JA (1988) Artificial neural systems in medical science and practice. MD Computing 5: 4-6.
Reggia JA & Tuhrim S (1985) Computer-assisted Medical Decision Making, pp 3-45. New York: Springer-Verlag.
Schreck DM, Zacharias D & Grunau CFV (1986) Diagnosis of complex acid-base disorders: physician performance versus the microcomputer. Annals of Emergency Medicine 15: 164-170.
Schreiner A & Chard T (1990) Expert systems for the prediction of ovulation: comparison of an expert system shell (EXPERTECH Xi Plus) with a program written in a traditional language. Methods of Information in Medicine 29: 140-145.

Schwartz W, Patil RS & Szolovits P (1987) Artificial intelligence: where do we stand? New England Journal of Medicine 316: 685-688.
Shortliffe EH (1976) Computer-based Medical Consultations: MYCIN. New York: American Elsevier.
Stubbs DF (1988) Neurocomputers. MD Computing 5: 14-24.
Swartout WR & Smoliar SW (1987) On making expert systems more like experts. Expert Systems 4(3): 196-207.
Szolovits P, Patil RS & Schwartz WB (1988) Artificial intelligence in medical diagnosis. Annals of Internal Medicine 108(1): 80-87.
Tierney MD, McDonald CJ, Hui SL et al (1988) Computer predictions of abnormal test results. Journal of the American Medical Association 259: 1194-1198.
Tversky A & Kahneman D (1974) Judgement under uncertainty: heuristics and biases. Science 185: 1124-1131.
Weed LL & Hertzberg R (1984) Problem-solving: what's the best combination of man and machine? Computers: Medical Update 2: 4-16.
Weiss S, Kulikowski C & Safir A (1978) Glaucoma consultation by computer. Computers in Biology and Medicine 8: 25-40.
Zadeh L (1968) Biological applications of the theory of fuzzy sets and systems. In Proctor L (ed.) Biocybernetics of the Central Nervous System, pp 199-212. Boston: Little Brown and Company.