Int J Biomed Comput, 27 (1991) 47-51 Elsevier Scientific Publishers Ireland Ltd.
‘SOLUBILE’: DECISION-MAKING IN THE DIAGNOSIS OF JAUNDICE
E.W.F.W. ALTON, N. NEWMAN, J. HOOPER, D. FOK and F.R. VICARY
Department of Gastroenterology, The Whittington Hospital, Highgate Hill, London N19 5NF (U.K.)
(Received June 11th, 1990) (Accepted September 9th, 1990)
We have designed a computer program ‘Solubile’ to aid clinicians in the diagnosis of jaundice. Based on Bayes’ theorem, ‘Solubile’ uses up to 47 items of information about the patient to produce the most probable diagnosis from 22 possible diseases. In a prospective analysis of 50 patients, 74% were correctly diagnosed in first place and 94% within the first three choices. The possibility of using ‘Solubile’ at differing locations was tested by prospectively diagnosing 100 cases at a second centre having a significantly different patient population. 75% of these patients were correctly diagnosed in first place and 89% within the first three choices. The diagnostic ability of ‘Solubile’ was compared with that of 20 clinicians of various grades. The clinicians correctly diagnosed 49.5% of cases (Solubile 74%) and placed 68.5% (Solubile 94%) within the first three choices. ‘Solubile’ will be of use to aid clinicians in aspects of the diagnosis and management of jaundice.
Keywords: Diagnosis; Computer-aided; Jaundice
Correspondence to: Dr. F.R. Vicary, Department of Gastroenterology, The Whittington Hospital, Highgate Hill, London N19 5NF, U.K.
0020-7101/91/$03.50 © 1991 Elsevier Scientific Publishers Ireland Ltd. Published and Printed in Ireland

Introduction

Many studies have assessed the possible use of computers to aid clinicians in diagnosis [1]. Increasingly, attention has been focussed on the differentiation of a number of possible diseases given a particular presenting symptom. Early attempts in the diagnosis of jaundice included the application of Fisher’s discriminant function [2] and numerical taxonomy [3]. However, over the past two decades most studies have chosen Bayes’ theorem (or a modification) as the mathematical mimic of clinical diagnosis [4-7]. By calculating, for each disease, the probability of the presenting symptoms in the presence of that disease, Bayes’ formulation permits the relative probabilities to be calculated for a number of possible diagnoses [8]. This is performed on the basis of the computer’s ‘experience’ (i.e. its database), which may be provided from textbooks, expert opinion, or perhaps optimally from actual patient records. In practice, the diagnostic accuracy of such programs has varied considerably, often in relation to how ambitious the number of possible diagnoses was. Several comparisons with clinicians have been made, the computer performing as well or better [2,5,7,9,10]. However, no decision-making program to aid the diagnosis of jaundice is in routine clinical use.

In view of the possible benefit for clinical decision-making from such a program, we have designed a program ‘Solubile’ to aid the clinician in the diagnosis of the jaundiced patient. The principal aims were, firstly, ease of use with a friendly interface and all input and output in plain language; secondly, to optimise diagnostic accuracy through mathematical rigour, thereby restricting subjective input to the database; and thirdly, to extend the computer’s usefulness to the clinician by providing more information than just a diagnosis. Since such a system must be easily transferable between hospitals serving very different populations, we have compared diagnostic accuracy at two different centres. Finally, the performance of ‘Solubile’ has been compared to that of varying grades of clinicians given identical information.

Methods

(i) Parameters
For each patient entered into the database, up to 47 items of information (parameters) were obtained from the medical records. These included 17 questions relating to the patient’s history, 11 physical signs and 19 relevant special investigations (Table I). These were chosen using a combination of previously validated parameters [10] and expert opinion [11].

(ii) Diseases
Each case was placed in one of 22 possible diagnostic categories (Table II). These were selected on the basis of disease prevalence, rarer conditions being grouped appropriately [12]. The final diagnosis was made either by histological means or using an accepted clinico-pathological sequence (e.g. a diagnosis of acute pancreatitis in a young, previously fit, heavy drinking male with acute onset of abdominal pain, amylase of 2000 I.U. and normal liver enzymes and biliary investigations).

(iii) Database
For ‘Solubile’, jaundice is defined as a bilirubin > 20 µmol/l.
Accordingly, the case notes of 345 patients with a serum bilirubin > 20 µmol/l (normal range 3-13) were entered into the database. Cases were taken consecutively from patients seen at the Whittington Hospital (a district general hospital - Centre 1), either in the accident and emergency or outpatient departments, over a defined number of months, and for whom a definite pathological diagnosis could be made.

(iv) Hardware
The program was run on an IBM compatible (Tandon) with 80286 processor, EGA graphics colour screen and 20 Mbyte fixed disk.

(v) Use of ‘Solubile’
Terminals were available on the wards and junior doctors were encouraged to enter cases prospectively. On selecting ‘Solubile’, three introductory screens are available to explain the purpose and capabilities of the program as well as providing simple instructions. Ten successive screens prompt the clinician to enter whatever information is available regarding that patient. ‘Solubile’ then stores the data and provides the five most probable diagnoses and their respective probabilities on the screen. To help understand the rationale underlying the diagnosis, ‘Solubile’ can be asked to state which parameters were most influential in making one disease more probable than another. Comparison can be made between any pair of the five diagnoses specified, or between any one of these and a diagnosis the clinician perhaps considers more likely. With this information the clinician can make a judgement on the validity of the diagnosis of ‘Solubile’. Data can be updated as further investigations become available, and the diagnosis reassessed. Furthermore, if any parameters are unknown, ‘Solubile’ can suggest which three would be most likely to increase the certainty of its diagnosis and how much these would improve the probability. Finally, ‘Solubile’ monitors all information input in response to screen prompting and displays a ‘help screen’ appropriate to each question if an unreasonable answer is entered.

TABLE I
PARAMETERS THAT CAN BE USED BY ‘SOLUBILE’ IN ITS DIFFERENTIAL DIAGNOSIS

History
  Age of the patient (in years)?
  Sex of the patient (Male/Female)?
  Has the patient suffered from marked abdominal pain in this illness (Yes/No)?
  Duration of jaundice (in weeks)?
  What is the duration of the patient’s itching (weeks) (answer 0 if no itching)?
  Weight loss (kg in last 3 months)?
  Has the patient suffered appetite loss in this illness (Yes/No)?
  Has the patient had pale stools during this illness (Yes/No)?
  Has the patient had dark urine during this illness (Yes/No)?
  Which category of drugs, if any, has the patient taken in the past 3 months (Type ‘list’ for the drug lists)?
  What was the patient’s alcohol usage in the last year (g per day/Guidance)?
  Has the patient been jaundiced in the past (Yes/No)?
  Has the patient come into contact with jaundice (Yes/No)?
  Previous history of biliary surgery (Yes/No)?
  Previous history of cancer (Yes/No)?
  Previous history of biliary colic (Yes/No)?
  Previous history of recent transfusions or intravenous drug abuse (Yes/No)?

Examination
  What is the size of the patient’s liver palpable below the ribs (in cm)?
  Is the patient’s spleen palpable (Yes/No)?
  Does the patient have palmar erythema (Yes/No)?
  Does the patient have spider naevi (Yes/No)?
  Does the patient have Dupuytren’s contractures (Yes/No)?
  Does the patient have ascites (Yes/No)?
  What was the patient’s temperature (orally at admission, °C)?
  Does the patient have encephalopathy (Yes/No)?
  Does the patient have a peripheral neuropathy (Yes/No)?
  Does the patient have signs of cerebellar disease (Yes/No)?
  Does the patient have peripheral oedema (Yes/No)?

Investigations
  Hb (g/dl)?
  MCV (fl)?
  White cell count?
  Platelet count?
  Reticulocytes (%)?
  Prothrombin ratio?
  Is any urine bilirubin present (Yes/No)?
  Is any urine urobilirubin present (Yes/No)?
  Bilirubin (µmol/l)?
  AST (I.U./l)?
  γGPT (I.U./l)?
  Alkaline phosphatase (I.U./l)?
  Albumin (g/l)?
  Amylase (I.U./l)?
  Does an ultrasound show dilated ducts (Yes/No)?
  HBsAg +ve (Yes/No)?
  Smooth muscle antibodies +ve (Yes/No)?
  Antimitochondrial antibodies +ve (Yes/No)?
  Alpha-foeto-protein > 100 I.U./l (Yes/No)?

TABLE II

     Diseases                        Nos. in first 345 cases
 1.  Gall stones                     (55)
 2.  Alcoholic cirrhosis             (46)
 3.  Chronic cardiac failure         (39)
 4.  Acute alcoholic hepatitis       (34)
 5.  Haemolytic anaemia              (23)
 6.  Acute pancreatitis              (22)
 7.  Acute hepatitis (other)         (14)
 8.  Gilbert’s disease               (19)
 9.  Acute hepatitis A               (13)
10.  Pancreatic carcinoma            (11)
11.  Metastatic carcinoma            (16)
12.  Septicaemia                     (16)
13.  Acute hepatitis B               (16)
14.  Chronic active hepatitis        (9)
15.  Idiopathic cirrhosis            (8)
16.  Chronic pancreatitis            (7)
17.  Primary biliary cirrhosis       (6)
18.  Cholangiocarcinoma              (4)
19.  Cholestatic drug jaundice       (9)
20.  Granulomatous liver disease     (3)
21.  Hepatoma                        (3)

(vi) Software
Most diseases which cause jaundice can be characterised by the symptoms that they normally produce. In other words, the probability of a symptom being present depends on the disease.
In principle, therefore, for any disease we can calculate the probability of the particular collection of symptoms exhibited by the patient. We might expect that the patient is most likely to be suffering from a disease that results in a high probability. This is made precise (given certain conditions) by Bayes’ theorem, and ‘Solubile’ uses this theorem to produce its diagnosis. The patient’s history, physical signs and results of investigations are all treated in the same way by ‘Solubile’. To apply Bayes’ theorem it is necessary to estimate the probability of a parameter assuming a given disease. An auxiliary program (Auxibile) analyses stored data about patients to generate the statistical models that make this possible. These models are more sophisticated than those generally used [13]. Because the prevalence of diseases varies between different geographical locations, it is important to treat ‘location’ as a parameter. Each copy of ‘Solubile’ distributed ‘knows’ its location and automatically tags this information onto every new patient. Both ‘Solubile’ and Auxibile are written in FORTRAN 77 so that they can be used on most computers. Their combined memory requirement is approximately 200 kbytes.

(vii) Assessment of diagnostic accuracy
The retrospective diagnostic accuracy (simple reclassification) was assessed using all 495 available patients (Table III). Thus a computer diagnosis was recorded for each case, without removal of that patient’s parameters from the database, and compared to the true final diagnosis. For the prospective assessment, 50 cases were chosen using the previously noted criteria and in an approximately similar disease distribution to that found in the database. The first three diagnoses suggested by ‘Solubile’ were compared to the recorded diagnosis in the patients’ notes.
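As a minimal sketch of the calculation described under (vi), and not the FORTRAN 77 implementation of ‘Solubile’, the posterior probability of each disease can be taken as its prior multiplied by the likelihood of every recorded parameter, normalised over all diseases. The sketch assumes that parameters are conditionally independent given the disease (one reading of ‘given certain conditions’), a yes/no probability for binary parameters and a normal density for continuous ones. The models actually generated by Auxibile are not specified here, and every function name and number below is hypothetical.

```python
import math

def likelihood(model, value):
    """P(value | disease) for one parameter under a simple illustrative model."""
    if model[0] == "binary":            # ("binary", probability of 'Yes')
        p_yes = model[1]
        p = p_yes if value else 1.0 - p_yes
    else:                               # ("normal", mean, sd): continuous density
        _, mean, sd = model
        z = (value - mean) / sd
        p = math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
    return max(p, 1e-300)               # floor so that log() never sees zero

def posterior(priors, models, patient):
    """Bayes' theorem over all diseases; unknown parameters are simply omitted."""
    scores = {}
    for disease, prior in priors.items():
        log_p = math.log(prior)
        for name, value in patient.items():
            log_p += math.log(likelihood(models[disease][name], value))
        scores[disease] = log_p
    top = max(scores.values())          # subtract the maximum for numerical stability
    weights = {d: math.exp(s - top) for d, s in scores.items()}
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}

def most_probable(post, n=5):
    """The n most probable diagnoses, highest probability first."""
    return sorted(post.items(), key=lambda item: item[1], reverse=True)[:n]

# Hypothetical priors and models, for illustration only (not database values).
PRIORS = {"gall stones": 0.5, "acute hepatitis A": 0.5}
MODELS = {
    "gall stones": {"abdominal_pain": ("binary", 0.8), "age": ("normal", 60.0, 15.0)},
    "acute hepatitis A": {"abdominal_pain": ("binary", 0.2), "age": ("normal", 25.0, 10.0)},
}
PATIENT = {"abdominal_pain": True, "age": 65.0}

for disease, p in most_probable(posterior(PRIORS, MODELS, PATIENT)):
    print(f"{disease}: {p:.3f}")
```

Because only the parameters present in the patient record enter the product, the same routine gives a diagnosis from partial information and can be rerun as further investigations become available.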
(viii) Performance at a second centre
To compare the diagnostic accuracy of ‘Solubile’ in a teaching hospital within London (University College Hospital - Centre 2), case records were selected from a print-out of liver function investigations, and 100 consecutive cases fulfilling the criteria for the original database were chosen. Parameters were recorded on questionnaires and entered as above. Analysis was used to compare the disease distributions of the two test groups (Table III).

TABLE III

Study                         No. of test cases     No. in database
Prospective                   50                    345
Comparison of two centres     50 (1st Centre)       345
                              100 (2nd Centre)      395
Comparison with clinicians    50                    445
Learning curve                49                    200-445
Reclassification              495                   495
(ix) Comparison with clinicians
The 50 prospective test cases were used to compare the diagnostic accuracy of ‘Solubile’ with that of clinicians. The data (identical to that entered into ‘Solubile’) were recorded for the clinicians in an analogous fashion to that used in case records, and normal ranges were provided for all investigations. Sets of 50 cases were distributed at two gastroenterological meetings and clinicians were asked to provide their three most likely diagnoses from the list of 22 possibilities. Replies were received from 7 Registrars, 5 Senior Registrars and 8 Consultants (or equivalent academic grades), including both physicians and surgeons, and were used in the further analysis. These 50 cases were then diagnosed by ‘Solubile’ using a database of 445 cases (Table III).

(x) Assessment of ‘learning curve’
We assessed the size of the database needed by ‘Solubile’ for optimal diagnostic accuracy. Forty-nine of the 50 cases used in the comparison with clinicians were diagnosed by ‘Solubile’ on the basis of a database which was increased by increments of 50 cases from 200 to the maximum of 445 cases (Table III). One case was removed from the 50 test patients because otherwise insufficient cases of septicaemia would have remained in the first 200 patients for statistical analysis.

Results

(a) Table IV shows the results of the simple reclassification of all 495 available cases and the prospective diagnosis of the 50 cases at Centre 1. These are compared with the prospective diagnosis of test cases at the second centre. The disease distribution of the populations at the two centres was significantly different (P < 0.01).

(b) Figure 1 shows the ‘learning curve’ obtained by increasing the number of cases in the database. No further significant improvement in diagnostic accuracy was seen after the first 350 cases had been entered into the database.
Furthermore, the ‘confidence’ with which ‘Solubile’ correctly diagnosed a case did not improve with further addition to the database. The overall diagnostic ability of ‘Solubile’ based on the fifty test patients and the maximal database of 445 was 78% correctly diagnosed and 92% placed within the first three places.

Results of the comparison of ‘Solubile’ with the clinicians are shown in Table V. ‘Solubile’ performed better, both in correct diagnosis and first three placing, with no clinician reaching the values achieved by the computer. Comparison of the diagnostic accuracy within the various grades of clinicians is shown in Table VI, senior registrars performing better than either consultants or registrars, there being no difference between the latter two groups.

TABLE IV
‘SOLUBILE’: DIAGNOSTIC ACCURACY

                                Correctly diagnosed     Correctly diagnosed
                                in first place          within first 3 choices
Simple reclassification         82%                     96%
Prospective study 1st centre    74%                     94%
Prospective study 2nd centre    75%                     89%

Fig. 1. The ‘learning curve’ for ‘Solubile’ obtained by increasing the number of cases in the database used to diagnose 49 test cases. Horizontal axis: cases in database (200 to 445); vertical axis: %. Three curves are plotted: % correctly diagnosed in first place; % correctly diagnosed in first three choices; mean probability assigned to correct diagnosis.

TABLE V
‘SOLUBILE’ VS. CLINICIANS

              Correctly diagnosed        Correctly diagnosed
              in first place (range)     within first 3 choices (range)
Solubile      78.0%                      92.0%
Clinicians    49.5% (34.0-67.0)          68.5% (42.0-84.0)
TABLE VI
ACCURACY BY CLINICAL SENIORITY

                            Correctly diagnosed        Correctly diagnosed
                            in first place (range)     within first 3 choices (range)
Registrar (n = 7)           47.3% (34.0-52.0%)         67.9% (56.0-80.0%)
Senior Registrar (n = 5)    55.6% (50.0-62.0%)         73.6% (66.0-84.0%)
Consultant (n = 8)          49.5% (37.0-67.0%)         67.3% (41.0-80.0%)
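The learning-curve assessment of Methods (x) and the first-place and first-three scoring used throughout the Results can be sketched as below. This is an illustration under stated assumptions rather than the procedure actually coded in ‘Solubile’: `fit` and `diagnose` are hypothetical callables standing in for Auxibile and ‘Solubile’ respectively, and the database is assumed to be held as a list in order of entry.

```python
def top_k_accuracy(rankings, truths, k):
    """Fraction of cases whose true diagnosis is among the first k suggestions."""
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)

def database_sizes(start=200, stop=445, step=50):
    """Database sizes for the learning curve: increments of `step`, ending at `stop`."""
    sizes = list(range(start, stop, step))
    sizes.append(stop)
    return sizes

def learning_curve(cases, test_cases, fit, diagnose, sizes):
    """Score the same test cases against databases of increasing size.

    cases      -- database cases in order of entry
    test_cases -- list of (findings, true_diagnosis) pairs
    fit        -- hypothetical callable: fit(training_cases) -> model
    diagnose   -- hypothetical callable: diagnose(model, findings) -> ranked diagnoses
    """
    truths = [truth for _, truth in test_cases]
    curve = []
    for n in sizes:
        model = fit(cases[:n])          # use only the first n database cases
        rankings = [diagnose(model, findings) for findings, _ in test_cases]
        curve.append((n,
                      top_k_accuracy(rankings, truths, 1),
                      top_k_accuracy(rankings, truths, 3)))
    return curve
```

With the defaults, `database_sizes()` gives the six database sizes from 200 to 445 shown on the horizontal axis of Fig. 1.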
Discussion

The use of decision-making to help clinicians in the diagnosis of patients with jaundice has been attempted on many previous occasions. That no such aid is in current clinical use probably relates to a number of factors, problems which we have tried to address in this study.

With respect to the parameters used to describe each case, a compromise is required between the time needed to obtain and enter the data and provision of an adequate description of the patient. Previous studies have used up to 144 items of history, examination and investigations [4], but in view of the clinical aims of this study these have been limited to a total of 46 such parameters. These are likely to be obtained routinely by the clinician, although we have not limited ourselves to immediately available investigations. ‘Solubile’ allows for provision of a diagnosis with whatever information is currently available for that patient and can be updated as more sophisticated investigations become available.

Several previous studies have allowed for only a small number of possible final diagnoses [4-6,8], including differentiation solely between medical and surgical causes. In comparison, Begon et al. [7] used 54 possible categories, probably contributing to their somewhat lower accuracy of diagnosis. Since there is likely to be an inverse correlation between the number of possible diseases and diagnostic accuracy, we have chosen 22 differential diagnoses, to include those more commonly found in clinical practice. A rarer diagnosis, for example Rotor disease, will often be included within a broader category, in this case Gilbert’s disease.

We were interested to note the ‘learning curve’ obtained by incremental addition of cases to the database. With the notable exception of the algorithmic diagnosis of jaundice used by the COMIK group [14], few of the previous studies have reached this apparent learning plateau when related to a Bayesian application.
The optimal source of such a database has been keenly debated, the consensus favouring actual data obtained from patients’ records rather than personal experience or textbook figures. All the information entered into the database was obtained from case records.
A potentially serious drawback to this method of obtaining data retrospectively is the absence of certain items of history, examination or investigations. This is discussed further below with respect to the information available to the clinicians diagnosing our cases.

The source of the patients, for example an outpatient clinic or an accident and emergency department, is likely to have an important effect on biasing of the database. An acute presentation of a disease may differ significantly from a more chronic course perhaps seen in the outpatient clinic, and certain diseases such as Gilbert’s rarely present in an accident and emergency department. Cases were therefore ‘tagged’ to indicate their source, thereby allowing for their discrimination within the database. Thus the figures for disease frequency applied to a new case were only those obtained from the same source as the patient. With further addition to the database, we will attempt to similarly separate acute and chronic presentations of a disease with respect to parameters such as temperature and white cell count.

‘Solubile’ was written to attempt a rigorous and more powerful application of Bayes’ theorem than has been used in many previous studies [15-17]. It is novel in applying probability distribution functions to ‘continuous’ symptoms (e.g. patient age), rather than partitioning the symptom’s value into arbitrary ranges and associating a probability with each range. ‘Solubile’ is able to update its database automatically from input cases. This ‘learning’ process is clearly demonstrated in this study for a combination of different diseases and, despite the lack of further improvement after 350 cases had been entered, will of course be crucial for the rarer diagnoses with fewer representative cases. The facility to indicate why a particular diagnosis was considered most likely may help to identify parameters of particular importance in the presentation of a disease.
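One way such an explanation facility could work, offered here only as a sketch and not as the method used by ‘Solubile’, is to rank each recorded parameter by the logarithm of its likelihood ratio between the two diseases being compared: a large positive value favours the first disease, a large negative value the second. The model format, the function names and all numbers below are hypothetical, and the normal density is an assumed stand-in for the probability distribution functions mentioned above.

```python
import math

def likelihood(model, value):
    """P(value | disease) for one parameter under a simple illustrative model."""
    if model[0] == "binary":            # ("binary", probability of 'Yes')
        p_yes = model[1]
        p = p_yes if value else 1.0 - p_yes
    else:                               # ("normal", mean, sd)
        _, mean, sd = model
        z = (value - mean) / sd
        p = math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
    return max(p, 1e-300)               # floor avoids division by zero below

def influence(models, patient, disease_a, disease_b):
    """Rank findings by how strongly they favour disease_a over disease_b."""
    contributions = []
    for name, value in patient.items():
        la = likelihood(models[disease_a][name], value)
        lb = likelihood(models[disease_b][name], value)
        contributions.append((name, math.log(la / lb)))
    # largest absolute log likelihood ratio first
    return sorted(contributions, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical models, for illustration only (not database values).
MODELS = {
    "gall stones": {"dilated_ducts": ("binary", 0.70), "reticulocytes": ("normal", 1.5, 1.0)},
    "haemolytic anaemia": {"dilated_ducts": ("binary", 0.05), "reticulocytes": ("normal", 8.0, 3.0)},
}
PATIENT = {"dilated_ducts": True, "reticulocytes": 1.0}

for name, weight in influence(MODELS, PATIENT, "gall stones", "haemolytic anaemia"):
    print(f"{name}: {weight:+.2f}")
```

Under the independence assumption these per-parameter terms add up to the total log odds between the two diseases (apart from the prior), which is why they can be read as each parameter's share of the decision.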
Furthermore, if the available data for a patient are insufficient to provide a clear differential diagnosis, ‘Solubile’ can suggest the parameter whose value would be most likely to clarify the diagnosis. This may be expected to result in a lower number of investigations, with consequent cost savings, while maintaining a high standard of diagnostic accuracy.

The results of simple reclassification, the ‘easiest’ of the yardsticks for such a program, were in accordance with the generally high success rates previously recorded [4,5,8,19,20]. However, the prospective diagnosis of cases given the 22 possible alternatives appears to be an improvement on previous studies using Bayes’ theorem, Fraser and Franklin’s [8] 80% success rate being obtained from a much revised matrix applied to 14 possible diagnoses.

The final disease distribution of our database (Table II) reflects the accumulation of our cases from a non-specialist hospital, although the high incidence of haemolytic anaemia is likely to have been influenced by the local population served by the hospital. We were therefore interested to study the effect of moving ‘Solubile’ to a significantly altered population. Previous attempts at such transfer have shown a reduction in the diagnostic ability of the program, principally related to the alteration in local disease frequency [18-20]. Table IV shows that this produced little effect on diagnostic accuracy, probably related to the removal of this variable from the calculations used by ‘Solubile’ until a ‘local disease distribution’ can be obtained from new cases.
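The handling of local disease frequency described above can be illustrated by a short sketch. It assumes, as one plausible reading, that removing the variable amounts to giving every disease the same prior probability until enough local cases have accumulated; the threshold `min_cases` and the add-one smoothing are illustrative choices and are not taken from ‘Solubile’.

```python
def disease_priors(diseases, local_counts=None, min_cases=50):
    """Prior probability of each disease at a given location.

    diseases     -- list of diagnostic categories
    local_counts -- {disease: number of confirmed local cases}, or None
    min_cases    -- hypothetical threshold below which a flat prior is used
    """
    total = sum(local_counts.values()) if local_counts else 0
    if total < min_cases:
        # No usable local distribution yet: every disease equally likely a priori.
        return {d: 1.0 / len(diseases) for d in diseases}
    # Local relative frequencies, with add-one smoothing so that a disease
    # not yet seen locally never receives a prior of exactly zero.
    return {d: (local_counts.get(d, 0) + 1) / (total + len(diseases)) for d in diseases}
```

For example, `disease_priors(["a", "b"])` gives 0.5 to each category, while `disease_priors(["a", "b"], {"a": 90, "b": 8})` gives 0.91 and 0.09.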
In conclusion, we have used a program based on Bayes’ theorem to diagnose patients presenting with jaundice to an outpatient or accident and emergency department. Its prospective use has been compared with results from previous studies, and its ability to diagnose a significantly different population from that for which it was designed has been tested. Finally, the diagnostic accuracy of the program has been compared to that of 20 clinicians of varying grades presented with identical information. Our program has fulfilled the phases of evaluation of clinical aids suggested by Spiegelhalter [21]. It is able to ‘diagnose’, is internally consistent, learns only from its own data and is able to monitor its own performance and adjust its results accordingly. Virtually no working models can do this at this time [22]. Finally, we would like to emphasise our aims in using decision-making to aid rather than usurp the clinician in aspects of diagnosis and management. ‘Solubile’ is at present in use in our department and in departments of gastroenterology in the USA, Australia, Holland, Canada, France, Italy, Algeria and Germany.

Acknowledgements

We would like to thank Mr. Ernest Alton and Miss Karen Turner for their help with the collection of data and Ms. Katy Andrews for typing the manuscript.

References
1 Sterling TD, Nickson J and Pollack SV: Is medical diagnosis a general computer problem? J Am Med Assoc, 198 (1968) 281-286.
2 Martin WB, Apostolakos PC and Roazen H: Clinical versus actuarial prediction in the differential diagnosis of jaundice, Am J Med Sci, 240 (1960) 571-578.
3 Fraser PM and Baron DN: Computer-assisted classification and diagnosis of liver disease, Proc R Soc Med, 42 (1966) 776-779.
4 Burbank F: A computer diagnostic system for the diagnosis of prolonged undifferentiating liver disease, Am J Med, 46 (1969) 401-415.
5 Cattaneo AD, Lucchelli PE, Rocca E, Mattioli F and Becchi G: Computer versus clinical diagnosis of biliary tract diseases, J Abdom Surg, 31 (1972) 71-75.
6 Knill-Jones RP, Stern RB, Girmes DH, Maxwell JD, Thompson RPH and Williams R: Use of sequential Bayesian model in diagnosis of jaundice by computer, Br Med J, (1973) 530-533.
7 Begon F, Lockhart AM, Metreau JM and Dhumeaux D: A computer-aided system for the diagnosis of hepato-biliary diseases. A comparison with the performance of physicians, Med Inform, 4 (1979) 35-42.
8 Fraser PM and Franklin DA: Mathematical models for the diagnosis of liver disease, Quart J Med, 43 (1974) 73-88.
9 Stern RB, Knill-Jones RP and Williams R: Clinician versus computer in the choice of 11 differential diagnoses of jaundice based on formalised data, Meth Inform Med, 13 (1974) 79-82.
10 Boom R, Gonzalez C, Fridman L, Ayala JF, Realpe JL, Morales P and Quintero R: Looking for ‘indicants’ in the differential diagnosis of jaundice, Med Decis Making, 6 (1986) 36-41.
11 Sherlock S: Diseases of the Liver and Biliary System, 7th edn., Blackwell, London, 1986.
12 Schiff RP: Diseases of the Liver, 3rd edn., Collins, London, 1987.
13 Newman N, Alton EWFW, Vicary FR and Hooper J: Microcomputer-aided diagnosis of jaundice (‘Solubile’). In Computers in Gastroenterology (Ed: FR Vicary), Springer-Verlag, London, 1988, pp. 175-184.
14 Malchow-Møller A and Thomsen C: Copenhagen Icterus Group. Algorithmic diagnosis of jaundice, Scand J Gastroenterol, S128 (1987) 162-168.
15 Clifford PC, Chan M and Hewett DJ: The acute abdomen: management with microcomputer aid, Ann R Coll Surg Engl, 68 (1986) 182-184.
16 Salomon R, Bernadet M and Samson M et al.: Bayesian method applied to decision making in neurology, Meth Inform Med, 15 (1976) 174-179.
17 Warner HR, Toronto AF, Veasey LG and Stephenson R: A mathematical approach to medical diagnosis, J Am Med Assoc, 177 (1961) 177-183.
18 Stern RB, Knill-Jones RP and Williams R: Br Med J, 2 (1975) 659-662.
19 Lindberg G, Bjorkman A and Knill-Jones R: Computer aided diagnosis of jaundice. A comparison of two data bases, Scand J Gastroenterol, S128 (1987) 180-189.
20 Thomsen C and Malchow-Møller A: Copenhagen Computer Icterus Group (COMIK). Transferability of a probabilistic algorithm in differential diagnosis of jaundice, Scand J Gastroenterol, S128 (1987) 170-172.
21 Spiegelhalter DJ: Evaluation of clinical decision aids, Stat Med, 2 (1983) 207-216.
22 Knill-Jones RP: Computers in gastro-enterology: An overview of diagnostic applications. In Computers in Gastroenterology (Ed: FR Vicary), Springer-Verlag, London, 1988, pp. 149-160.