Bone Browser a decision-aid for the radiological diagnosis of bone tumors


Computer Methods and Programs in Biomedicine 67 (2002) 137– 154 www.elsevier.com/locate/cmpb

I. Lejbkowicz a, Fred Wiener a,*, A. Nachtigal b, D. Militiannu b, U. Kleinhaus a,b, Y.H. Applbaum c

a Faculty of Medicine, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel
b Department of Diagnostic Radiology, Rambam Medical Center, Haifa, Israel
c Radiology Department, Hadassah University Hospital-Ein Karem, Jerusalem, Israel

Received 13 June 2000; received in revised form 24 November 2000; accepted 28 November 2000

Abstract

Bone Browser, a decision-aid for radiological diagnosis of bone tumors, was developed in cooperation with the Radiology Department of the Rambam Medical Center, Haifa. The system offers case-specific advice from an expert system (ES), general information on the lesion area and allows for recording and retrieving cases. The ES utilizes both rule-based and probabilistic inferencing methodologies to arrive at a differential diagnosis (DD). The knowledge base was validated on 105 cases with known outcome. Clinical evaluation consisted of 59 new cases whose final diagnosis was not known to the evaluators. The correct diagnosis was included in the system’s DD in 85% of the cases, which is comparable to the diagnostic accuracy of senior radiologists (88%). The system proved to be helpful to the expert, diagnosing cases missed by the radiologists and suggesting additional diagnoses not listed by the radiologists, raising their diagnostic capability to 91%.

© 2002 Elsevier Science Ireland Ltd. All rights reserved.

Keywords: Expert systems; Probabilistic and logical inferencing; Bayes theorem; Decision-aids; Bone tumors; Radiological diagnosis

* Corresponding author. Present address: Faculty of Computer Science, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel.

1. Introduction

Bone tumors can be defined as those conditions of the skeletal system that are neoplastic or could be mistaken for a neoplastic condition on the basis of radiological and/or pathologic evidence [1]. Many of these lesions are rare and therefore most radiologists do not accumulate enough personal experience to make an accurate diagnosis [2].

The radiological diagnosis of bone tumors is extremely important [3]. First of all, the radiologist can rule out malignancy in many cases, avoiding unnecessary biopsies. Secondly, most bone tumors cannot be diagnosed by histology alone. Because of the high variety of cells present within bones, histological studies of a lesion can give misleading results if the radiological findings are not kept in mind. The radiological work-up of the lesion is also necessary to establish the most appropriate technique for a biopsy (when one is needed) as well as the most suitable place within the lesion to perform it.

0169-2607/02/$ - see front matter © 2002 Elsevier Science Ireland Ltd. All rights reserved. PII: S0169-2607(01)00115-8


The rarity of these lesions and the importance of their diagnosis justify the development of computerized tools to help in the diagnosis. Over the last decade many radiological computer-aids were developed, most of them for diagnosis. These systems provide a list of features to be looked for and evaluated in the radiograph, facilitating a systematic data entry. This serves to achieve superior diagnostic quality by diminishing the risk of overlooking less impressive and/or unexpected items [4]. The systems provide the means to interpret the combination of signs observed in a particular case. This interpretation can be based on one or more of the ES inferencing methodologies, rule based; probabilistic; or neural networks. NEONATE, for example, is an ES embedded within the HELP Hospital Information System [5]. NEONATE helps in the interpretation of chest X-rays of neonates with tachypnea admitted to the intensive care unit. The system asks the physician for the presence or absence of findings in the radiographs and makes interpretations based on the findings. Another example is MacRad [6], a radiology retrieval system that uses a case based paradigm to identify intracranial masses. It is implemented as a part of a relational database with an image archive, retrieving images similar to the specific case under study. Other systems use ES techniques to differentiate between malignant and benign lesions in mammograms, based on the features extracted by the radiologist [7]. Another type of diagnostic system uses backward chaining to critique the radiologist’s interpretation of the signs. In this type of system, the radiologist suggests diagnoses and the system points out their likelihood, and may suggest other possible diagnoses. ICON, for example, is a developmental ES designed to critique the process of radiologic differential diagnosis (DD) of lung lesion [8,9]. 
The physician describes the relevant findings observed in a chest radiograph, supplies some clinical information about the patient, and proposes a possible diagnosis. ICON then critiques the appropriateness of that diagnosis. Other systems were developed based on ICON’s approach: Image/ICON and Mammo/ICON critique the radiologic image interpretation; they incorporate libraries of images as illustrative examples for diagnosis or education [10,11].

Previous work on bone lesions includes three neural networks. In the first [12], feed-forward networks were trained with a back-propagation algorithm using either 46 hypothetical examples or 115 real cases. The database for testing the different artificial neural networks (ANNs) included 115 clinical cases (five cases for each of 23 possible diagnoses), which were different from the examples used for training. Results were compared with those of an experienced radiologist, who was significantly superior in finding the correct diagnosis. The second [13] is a network constructed from 110 textbook cases and tested on 44 cases from a teaching file, with only limited success. The third [14] was trained on 709 radiographs and tested on four subsets of the training sample. The results, while promising, were not yet considered sensitive enough for use as a screening device.

Another approach is to use Bayes’ formula. One of the earliest diagnostic systems, which considered only nine types of bone tumors, was developed by Lodwick [15] and reached 95% accuracy in a test series of 76 cases. Lodwick considered this result an upper limit of accuracy for the test series, which may not be repeated in clinical work. Gell [2] extended Lodwick’s system to 34 lesions, producing DD lists for 144 cases. However, the high accuracy (89.7%) that he obtained was achieved by adjusting the case data to correspond to the radiologist’s intuitive diagnosis. No clinical evaluation was reported by Gell.

Herein, we present Bone Browser, a decision-aid with which the radiologist can interact throughout the diagnostic process. For the core problem of diagnosis, we use an expert system (ES) to formulate and make available to the radiologist the criteria and decision logic for the diagnostic interpretation of the radiographs.
The case data required by the ES is organized into protocols for the systematic review of the X-ray picture and extraction of pertinent features. Since ESs that offer physicians other tools besides the expert advice, are generally better accepted by the medical staff [16], we have used state of the art programming tools to create an acceptable user


interface and to provide facilities such as case data storage and review, explanatory notes for the items in the data entry protocol and background data on the specific medical area. The radiological diagnosis of bone tumors often does not lead to a single diagnosis. Radiologists thus report their result as a DD list. We, therefore, designed the system to produce such a list using a combined logical and probabilistic inference methodology. The system can provide explanations of these conclusions in accordance with the various ways the physician interprets the data. The radiologist can thus use his own judgment based on a wide range of information that is readily presented to him by the system. The system was validated and clinically evaluated and the results are reported in this paper.

2. Methods

2.1. System development

2.1.1. The expert system

The knowledge base (KB) was constructed to provide for three inferencing methodologies: one using rule-based logic and two others using different variations of Bayes’ theorem. The rules were formulated using the expert-system shell Simulating Medical Reasoning (SMR) [17]. SMR was designed to facilitate specification of the knowledge in any medical area by the domain experts themselves, without the need for computer expertise. SMR provides programs for compiling the medical knowledge and applying it to case data as well as programs for constructing an information system (IS) for storage and retrieval of case data and for conducting queries [18]. Forty-nine lesions comprising the most common and clinically important bone tumors and tumor-mimicking lesions were described in terms of their associated findings [1,19,20]. From these specifications, the KB rules were formulated. Each lesion was described as a single rule. An SMR rule consists of an inference statement followed by its associated criteria. Each criterion is assigned points (see Table 1 for example) for confirmation and/or rejection of the conclusion such that combinations of criteria sufficient for confirming or rejecting the inference will total 50 points or more. This threshold logic allows expression of multiple Boolean combinations each of which totals up to 50 points.

The first of the Bayesian methods calculates the probability that any of the 49 lesions is consistent with the case data [21]; we therefore assume that all lesions are initially equally likely. The knowledge base for Bayesian inferencing is an ancillary IS organized by lesions. It gives the estimated lesion prevalence and the incidence of 128 findings (that appeared as criteria in the rules) in each lesion. Most of these incidences were estimated from qualitative descriptions in the literature. Age incidence and bone involvements were collected from statistical data [1,19]. The incidences are shown in Table 1, under the column headed TPR. The outcome of Bayes’ calculation is a list of lesions with probability of 5% or more, up to a maximum of five lesions.

Since, radiologically speaking, it is often the case that more than one lesion may be compatible with the patient data, an alternate method of using Bayes’ theorem is to calculate the probability that a lesion is present or absent [22]. The incidence of a finding in a given lesion is taken as the true positive rate (TPR), while the incidence of this finding in all the other lesions, taken as a single group, is the false positive rate (FPR). It is assumed that a patient presented to the system has at least one lesion. The probability of the presence of the lesion is then calculated from the product of the likelihood, TPR/FPR, of all the findings in the case data. The probability of the presence of each lesion is thus calculated independently of all others and is an indication of how well the patient data fits the lesion description. The overlap in the lesion description may result in a number of lesions with a high probability of being present.
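The two Bayesian calculations described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the lesion and finding labels, and all numbers in the usage note are hypothetical, and the text does not spell out how the prior enters the alternate method, so the prior-odds step in `bayes_presence` is an assumption.

```python
from math import prod


def bayes_relative(priors, tpr, findings):
    """First method (sketch): posterior of each lesion relative to all lesions.

    priors:   lesion -> prior probability (equal priors if lesions are
              treated as initially equally likely)
    tpr:      lesion -> {finding -> incidence of the finding in that lesion}
    findings: findings present in the case
    Findings are treated as independent, as the paper argues they are.
    """
    joint = {
        lesion: priors[lesion] * prod(tpr[lesion][f] for f in findings)
        for lesion in priors
    }
    total = sum(joint.values())
    return {lesion: value / total for lesion, value in joint.items()}


def bayes_presence(prior, tpr, fpr, findings):
    """Alternate method (sketch): probability that one lesion is present.

    The product of likelihoods TPR/FPR follows the text; converting a prior
    to odds and back to a probability is our assumption.
    """
    odds = prior / (1.0 - prior)
    for f in findings:
        odds *= tpr[f] / fpr[f]
    return odds / (1.0 + odds)
```

With two hypothetical lesions and equal priors, `bayes_relative({"A": 0.5, "B": 0.5}, {"A": {"x": 0.8}, "B": {"x": 0.2}}, ["x"])` favours lesion A, while `bayes_presence` scores each lesion on its own, so several lesions can score high at once.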
All lesions with a probability of 70% or more are accepted as valid inferences by the alternate Bayesian method. The two methods of applying Bayes’ formula were considered adequate for the task of reaching diagnostic conclusions, since case data were essentially descriptors of radiological lesions and were thus independent of each other. There was no need therefore to consider causal probabilistic (Bayesian) networks to calculate the diagnostic probabilities.

Each of the three methodologies has its own justification. The rule based approach permits a more positive description of the lesion giving each finding its proper functional weight. It also takes into account contradictory data that can lead to the rejection of the conclusion independently of all other data. In Bayes, a factor with low likelihood is insufficient within the context of the probability calculation to rule out an inference. The Bayesian method allows the ranking of the conclusions on a probabilistic basis, which is more satisfying to the physicians. However, even if no lesion is compatible with the patient data, at least one will have a relatively high probability, since the method calculates the lesion probability relative to all the lesions in the KB. The alternate Bayesian method gives the appropriate emphasis on all the different diagnoses consistent with the case data. An example of a lesion description in the logical and probabilistic KBs is shown in Table 1.

The DD was constructed as follows: to the list of logical conclusions (confirmed rules) we add the Bayesian inferences that did not appear in the logical list. We then add all the alternate Bayes’ conclusions that were not reached by the other

Table 1
An SMR rule for a lesion and the corresponding incidence of each finding (a)

SBC                                               SMR points                  TPR (%)   FPR (%)
                                                  Confirmation   Rejection
Age, 0–5                                                                        5.0       4.5
Age, 5–10                                                                      20.0       6.0
Age, 0–30                                              7
Age, 10–20                                                                     57.0      21.7
Age, 20–30                                                                      8.0      17.5
Soft tissue involvement, none                          7                       98.0      50.0
Pattern, geographic                                    6                       98.0      35.0
Size, >5 cm                                            7                       53.0      53.0
Bone, proximal humerus                                 7                       48.0       7.0
Bone, proximal femur                                   8                       24.0      11.5
Sagittal location, metaphysis                          5                       70.0      39.5
Transverse location, medullary canal - central         7                       90.0      45.4
Zone of transition, narrow                             7                       97.0      57.0
Margin, sclerotic                                      5                       95.0      47.0
Tumor matrix, lytic bone matrix                        5                       89.0      60.0
Host response, none                                    5                       88.0      64.0
Monostotic                                             5                       98.0      76.0
Fallen fragment sign with periosteal reaction         50                       30.0       0.1
Polyostotic                                                          50         1.0      20.6
Sagittal location, metaepiphysis                                     50         1.0       9.0
Size, 0–1 cm                                                         50         1.0       5.5
Pattern, moth-eaten or permeative                                    50         1.0      32.0

(a) An SMR rule consists of an inference statement followed by its associated criteria. Each criterion is assigned points for confirmation and/or rejection such that combinations of criteria sufficient for confirming or rejecting the inference will total 50 points or more. Bayesian inferencing takes into account the incidence of each finding in the given lesion (the TPR), and calculates the lesion probability according to Bayes’ formula. Most of these incidences are translations from qualitative descriptions in the literature to quantitative estimates. The alternate Bayesian inferencing takes into account both the TPR and the FPR. The FPR is calculated by the system as the incidence of the finding in all other lesions taken together. The probability of the presence of the lesion is then calculated from the product of the likelihood, TPR/FPR, of all the findings in the case data. Shown here is a rule in the Bone tumors application and the TPR and FPR of the findings in the rule. In the Bayes’ KB, all findings present in all lesions have a TPR and FPR, and not only the findings appearing in the SMR rule for the lesion.
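The 50-point threshold logic of an SMR rule can be illustrated with a short sketch. This is not the SMR shell itself: the `Criterion` class and `evaluate_rule` function are hypothetical names, only a subset of the SBC criteria is included, and the point values and their confirmation/rejection columns reflect our reading of Table 1. Letting rejection take precedence is also an assumption, based on the statement that contradictory data can reject a conclusion independently of all other data.

```python
from dataclasses import dataclass

THRESHOLD = 50  # points needed to confirm or to reject an inference


@dataclass(frozen=True)
class Criterion:
    name: str
    confirm: int = 0  # points toward confirmation
    reject: int = 0   # points toward rejection


# Subset of the SBC rule; values as we read them from Table 1 (assumption).
SBC_RULE = [
    Criterion("Age, 0-30", confirm=7),
    Criterion("Soft tissue involvement, none", confirm=7),
    Criterion("Pattern, geographic", confirm=6),
    Criterion("Bone, proximal humerus", confirm=7),
    Criterion("Sagittal location, metaphysis", confirm=5),
    Criterion("Transverse location, medullary canal - central", confirm=7),
    Criterion("Zone of transition, narrow", confirm=7),
    Criterion("Margin, sclerotic", confirm=5),
    Criterion("Fallen fragment sign with periosteal reaction", confirm=50),
    Criterion("Polyostotic", reject=50),
    Criterion("Pattern, moth-eaten or permeative", reject=50),
]


def evaluate_rule(rule, findings):
    """Sum the points of the criteria present in the case and apply the threshold."""
    present = set(findings)
    confirm = sum(c.confirm for c in rule if c.name in present)
    reject = sum(c.reject for c in rule if c.name in present)
    if reject >= THRESHOLD:  # assumed: rejection overrides confirmation
        return "rejected"
    if confirm >= THRESHOLD:
        return "confirmed"
    return "undetermined"
```

In this sketch a case showing the first eight criteria accumulates 51 confirmation points and confirms the rule, whereas adding "Polyostotic" to the same case rejects it outright.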


two methods, up to a maximum list of ten items. The list is ordered according to descending Bayesian probability. Preference is given to the logical conclusions by including all of them in the DD, since in our experience probabilistic inferences tend to proliferate, with not all of them being relevant, whereas the logical conclusions derive from a match of the patient data with the lesion description.
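The construction of the DD list can be sketched as a small merge routine. The function name, argument layout and lesion labels are hypothetical; the thresholds (5% with at most five lesions for the first Bayesian method, 70% for the alternate method, ten items overall) come from the text. How ties or missing probabilities are handled is not stated in the source, so those details are assumptions.

```python
def build_dd(logical, bayes, alt_bayes, max_items=10):
    """Combine the three inference lists into one differential diagnosis.

    logical:   lesions confirmed by the rule-based method (all are kept)
    bayes:     lesion -> probability from the first Bayesian method
    alt_bayes: lesion -> probability of presence from the alternate method
    """
    # First Bayesian method: probability of 5% or more, at most five lesions.
    bayes_list = sorted(
        (lesion for lesion, p in bayes.items() if p >= 0.05),
        key=bayes.get,
        reverse=True,
    )[:5]
    # Alternate method: probability of presence of 70% or more.
    alt_list = sorted(
        (lesion for lesion, p in alt_bayes.items() if p >= 0.70),
        key=alt_bayes.get,
        reverse=True,
    )
    dd = list(dict.fromkeys(logical))  # logical conclusions first, no duplicates
    for lesion in bayes_list + alt_list:
        if lesion not in dd and len(dd) < max_items:
            dd.append(lesion)
    # Final ordering by descending Bayesian probability (0 if unknown: assumption).
    return sorted(dd, key=lambda lesion: bayes.get(lesion, 0.0), reverse=True)
```

Because the logical conclusions are inserted before any probabilistic ones, they are never displaced by the ten-item cap, which mirrors the preference described in the text.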

2.1.2. Protocol for entering case data

The SMR system produces an alphabetized list of all the criteria statements appearing in the KB, citing the rules in which each criterion appears. These criteria were then organized into a protocol for patient data collection. Mutually exclusive criteria and statements referring to a single number such as age were organized into criteria groups, appearing as a single question in the protocol, with either a numerical or a multiple-choice answer. The other criteria, whose presence, ‘y’, must be indicated, are displayed under appropriate headings. The criteria are radiological features that can be observed in plain radiographs, as well as the patient’s age, relevant symptoms and medical history. The radiological features include the location of the lesion in the skeleton; the transverse and sagittal location of the lesion within the bone; the size of the lesion; the pattern of the lesion; the margin of the lesion; the zone of transition between the lesion and normal tissue; the host response to the lesion; the tumor matrix; the soft tissue involvement; and whether it is a single or a multi-focal lesion. In addition to these findings general to all lesions, there are also features particular to specific lesions.

A study was conducted to measure inter- and intra-observer variations in case data entry. After the first phase of the study, the protocol questions and multiple-choice answers were updated and some explanatory notes in the Help files were reformulated, resulting in an appreciable reduction in observer variations and in increased correlation between variation in case data and the variation in the corresponding diagnoses arrived at by the system. The details of this study will be presented in a separate report.

2.1.3. Information system (IS)

In order to store and retrieve case data, an IS was constructed consisting of data screens based on the protocol for patient data entry. The Magic Application Builder (Mashov, Tel-Aviv, Israel) was used to develop the IS. The system can be used on any PC running Windows 95 or later with an Internet browser. Entry into the IS is through the main menu (Fig. 1). One of the data screens is shown in Fig. 2. The data fields are associated with the corresponding statements in the KB and the contents of the fields for each case are passed on to the ES for inference making.

2.1.4. Help files

For each criterion, an explanatory description written in hypertext was linked to the protocol (Fig. 3). When recording a case, the user can select any criterion to display the explanatory note for the criterion. Following the explanatory text, the system provides for displaying a radiographic image of the criterion. The explanatory note then lists all the lesions in which the criterion appears as obtained from the alphabetized list produced by the SMR. Hypertext was used to link the listed lesions to their descriptions in the help files. This description lists all the findings associated with the lesion, as given in the corresponding SMR rule, together with the distribution of the incidence of the lesion with respect to age and bone involvement. An example of a lesion description is also shown in Fig. 3. Alternatively, from the system menu (Fig. 1), the user can display the list of the lesions in the KB, and select a lesion to display the lesion description. From the lesion description, the user can select any criterion and display the explanatory note.

Fig. 1. Bone browser menu. The user can choose among the following options: enter new patient data; retrieve data on a patient already in the IS; search for information on the lesions; show the protocol for case data to review the explanatory notes on the protocol questions; display the distribution over the lesions of bone and age involvement; and get to the query menu to preset queries on the recorded patients.

2.1.5. The system report

The system produces a report for each case analyzed. The report includes a summary of the patient data and a single DD based on the combined rule-based and the probabilistic inferencing methods. The user’s agreement or disagreement with the conclusions is recorded in the IS for evaluating the diagnostic capability of the system (Fig. 4). The report can be printed out (Fig. 5). For explanations of the system’s conclusions, the logical and the probabilistic components of the DD are displayed separately. The user may choose the type of explanation, logical or probabilistic, which he prefers. For the logical conclusions (Fig. 6), the diagnoses are listed on the left-hand panel. The user can select any of the diagnoses and the right-hand panel will show the patient findings that contributed to this diagnosis. For the probabilistic inferences (Fig. 7), the lesions are ranked according to their Bayesian probability, which is displayed together with the alternate Bayes’ probability calculated from the likelihoods. Any diagnosis may be selected and the corresponding explanation will appear in the right-hand panel. The explanation consists of all the patient findings for this case, ordered in descending sequence of likelihood. The user can thus see which findings supported the conclusion and which findings contradicted it. The cumulative probability, as calculated stepwise from the likelihoods by the alternate Bayes’ method, is displayed in the last column. These explanations can be printed out in the form of a full report that, in addition to the patient findings and the system’s DD shown in the regular report, also contains the SMR and the probabilistic explanations for all the inferences in the DD. An excerpt of the printed report showing the explanation for one of the lesions is shown in Fig. 8.


2.1.6. Query facility

The query facility allows retrieval from the lesions IS of the most common lesions in any age group. A query may also be made combining any skeletal location with a specified age group, with the answer including the age group frequency of the lesion for all locations and the skeletal location frequency for all ages. These queries can be invoked from the bone browser menu (Fig. 1) or during the entry of case data. Such a query and its results are shown in Fig. 9. The query facility may also be used on the patient IS to obtain lists of cases with a specified diagnosis, or lists of cases with a specific set of findings, for research and for aiding in the diagnosis of difficult cases.


2.2. Validation and evaluation of the system

2.2.1. Validation of the KB

Since the KB was formulated according to the lesion descriptions in the literature and not according to case descriptions, it is important to test the system’s diagnostic capability by constructing typical cases and by applying the system to real cases recorded by the radiologists. For real cases, the correct diagnosis is associated with a gold-standard, which is generally the biopsy. For some benign lesions that can be confidently diagnosed from the radiographs and do not undergo biopsy, the final radiological diagnosis, given by a radiology specialist in the field, is the gold-standard.

Fig. 2. A screen of the bone browser IS. In order to store and retrieve case data, an IS was constructed consisting of data screens based on the protocol for patient data entry. The Magic Application Builder (Mashov, Tel-Aviv, Israel) was used and one of the data screens is shown here. Most of the questions are multiple-choice questions, with the possible answers presented in combo boxes. Clicking on the ? button near each question, the user can display the explanatory note for the question from the help files.


Fig. 3. Help files. For each criterion, an explanatory description was linked to the protocol. When recording a case, the user can select any criterion and display the explanatory note for the criterion. Each explanatory note also lists all the lesions in which the criterion appears. Hypertext was used to link the listed lesions to their lesion description in the help files. For each lesion, the lesion description consists of all criteria used in the SMR rule for confirming the presence of the lesion, together with the distribution of the incidence of the lesion with respect to age and bone involvement. Alternatively, the user can display the list of lesions in the KB directly from the menu (Fig. 1), select a lesion and display the lesion description. From the lesion description the user can select any of its defining features and branch to the explanatory note.

Where there are discrepancies between the system’s diagnosis and the gold-standard, the KB is changed accordingly. The rule-based KB can be changed by adding criteria to or deleting criteria from the rules, or by adjusting the points assigned to each criterion. The probabilistic KB is changed by adjusting the probabilities given to each finding in each lesion. The KB validation consisted of two stages.

1. A typical case of each lesion constructed from the description of the lesion in the literature [20]. The objective in analyzing such cases is to assure the completeness and specificity of the represented knowledge. We construct a typical case for each lesion in the KB and test the system’s capability to reach this diagnosis and no other. The KB is then updated until this result is achieved for all the inferences in the KB. Once the KB has been shown to truly represent the knowledge domain, we may expect a specific and accurate diagnosis for real cases.

2. One hundred and five real cases from the Teaching Files or from routine work entered into the system by the radiologists themselves. The objectives in analyzing such cases are:
   - Test the system’s capacity of reaching the correct diagnosis and providing a meaningful DD.
   - Introduce the necessary changes and additions in the KB (fine tuning), to improve the system’s diagnostic capability.
   After every batch of 30–40 cases was analyzed by the system, corrections were made in the KB (as described above) so that the system could arrive at the correct diagnosis in as many cases as possible.


2.2.2. Clinical evaluation

During the system validation, the KB was continuously updated to achieve the best fit between the system’s conclusions and the correct (gold-standard) diagnosis. We now apply the system to a fresh set of cases, leaving the KB unaltered, to determine the system’s diagnostic capability. Fifty-nine cases of patients seen in the Rambam Medical Center in Haifa and at Hadassah Hospital in Jerusalem with the final diagnosis established by biopsy, were randomly selected and entered by four senior radiologists with experience in bone tumors. The radiologists did not know the biopsy results when entering the case data, which corresponds to the situation during routine system use. The radiologist also recorded his/her DD for

Fig. 4. The bone browser report. The bone browser report for each case analyzed includes a summary of the patient findings, the system’s DD, the logical inferences and explanations and the probabilistic inferences and explanations. The user can click on the tabs to branch to the different parts of the report. The system’s DD shown here is a list of the diagnoses produced by the rule-based method and the two probabilistic inferencing methods. To the list of confirmed rules, the probabilistic inferences are added up to a maximum list of ten inferences. This list is then sorted in descending order of probability. The user agreement or disagreement with the conclusions is recorded.


Fig. 5. The printed patient report. For each case analyzed, a report can be printed. The report includes a summary of the patient findings and the system’s DD for the case, the diagnoses produced by the rule-based method and the two probabilistic inferencing methods, listed in descending order of probability.

the entered case. The system’s DD was then obtained. The radiologists entered their agreement or disagreement with each diagnosis in the system’s DD, adjusting their own DD accordingly. From this data, we could determine the accuracy of the system’s DD and assess the contribution of the system to the radiologist’s diagnostic accuracy.

3. Results

3.1. Validation of the KB

The rule based KB achieved the correct diagnosis in all typical cases. In a few cases, more than one diagnosis was reached. A redistribution of the points assigned to the criteria reduced the diagnosis to a single one in each case without disturbing the ability of diagnosing correctly all the other typical cases. It was not necessary to add or delete criteria from the lesion description rules. For the Bayesian KB, some of the estimated probabilities were adjusted to assure that the correct diagnosis had the highest probability.

The list of 105 real cases used for the second stage of the KB validation and the number of these cases correctly diagnosed by the system are shown in Table 2. The SMR method reached the correct diagnosis in 93 of the 105 cases (89%). The Bayes’ method reached the correct diagnosis in 91 cases (87%) and listed the correct diagnosis in first place in 72 cases (68%). The combined DD reached the correct diagnosis in 96 cases (91%). The SMR method listed a total of 252 diagnoses in the DDs for all 105 cases, while the probabilistic method listed 410. On average, SMR listed 2.4 diagnoses per case and the probabilistic method 3.9 diagnoses per case. The combined DD included 473 diagnoses for all 105 cases (4.5 diagnoses per case, on the average), the probabilistic method thereby added 221 diagnoses to the final DDs. For some lesions, like Chondrosarcoma, Enchondroma, Fibrous Dysplasia and Solitary Bone Cyst (SBC), both methodologies correctly diagnosed all the cases. For others, like Aneurysmal Bone Cyst and Osteosarcoma, the majority of cases were correctly diagnosed. For all lesions where more than two cases were analyzed, at least the majority of cases were correctly diagnosed by the system.
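The percentages and per-case averages quoted above follow directly from the reported counts, as the short check below illustrates. The variable names are ours; the counts are those given in the text. Note that 72/105 is about 68.6%, which the text reports as 68%.

```python
TOTAL_CASES = 105  # real cases in the second validation stage

# Cases in which each method reached the correct diagnosis (from the text).
correct = {"smr": 93, "bayes": 91, "bayes_first_place": 72, "combined": 96}

# Total number of diagnoses listed over all 105 cases (from the text).
listed = {"smr": 252, "probabilistic": 410, "combined": 473}


def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


rates = {name: percent(n, TOTAL_CASES) for name, n in correct.items()}
per_case = {name: n / TOTAL_CASES for name, n in listed.items()}

# Diagnoses the probabilistic methods contributed beyond the SMR list.
added_by_probabilistic = listed["combined"] - listed["smr"]

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}% correct")
for name, avg in per_case.items():
    print(f"{name}: {avg:.1f} diagnoses per case")
print(f"added by probabilistic methods: {added_by_probabilistic}")
```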

3.2. Clinical evaluation

The list of 59 cases used for the clinical evaluation together with the number of cases correctly diagnosed by the radiologists and by the system are shown in Table 3. Prior to the system consultation, the radiologists included the correct diagnosis in their DD in 52 cases (88%). There were no significant differences among the performance of the four radiologists. For two cases no diagnoses were given. The system reached the correct diagnosis in 50 cases (85%). Forty-eight of these 50 cases were also correctly diagnosed by the radiologists. After considering the system’s conclusions, the radiologists accepted some of them (see Table 4a) into their DD. The adjusted radiologist’s DD included the two correct diagnoses of the system, which were not among the original 52 correct diagnoses of the radiologists. With the aid of the system, therefore, the radiologist’s correct diagnoses increased to 91.5% (54 cases).


Table 4a shows the agreement between the radiologists’ and the system’s DDs. In 40 cases, the entire radiologist’s DD list was included in the system’s DD. In 15 cases, there was at least one diagnosis in common in the two DDs. In four cases there was no diagnosis in common; in two of them the radiologists did not give any diagnoses. In 27 cases, the system suggested at least one new diagnosis which the radiologists did not include in their original DD, but which they accepted for inclusion in their adjusted DD. In Table 4b, we can see that the radiologists gave a total of 120 diagnoses for the 59 cases (an average of two diagnoses per case) and the system gave a total of 273 diagnoses for the 59 cases (an average of 4.6 diagnoses per case). Ninety-one of these 273 diagnoses were also listed in the radiologists’ initial DD. When examining the system’s DD for each case, the radiologists accepted 42 additional diagnoses. The radiologists thus agreed with 133 of the system’s diagnoses, disagreed with 104, and were undecided on 36 of the system’s diagnoses.
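The clinical evaluation figures follow from a handful of counts, and the bookkeeping can be written out explicitly. The sketch below is illustrative only: the variable names are hypothetical, the counts are those quoted in the text and in Tables 3 and 4, and the undecided count is derived as the remainder of the system's 273 diagnoses, which gives the 36 listed in Table 4b.

```python
# Cross-check of the clinical evaluation figures (59 cases).
# Variable names are hypothetical; the counts are quoted from the text and Tables 3 and 4.

N_CASES = 59

radiologists_correct = 52  # correct diagnosis in the radiologists' initial DD
system_correct = 50        # correct diagnosis in the system's DD
correct_by_both = 48       # system-correct cases that the radiologists also got right


def percent(part, whole, digits=0):
    """Percentage rounded to the given number of decimals."""
    value = round(100 * part / whole, digits)
    return int(value) if digits == 0 else value


# Cases that only the system diagnosed are what the radiologists can gain from it.
system_only = system_correct - correct_by_both
adjusted_correct = radiologists_correct + system_only

# Table 4b: how the radiologists judged the system's 273 diagnoses.
system_diagnoses = 273
initial_agreement = 91       # already in the radiologists' initial DD
accepted_after_review = 42   # accepted after seeing the system's DD
final_agreement = initial_agreement + accepted_after_review
final_disagreement = 104
final_undecided = system_diagnoses - final_agreement - final_disagreement

shares = {
    "agreement": percent(initial_agreement, system_diagnoses),
    "final agreement": percent(final_agreement, system_diagnoses),
    "final disagreement": percent(final_disagreement, system_diagnoses),
    "final undecided": percent(final_undecided, system_diagnoses),
}

print(percent(radiologists_correct, N_CASES), percent(system_correct, N_CASES))
print(adjusted_correct, percent(adjusted_correct, N_CASES, 1))
print(final_agreement, final_undecided, shares)
```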

4. Discussion

The results of the KB validation indicate that although the system can diagnose about 90% of the lesions seen, the KBs could not express all the possible variations in which some lesions may manifest themselves. Many of the lesions are very rare, and their descriptions in the literature are based on only a few cases and are therefore less complete than the descriptions of more common lesions. In addition, it is known that the same lesion may have different appearances in the diverse sites of the skeleton, making it difficult to include in the KB all the possible descriptions of every lesion. It is also known that some relatively common bone lesions may, occasionally, present themselves in unexpected ways. Although these are rare cases of each lesion, together they may represent a considerable portion of cases seen in a specialty unit.

There are limitations to the adjustment of the KB for improving diagnostic accuracy. For the logical KB, adjustments made to allow correct diagnosis in a specific case must be checked to avoid spurious diagnoses in other cases. In the probabilistic KB, the estimated incidences could be changed within a limited range only, to remain compatible with the qualitative descriptions in the literature.

Fig. 6. The logical conclusions and explanations. The conclusions are listed in the left panel, with the list header preselected. The user can select any item on the list. The right panel shows the explanation for the selected conclusion. The patient findings that contributed to the conclusion are displayed. If there are more than five such findings, the scroll bar is used to display them.

Fig. 7. The probabilistic conclusions and explanations. The probabilistic conclusions shown here are coupled to explanations for each reported conclusion. The inference list is in the left-hand panel and the explanation for the selected lesion appears in the right-hand panel. All the patient findings are displayed, ordered in descending sequence of their likelihood for the selected lesion. The scroll bar is used to review the entire set of findings. The cumulative probability is calculated from the likelihoods using the alternate Bayes’ method.
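The probabilistic explanation described in the Fig. 7 caption (findings ranked by their likelihood for the selected lesion, and a probability derived from those likelihoods) can be illustrated with ordinary naive Bayes. This is only a sketch under stated assumptions: the paper's alternate Bayes' method is not specified in this section, so the textbook formula is used as a stand-in. The function names, the priors, the "expansile" finding and most likelihood values are invented for illustration; only the two SBC values (57% of patients aged 10 to 20 years, 48% of cases in the proximal humerus) echo figures quoted in the Fig. 9 caption.

```python
# Naive Bayes stand-in for the probabilistic explanation. This is NOT the
# paper's "alternate Bayes' method", whose details are not given in this section.
# Priors and most likelihoods are made-up illustrative numbers.

PRIORS = {"SBC": 0.5, "Aneurysmal bone cyst": 0.3, "Giant cell tumor": 0.2}

# P(finding | lesion) for each finding recorded in a hypothetical case.
LIKELIHOODS = {
    "SBC": {"age 10-20": 0.57, "proximal humerus": 0.48, "expansile": 0.20},
    "Aneurysmal bone cyst": {"age 10-20": 0.50, "proximal humerus": 0.10, "expansile": 0.80},
    "Giant cell tumor": {"age 10-20": 0.10, "proximal humerus": 0.10, "expansile": 0.60},
}


def rank_findings(lesion):
    """Findings for one lesion, ordered by descending likelihood."""
    return sorted(LIKELIHOODS[lesion].items(), key=lambda item: item[1], reverse=True)


def posteriors(findings):
    """P(lesion | findings), assuming the findings are independent given the lesion."""
    joint = {}
    for lesion, prior in PRIORS.items():
        p = prior
        for finding in findings:
            p *= LIKELIHOODS[lesion][finding]
        joint[lesion] = p
    total = sum(joint.values())
    return {lesion: p / total for lesion, p in joint.items()}


case_findings = ["age 10-20", "proximal humerus", "expansile"]
post = posteriors(case_findings)
differential = sorted(post, key=post.get, reverse=True)

for lesion in differential:
    print(f"{lesion}: {post[lesion]:.3f}", rank_findings(lesion))
```

Because every finding enters the product, a single salient finding shifts the ranking less than it would in a rule, which is consistent with the discussion's remark that the probabilistic DD tends to be longer than the logical one.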

Fig. 8. Print-out of the full patient report. The user can print a more extensive report, which includes all the parts of the Bone Browser report viewed interactively on screen. Besides the summary of the patient findings and the system’s DD, the logical and probabilistic conclusions are listed separately together with the explanations for each conclusion. Shown here are the logical and the probabilistic explanations for one of the conclusions. The logical explanation consists of those findings that contributed to the conclusion. For the probabilistic explanations, all the findings for this case are displayed, ordered in descending sequence of their likelihood for the selected lesion. The cumulative probability is calculated from the likelihoods according to the alternate Bayes’ method. The user can see which findings contribute more to the diagnosis and which findings contribute less or are contradictory to the diagnosis.

Fig. 9. A query and its results. The Bone Browser query facility allows retrieval from the lesions IS of the most common lesions in any age group. Queries like the one shown here may also be made, combining any skeleton location with a specified age group; the answer includes the frequency of the lesion in the age group for all locations and the bone frequency for all ages. Thus, for example, 48% of all cases of SBC are in the proximal humerus, while 57% of SBC patients are in the 10–20 years age group, irrespective of the bone in which the lesion appears. These queries may be invoked from the Bone Browser menu (Fig. 1) or during the entry of case data from the bone location data screen.

Table 2. KB validation cases and results^a

| Lesions | Number of cases | Cases correct: SMR | Cases correct: Bayes’ | Cases correct: Combined |
|---|---|---|---|---|
| Aneurysmal bone cyst | 9 | 6 | 7 | 8 |
| Brodie’s abscess | 1 | 1 | 1 | 1 |
| Chondroblastoma | 4 | 4 | 3 | 4 |
| Chondromyxoid fibroma | 1 | 1 | 1 | 1 |
| Chondrosarcoma | 5 | 5 | 5 | 5 |
| Chondrosarcoma, peripheral | 1 | 1 | 1 | 1 |
| Enchondroma | 4 | 4 | 4 | 4 |
| Enchondroma, periosteal | 1 | 1 | 1 | 1 |
| Enchondromatosis | 1 | 1 | 1 | 1 |
| Ewing’s sarcoma | 2 | 2 | 2 | 2 |
| Fibrous dysplasia | 5 | 5 | 5 | 5 |
| Fibrosarcoma | 1 | 0 | 1 | 1 |
| Giant cell tumor | 6 | 6 | 5 | 6 |
| Hemangioendothelioma | 1 | 0 | 0 | 0 |
| Hemangioma | 1 | 1 | 1 | 1 |
| Hemangiopericytoma | 1 | 0 | 0 | 0 |
| Histiocytosis X | 4 | 4 | 4 | 4 |
| Hodgkin’s disease | 1 | 1 | 1 | 1 |
| Lymphoma of bone | 1 | 1 | 1 | 1 |
| Metastases | 8 | 8 | 6 | 8 |
| Malignant fibrous histiocytoma | 1 | 1 | 1 | 1 |
| Multiple myeloma | 2 | 2 | 2 | 2 |
| Non-ossifying fibroma | 6 | 6 | 6 | 6 |
| Osteosarcoma | 14 | 11 | 11 | 11 |
| Ossifying fibroma | 1 | 1 | 1 | 1 |
| Osteoblastoma | 1 | 0 | 0 | 0 |
| Osteochondroma | 4 | 4 | 4 | 4 |
| Osteoid osteoma | 7 | 6 | 6 | 6 |
| Osteomyelitis, chronic | 2 | 1 | 1 | 1 |
| Paget’s disease | 1 | 1 | 1 | 1 |
| Plasmacytoma | 1 | 1 | 1 | 1 |
| Plasmacytoma, multiple | 1 | 1 | 1 | 1 |
| SBC | 6 | 6 | 6 | 6 |
| Total (33 diseases) | 105 | 93 | 91 | 96 |
| Percent (%) | | 89 | 87 | 91 |

^a One hundred and five cases were used for the second stage of the KB validation. Some of these were from the teaching files of the Radiology Department and other cases were radiographs of routine patients. After every batch of 30–40 cases, corrections were made in the KB so that the system could arrive at the correct diagnosis in as many cases as possible.

Analyzing the cases in which we could not readjust the KB in order to achieve the correct diagnosis, we can see that the nine cases undiagnosed by both methods are extremely rare cases. Two of these cases are rare lesions and seven are rare manifestations of more common lesions. The radiologists also failed to diagnose the rare lesions, and of the seven other cases they reached the correct diagnosis in only two.

The number of diagnoses per case for the logical method is similar to that generally given by the radiologists in their DD. The radiologists use their expertise to reject or confirm a diagnosis on the basis of the presence or absence of a small number of salient findings, which is also expressed in the rule logic. Such findings have only a limited effect on the calculation of the probabilities, which are based on the total set of findings, therefore resulting in a larger number of diagnoses in the probabilistic DDs. As we have seen during KB validation, there are cases in which the correct diagnosis was not given by the logical method, but was added to the final DD by the probabilistic approach, thus justifying the combination of the two methodologies to construct the system’s final DD. This final combined DD, which takes advantage of the specific capabilities of both logical and probabilistic methodologies, correctly diagnosed over 90% of the cases, failing only in extremely rare cases.

Since the purpose of ESs is to make the expert’s knowledge available to non-expert colleagues in the field, the diagnostic capability should approximate that of the expert. During the clinical evaluation of the system, we found that the present system’s performance (85%) is comparable to that of the experts who participated in the system’s development (88%). The system may assist the experts themselves, by suggesting diagnoses which they may have overlooked and by providing a more systematic method of recording patient data. The fact that, with the assistance of the system, the experts’ diagnostic accuracy increased to 91.5% shows that the present system can aid the expert as well as the non-expert radiologist.

The system suggested more than twice as many diagnoses per case as the radiologists. As was seen during the KB validation study, the expansion of the system’s DD is due mainly to conclusions added by the probabilistic methodology. These were necessary in order to fill in gaps left by the logical method, which was less capable of coping with incomplete case data. Probabilistic reasoning, which must list the most probable diagnosis even when there is insufficient data, and which on the other hand takes all data into account, is quite different from the radiologist’s reasoning.

However, the average number of diagnoses per case, 4.6, was considered reasonable by the radiologists who used the system, whose own DDs often included up to four diagnoses. The additional diagnoses suggested by the system allow the radiologist to be aware of all possible interpretations of the case data. In a comparative study of computerized diagnostic systems, an objection was raised to the proliferation of diagnoses readily discarded by knowledgeable physicians but confusing to the less experienced ones [23]. In our system, the additional diagnoses were both few and relevant because of the more focused extent of the knowledge base.

The results shown here demonstrate that the system can contribute to the radiologist’s diagnostic capability. Besides this direct contribution, the system can contribute to the radiologists’ work in additional ways, such as serving as a quick reference tool. Evaluation studies which take into consideration only the system’s ability to reach correct conclusions do not measure the real effect of the system on healthcare, since they ignore the additional help that the system can provide [24]. This additional help is difficult to measure. We assumed that the physicians’ attitude toward the system, i.e. whether or not they intend to use the system in the future and their opinion on the system’s contribution to their work, represents a valid measure of the contribution of the ES as a whole.

We asked the radiologists who participated in the clinical evaluation of the system whether they thought the system could help them in their work. Three of them said the system could help in the DD, by suggesting additional diagnoses. In addition, two of them said that the system could improve the reading of the radiographs due to its systematic approach. All of them said that the system can help less experienced radiologists, could be used as a teaching-aid and will allow collection of data for research. The radiologists’ opinion about the system, together with their intention to record in the system all new cases of bone tumors, indicates that the system was well accepted by them. The favorable response to questionnaires by groups of medical students and residents who tried out the system is an additional encouraging sign of the system’s acceptability.

Table 3. Clinical evaluation cases and the radiologists’ and system’s diagnostic capability^a

| Lesions | Number of cases | Cases correct: Radiologists | Cases correct: System |
|---|---|---|---|
| Aneurysmal bone cyst | 4 | 3 | 3 |
| Chondroblastoma | 1 | 1 | 1 |
| Chondrosarcoma | 2 | 1 | 0 |
| Enchondroma | 2 | 2 | 2 |
| Ewing’s sarcoma | 5 | 3 (4) | 4 |
| Fibrous dysplasia | 3 | 3 | 3 |
| Histiocytosis X | 1 | 1 | 1 |
| Lymphoma of bone | 2 | 1 | 0 |
| Metastases | 4 | 4 | 4 |
| Malignant fibrous histiocytoma | 1 | 1 | 0 |
| Multiple myeloma | 4 | 3 | 3 |
| Non-ossifying fibroma | 7 | 7 | 7 |
| Osteosarcoma | 7 | 7 | 7 |
| Osteosarcoma, periosteal | 1 | 1 | 1 |
| Osteochondroma | 5 | 5 | 5 |
| Plasmacytoma | 2 | 1 (2) | 1 |
| SBC | 8 | 8 | 8 |
| Total (17 diseases) | 59 | 52 (54) | 50 |
| Percent (%) | | 88 (91.5) | 85 |

^a Radiographs from 59 patients, randomly selected from the patients seen in the Rambam Medical Center in Haifa and at Hadassah Hospital in Jerusalem in the past years, were entered in the system by four radiologists with expertise in bone lesions. All cases had a final diagnosis confirmed by biopsy and they represented a total of 17 different bone lesions. The number of cases of each lesion, together with the number of cases correctly diagnosed by the radiologists and by the system, is shown in the table. The correct diagnoses in the adjusted radiologists’ DD (after consulting the system’s DD) are shown in parentheses.

Table 4. Agreement between the system’s and the radiologists’ DD^a

(a) By case, N = 59 cases

| | Number of cases | Percentage |
|---|---|---|
| Entire radiologist’s DD included in system’s DD | 40 | 68 |
| Partial radiologist’s DD included in system’s DD | 15 | 25 |
| No diagnosis common to both DDs | 4^b | 7 |
| System’s diagnoses accepted by the radiologists | 27 | 46 |

(b) By diagnoses, N = 59 cases

| | Number of diagnoses | Percentage |
|---|---|---|
| Radiologist’s DD | 120 | |
| System’s DD | 273 | |
| Agreement | 91 | 33 |
| Final agreement | 133 | 49 |
| Final disagreement | 104 | 38 |
| Final undecided | 36 | 13 |

^a Fifty-nine cases with a final diagnosis established by biopsy were entered by four senior radiologists with experience in bone tumors. The radiologist also recorded his/her DD for the entered case. The system’s DD was then obtained and the radiologists entered their agreement or disagreement with each diagnosis in the system’s DD, adjusting their own DD accordingly. From these data, we could assess the agreement between the radiologists’ and the system’s DD both by comparing the DDs case by case (Table 4a) and by determining the radiologists’ agreement or disagreement with each diagnosis listed by the system (Table 4b).
^b The radiologists did not give any DD for two of these cases.

References

[1] J.H. Mirra, Bone Tumors, Lea & Febiger Publishers, Philadelphia, 1989.
[2] G. Gell, R. Fotter, Computer assisted diagnosis of bone tumors, in: J.H. Van Bemmel, F. Gremy, J. Zvarova (Eds.), Medical Decision Making: Diagnostic Strategies and Expert Systems, North Holland, Amsterdam, 1985, pp. 115–120.
[3] M.A. Simon, H.A. Finn, Diagnostic strategy for bone and soft-tissue tumors, J. Bone Joint Surg. Am. 75 (4) (1993) 622–631.
[4] G. Gell, Expert systems as a support for radiological diagnosis, Eur. J. Radiol. 17 (1993) 8–13.
[5] A. Franco, J.D. King, F.L. Farr, J.S. Clark, P.J. Haug, An assessment of the radiological module of NEONATE as an aid in interpreting chest X-rays findings by nonradiologists, J. Med. Syst. 15 (1991) 227–286.
[6] R.T. Macura, K.J. Macura, V. Toro, et al., Computerized case-based instructional system for computed tomography and magnetic resonance imaging of brain tumors, Invest. Radiol. 29 (1994) 497.
[7] J.A. Swets, D.J. Getty, R.M. Pickett, et al., Enhancing and evaluating diagnostic accuracy, Med. Decis. Making 11 (1991) 9–18.
[8] P.L. Miller, H.A. Swett, ICON: a computer-based approach to differential diagnosis in radiology, Radiology 163 (1987) 555–558.
[9] P.L. Miller, C. Shaw, J.R. Rose, H.A. Swett, Critiquing the process of radiologic differential diagnosis, Comput. Methods Programs Biomed. 22 (1986) 21–25.
[10] P.G. Mutalik, G.G. Weltin, P.R. Fisher, et al., The prospect of expert system-based cognitive support as a by-product of image acquisition and reporting, J. Digit. Imaging 4 (1991) 233–240.
[11] H.A. Swett, P.R. Fisher, A.I. Cohn, et al., Expert system-controlled image display, Radiology 172 (1989) 487–493.
[12] M. Strotzer, P. Kros, P. Held, S. Feuerbach, Accuracy of artificial neural networks in radiological differential diagnosis of solitary bone lesions, Rofo Fortschr Geb Rontgenstr Neuen Bildgeb Verfahr 163 (3) (1995) 245–249.
[13] D.W. Piraino, S.C. Amartur, B.J. Richmond, et al., Application of an artificial neural network in radiographic diagnosis, J. Digit. Imaging 4 (1991) 226–232.
[14] W.R. Reinus, A.J. Wilson, B. Kalman, S. Kwasny, Diagnosis of focal bone lesions using neural networks, Invest. Radiol. 29 (6) (1994) 606–611.
[15] G.S. Lodwick, C.L. Haun, W.E. Smith, R.F. Keller, E.D. Robertson, Computer diagnosis of primary bone tumors. A preliminary report, Radiology 80 (1963) 273–275.
[16] E.H. Shortliffe, Computer programs to support clinical decision making, J. Am. Med. Assoc. 258 (1987) 61–66.
[17] F. Wiener, SMR (simulating medical reasoning): an expert shell for non-AI experts, Comput. Methods Programs Biomed. 26 (1988) 19–32.
[18] F. Wiener, C.H. De Verdier, T. Groth, The use of knowledge-based information systems for interpreting specialized clinical chemistry analysis: experience from erythrocyte enzymes and metabolites, Scand. J. Clin. Lab. Invest. 50 (1990) 247–259.
[19] D. Resnik, Diagnosis of Bone and Joint Disorders, third ed., Saunders, Philadelphia, 1995.
[20] B.J. Manaster, Skeletal Radiology, Year Book Medical Publishers, Chicago, 1989.
[21] F. Wiener, D. Laufer, Computer-aided diagnosis of odontogenic lesions, Int. J. Oral Maxillofac. Surg. 15 (1986) 592–596.
[22] F. Wiener, M. Gabbai, M. Jaffe, Computerized classification of congenital malformations using a modified Bayesian approach, Comput. Biol. Med. 17 (1987) 259–267.
[23] E.S. Berner, G.D. Webster, A.A. Shugerman, et al., Performance of four computer-based diagnostic systems, New Engl. J. Med. 330 (1994) 1792–1796.
[24] J.P. Kassirer, A report card on computer-assisted diagnosis-the grade: C, New Engl. J. Med. 330 (1994) 1824–1825.