Automatic interpretation of biological tests




PERGAMON

Computers in Biology and Medicine 28 (1998) 183-192

Zizette Boufriche-Boufaïda *

Laboratoire Lire, Institut d'informatique, Université de Constantine, Route d'Aïn El Bey, 25000 Constantine, Algeria

Received 20 June 1997

Abstract

In this article, an approach to the Automatic Interpretation of Biological Tests (AIBT) is described. The developed system is much needed in Preventive Medicine Centers (PMCs). It is designed as a self-sufficient system that can easily be used by trained nurses during the routine visit. The results that the system provides not only give the PMC physicians a preliminary diagnosis, but also allow them more time to focus on the serious cases, which improves the quality of the clinical visit. Moreover, because such a system is planned to be in use for many years, its possibilities for future extension must be seriously considered. The methodology adopted can be interpreted as a combination of the advantages of two main approaches adopted in current diagnostic systems: the production system approach and the object-oriented system approach. From the production system approach, the ability of rules to capture the deductive processes of the expert in domains where causal mechanisms are often understood is retained. The object-oriented approach guides the elicitation and the engineering of knowledge in such a way that abstractions, categorizations and classifications are encouraged, whilst individual instances of objects of any type are recognized as separate, independent entities. © 1998 Elsevier Science Ltd. All rights reserved.

Keywords: Biological test interpretation; Preventive medicine; Object orientation; Implication relations; Marker-propagation-based inference

1. Introduction

In a Preventive Medicine Center (PMC), before the clinical visit, the first task of the physician is to detect certain or potential pathologies from the patient's check-up. The physician consults information from different sources. The questionnaire filled in by the patient provides various information such as the patient's age, sex, medical antecedents, family medical history, and so on.

* Tel and Fax: 213-4-92-28-35; E-mail: [email protected].
0010-4825/98/$19.00 © 1998 Elsevier Science Ltd. All rights reserved. PII: S0010-4825(97)00043-7


Table 1
Reference limits of the hemoglobin test depending on population age and sex

Age      Men                           Women
         N      2.5    50     97.5     N      2.5    50     97.5
4-10     1878   116    133    148      1754   116    132    149
10-14    1170   125    140    158      1070   124    139    153
14-18    631    135    154    174      550    126    140    156
18-25    571    141    160    178      574    124    141    157
25-35    1480   141    158    176      1175   119    139    156
35-45    744    140    157    175      566    116    139    158
45-55    341    137    156    174      291    115    141    158
55-65    173    136    157    176      86     127    141    159

The results of biological tests (140) and functional explorations (pulmonary X-ray, electrocardiogram, ...) are another source of information. The physician analyses these results in order to find any abnormality that will facilitate the clinical visit. Before seeing the patient, he can ask for complementary tests which allow him to confirm any potential pathology.

This analysis is a long and complicated step for many reasons. To illustrate this, take the biological tests as an example. They consist of measuring some blood, serum and urea constituents and comparing the results with reference limits. These limits are established from studies carried out by large hospitals or medical schools [1, 2]. Reference limits (usually expressed in tables) are often not fixed: they depend not only on variable factors such as the patient's age and sex (see Table 1), but also on his consumption of alcohol, tobacco and drugs, his physiological state (overweight, for instance) and so forth. Hence a result outside the reference limits cannot be considered abnormal until these factors have been taken into account. Moreover, for certain tests, the reference limits are simply descriptive and cannot be used in a clinical interpretation. Thus, new limits called decision limits are defined from medical knowledge and from different states of the patient's health (see Table 2). The decision limits are multiple, and each of them leads to a different pathology. Finally, beyond certain critical values the variation factors are not considered, because emergency thresholds have then been reached.

The diagnosis of most pathologies relies on comparing biological test values with the corresponding thresholds, taking variation factors (VF) into account. For instance, anemia is diagnosed if the hemoglobin value is under the lower reference limit. The lower reference limit is known after consideration of the patient's age and sex. The hemoglobin value is reduced by 20% if the patient is pregnant and increased by 6% if she is a heavy smoker.

Table 2
Decision and emergency limits of the hemoglobin test

         Decision limits    Emergency limits
High     200                -
Low      105                45
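As an illustration of the threshold lookup described above, the following C++ sketch selects the reference limits of Table 1 from the patient's age and sex and checks the anemia condition. It is only a sketch under stated assumptions: the names (RefRow, referenceLimits, belowLowerReference) are hypothetical, the 2.5 and 97.5 columns are read as the lower and upper reference limits, each age band is read as a half-open interval, and variation factors such as pregnancy or smoking are left out.

```cpp
#include <array>
#include <cassert>
#include <optional>

// Hypothetical types for illustration; these names do not come from the paper.
enum class Sex { Male, Female };

struct RefRow {
    int ageLo, ageHi;         // age band, read here as [ageLo, ageHi) (assumption)
    int menLow, menHigh;      // 2.5 and 97.5 columns of Table 1, men
    int womenLow, womenHigh;  // 2.5 and 97.5 columns of Table 1, women
};

// Values copied from Table 1.
constexpr std::array<RefRow, 8> kHemoglobinRef{{
    {4, 10, 116, 148, 116, 149},
    {10, 14, 125, 158, 124, 153},
    {14, 18, 135, 174, 126, 156},
    {18, 25, 141, 178, 124, 157},
    {25, 35, 141, 176, 119, 156},
    {35, 45, 140, 175, 116, 158},
    {45, 55, 137, 174, 115, 158},
    {55, 65, 136, 176, 127, 159},
}};

struct Limits {
    int low;
    int high;
};

// Returns the reference limits for the given age and sex, or nothing when
// the age falls outside the bands covered by Table 1.
std::optional<Limits> referenceLimits(int age, Sex sex) {
    for (const RefRow& r : kHemoglobinRef) {
        if (age >= r.ageLo && age < r.ageHi) {
            return sex == Sex::Male ? Limits{r.menLow, r.menHigh}
                                    : Limits{r.womenLow, r.womenHigh};
        }
    }
    return std::nullopt;
}

// The anemia condition from the text: the measured value is under the
// lower reference limit for this patient's age and sex.
bool belowLowerReference(int value, int age, Sex sex) {
    const std::optional<Limits> lim = referenceLimits(age, sex);
    assert(lim.has_value() && "age outside the bands of Table 1");
    return value < lim->low;
}
```

For a 30-year-old woman the lookup yields a lower reference limit of 119, so a measured value of 115 satisfies the condition while 130 does not.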


The following sections outline the main issues concerning the system's knowledge representation and reasoning. A system for biological test interpretation is defined as a full knowledge-based system with substantial knowledge and criteria for making decisions. It can be used by trained nurses during routine consultations. The system starts by requesting basic information about a patient's check-up. The user is then asked a series of questions requiring numeric values and yes/no answers. The system tries to determine abnormalities. It identifies confirmed pathologies with a level of seriousness, as well as suspected ones, and can ask for complementary tests if required. The outcome of the system is a list of diagnoses and advice on whether or not the patient is in urgent need of treatment.

The purpose of this work is to propose a model of implication relations, with a specific procedural inference, that respects object orientation principles. Implication relations are meant to capture the implication mechanism carried by the conditions and conclusions of if/then rules. The domain knowledge is organized into a hierarchy of classes through a system of super-links and sub-links: a class describes the common aspects observed in all of its instances and a superclass describes the common aspects of all of its subclasses. Such classes are augmented with implication relations that link two or more concepts. Dependency information between concepts is obtained from the expert (the PMC physician). For instance, the hemoglobin test is known to imply anemia, low hemoglobin or polyglobulia (depending on which limits are considered). The relations obtained are arranged as a network of nodes and links between nodes. The nodes represent knowledge objects and the arcs represent the conditional moves between them.

The problem solver is data driven, based on the propagation of the test results through the network. The data flow is managed by explicit object-oriented features. The network topology is sensitive to pattern order within rules. In particular, traversed paths are organized into chains that are believed to represent the chains of reasoning the expert might have gone through.

2. Knowledge objects and implication relations

All knowledge representation schemes can be described as relations [3]. A good example is a semantic network, where relations between nodes may be inheritance relations, part-whole relations or others. Whereas many knowledge representation systems provide powerful mechanisms to model semantic relations, they lack general facilities for configuring production rule relations and performing inference on a system configured as a network [4].

The advantage of adopting a hybrid approach in Automatic Interpretation of Biological Tests (AIBT) is that the traditional object-oriented approach provides a base-level description of medical concepts, into which aspects of a production system approach can be included in order to cater for diagnostic reasoning. The advantages of the object-oriented approach are well known. This approach provides a rich structure in the knowledge base description [5]. Its class and subclass taxonomies describe the hierarchy of the knowledge base and make the specialization of the inherited structure possible. Any derived object can redefine the control methods to meet the requirements of specific situations or simply inherit them from the base object. The abstraction concept is also used to provide the necessary flexibility for knowledge modeling [6].

To deal with the deductive processes captured by implication relations in production rules, the idea is to allow each concept to encapsulate, at the individual level, a piece of knowledge concerning its relationships with other concepts. So, every possible influence of one concept on another is represented by an arc which expresses the permissible causal move between them.

In the case of AIBT, the objects used for modeling the knowledge base are broken down into general classes. The system concepts comprise patient history, medical tests and pathologies. The Medical-entity class is the root of a specialized hierarchy from which all the domain objects are derived, classified into Tests and Pathology classes (see Fig. 1).

Fig. 1. Partial view of the hierarchy of the knowledge objects.

The Tests class comprises quantitative and qualitative test classes. Quantitative tests are medical tests based on numeric parameters, such as the biological ones. Qualitative tests refer to objects based on symbolic values, such as functional explorations or X-ray analysis. Patient questionnaires are translated into qualitative tests. Objects are modeled by aggregations of attributes, e.g. test identification, reference limits, decision limits and emergency limits for biological tests. Some limits are absent (no emergency limits for the cholesterol test) or multiple (many decision limits for the mean corpuscular volume test). Pathology attributes include the pathology state (suspected, confirmed, not established), the level of its seriousness (defined on a 0-3 scale) if confirmed, and so on.

The class Tests is connected to the Pathology class by an implication relation, meaning that each instance of the Tests class represents a medical finding that might be a cause of one or more pathologies. So, when some properties of the class Tests are satisfied, they should enable the system to draw conclusions about some pathologies. For example, the hemoglobin test is defined as being a cause of either anemia, polyglobulia or low hemoglobin pathologies, depending on whether reference limits, decision limits or a higher fixed limit (120) are under consideration. Fig. 2 shows the implication relation between the hemoglobin test and anemia, polyglobulia and low hemoglobin.
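The class hierarchy and attribute aggregations described above could be sketched in C++ as follows. This is a minimal illustration, not the paper's actual class definitions: all identifiers are hypothetical, only a few of the listed attributes are shown, and each limit is modeled as optional because the text notes that some limits are absent. The hemoglobin instance uses the decision and emergency limits of Table 2.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Root of the hierarchy (the Medical-entity class of Fig. 1).
class MedicalEntity {
public:
    explicit MedicalEntity(std::string name) : name_(std::move(name)) {}
    virtual ~MedicalEntity() = default;
    const std::string& name() const { return name_; }

private:
    std::string name_;
};

class Test : public MedicalEntity {
public:
    explicit Test(std::string name) : MedicalEntity(std::move(name)) {}
};

// A pair of limits; either bound may be absent.
struct Limits {
    std::optional<double> low;
    std::optional<double> high;
};

// Tests based on numeric parameters, such as the biological ones.
class QuantitativeTest : public Test {
public:
    explicit QuantitativeTest(std::string name) : Test(std::move(name)) {}
    double value = 0.0;
    Limits reference;              // depends on age, sex and other factors
    std::vector<Limits> decision;  // possibly several decision limits
    Limits emergency;              // may be empty for some tests
};

// Tests based on symbolic values, such as questionnaire answers.
class QualitativeTest : public Test {
public:
    explicit QualitativeTest(std::string name) : Test(std::move(name)) {}
    std::string value;
};

enum class PathologyState { NotEstablished, Suspected, Confirmed };

class Pathology : public MedicalEntity {
public:
    explicit Pathology(std::string name) : MedicalEntity(std::move(name)) {}
    PathologyState state = PathologyState::NotEstablished;
    int seriousness = 0;  // 0-3 scale, meaningful once confirmed

    void confirm(int level) {
        assert(level >= 0 && level <= 3);
        state = PathologyState::Confirmed;
        seriousness = level;
    }
};

// Hemoglobin test instance carrying the limits of Table 2.
QuantitativeTest makeHemoglobinTest() {
    QuantitativeTest t("hemoglobin");
    t.decision.push_back(Limits{105.0, 200.0});
    t.emergency = Limits{45.0, std::nullopt};  // no high emergency limit given
    return t;
}
```

Keeping the limits as plain data on the test object leaves room for a derived class to redefine how they are computed, which is the kind of specialization the section attributes to the object-oriented approach.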


Fig. 2. Schematic representation of the implication relation.

An implication relation can be described by local attributes within the participating objects. The content of these attributes is represented in the class Tests as a pointer referencing a list of pathologies, and in the class Pathology as a pointer referencing a particular list of AND/OR connections of causes that might lead to it. This bi-directional chaining is essential during the reasoning process. It can be used for different purposes such as inferring new information, building explanations or simply for consultation.

The instance described for each cause outlines its most important properties. In particular, it includes the biological test identification, the conditional expression associated with the test result, a state (true or false) after the condition evaluation and so forth. The conditional expression uses numbers, operators (<, >, =, ...) and generic names ($HD for Higher Decision limit, $LR for Lower Reference limit, $LE for Lower Emergency limit, $value for the measured value of the test, etc.). Generic names are useful because the limits of the tests are not fixed and depend on many VF. They become known, and are replaced by the effective limit values, at the reasoning step.

Objects with implication relations can be conveniently represented as an acyclic AND/OR graph, called a Marker Propagation Graph (MPG), in which the nodes represent knowledge objects and the arcs represent the conditional moves between them. The MPG is interpreted as follows: incoming nodes are causes from which consequent nodes result under some conditions. Thus, directionality in the MPG denotes independence, that is, a connection from node i to node j is active if and only if the condition which labels the arc between the two nodes is true. The entire knowledge base may need to be represented by multiple MPGs if the system has multiple, independent goals. The possible paths to a goal are determined by the truth or the falsity of the conditions that connect the nodes.
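To make the role of generic names concrete, the sketch below substitutes them with effective values and then evaluates the comparison. It rests on assumptions: the function names are hypothetical, and a condition is restricted to a single comparison of the form "term operator term" with whitespace between the three parts, whereas the expressions used in the actual system may be richer.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Resolves one term of a condition. A generic name such as "$LR" is looked
// up in the bindings; any other term is parsed as a number.
double resolveTerm(const std::string& term,
                   const std::map<std::string, double>& bindings) {
    assert(!term.empty());
    if (term[0] == '$') {
        const auto it = bindings.find(term);
        if (it == bindings.end()) {
            throw std::runtime_error("unbound generic name: " + term);
        }
        return it->second;
    }
    return std::stod(term);
}

// Evaluates a condition such as "$value < $LR" once the generic names have
// been bound to the effective limits for the current patient.
bool evalCondition(const std::string& expr,
                   const std::map<std::string, double>& bindings) {
    std::istringstream in(expr);
    std::string lhs, op, rhs;
    if (!(in >> lhs >> op >> rhs)) {
        throw std::runtime_error("malformed condition: " + expr);
    }
    const double a = resolveTerm(lhs, bindings);
    const double b = resolveTerm(rhs, bindings);
    if (op == "<") return a < b;
    if (op == ">") return a > b;
    if (op == "<=") return a <= b;
    if (op == ">=") return a >= b;
    if (op == "=") return a == b;
    throw std::runtime_error("unknown operator: " + op);
}
```

With $value bound to 110 and $LR bound to 119, the condition "$value < $LR" evaluates to true; the same condition text can be reused for another patient simply by changing the bindings, which is the benefit the text attributes to generic names.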

3. The reasoning process

The power of a representation formalism depends largely on the inference methods it provides for problem solving. An inference mechanism consists of deducing new information from the original information. The reasoning therefore relies on the series of deductions that should lead to the solution of the problem. In rule-based systems, an inference engine is responsible for the manipulation of knowledge rules and the production of new facts.

The MPG organization can be viewed as a radical improvement of the classic rule-based organization. First, because the MPG topology is sensitive to pattern order within rules, it benefits from the rule-based representation features. In particular, traversed paths in the MPG are organized into chains that are believed to represent the chains of reasoning the expert might have gone through. Relationships between knowledge objects are needed to keep traces of the exploitation mechanisms, enabling the user to understand the overall reasoning the system uses to solve the problem. They also enable the system to provide some form of informative justification and explanation of the problem solving process that it has followed. Second, the data-driven reasoning is more obvious, since the functional aspect of the inference is distributed in each object and managed by local control knowledge.

Local control knowledge consists of methods used to interpret medical test conditions (Eval_cond()), to diagnose confirmed or doubtful situations (Diagnose()), and to compute their seriousness level (Eval_gravity()) (see Fig. 3). Considering embedded knowledge, each type of node may have a different method to evaluate conditions, to determine the pathology seriousness level and to propagate the marker. The computation of conditions can be processed in the same way as the evaluation of classic logical expressions. When some terms in a condition have generic names, the Eval_cond() method will first replace them with specific numeric values and then apply a conventional expression-interpretation algorithm. The gravity of the disease is decided according to the patient's laboratory tests and chosen from {0, 1, 2, 3}.
Level 0 denotes a normal state; level 1 means that the patient's state must be reported to his family doctor; level 2 that his state must be monitored, with a possible recommendation that complementary investigations be considered; and level 3 that the patient is in urgent need of medical treatment. There are special cases. For anemia, the Eval_gravity() method returns 0, 1, 2 or 3 depending on whether the hemoglobin value is, respectively, within the normal range, or under the lower reference limit, the lower decision limit or the lower emergency limit. More generally, when a pathology diagnosis supports AND/OR connections, its seriousness level is inferred along an AND/OR tree and the highest value is taken.

Fig. 3. The Pathology class.

Implication relations are used for inference chaining in the MPG. The inference process is data driven. It is triggered when there are new facts or input data from the user. Its results are codified by active arcs and marked nodes in the MPG. An arc is activated if the condition associated with it becomes true, and a node is marked only if the arcs ending at that node are active. Inferring from the MPG is thus seen as a process that marks nodes and propagates the marking as far as possible in the MPG. The system uses a depth-first recursive traversal of the graph to compute the conditions one by one, label the arcs, and mark the nodes that meet these conditions.

Initially, all nodes of the MPG are unmarked and all arcs are passive. A problem is specified by stating the patient identification followed by his biological tests and functional explorations (see Fig. 4). These known data are specified by marking the appropriate nodes (called starting nodes) in the MPG. More generally, each marked node has to evaluate the conditions associated with the arcs flowing from it and to activate some of them. Downstream from the newly activated arcs, the marking of the nodes is updated. A new node is marked only if all arcs (AND-constructs) or at least one arc (OR-constructs) ending at it are activated. The new markers are propagated in turn, and so on (see Fig. 5).

Fig. 4. A problem specification.
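The seriousness rule for anemia given earlier in this section, together with the "highest value is taken" rule for AND/OR connections, can be sketched as follows. The function names are hypothetical stand-ins for what Eval_gravity() might do; the sketch assumes that the three lower limits have already been resolved for the patient and that they are ordered emergency <= decision <= reference.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Seriousness of anemia on the 0-3 scale. The arguments are the effective
// lower limits for the current patient (assumed already adjusted).
int anemiaGravity(double value, double lowerReference, double lowerDecision,
                  double lowerEmergency) {
    assert(lowerEmergency <= lowerDecision && lowerDecision <= lowerReference);
    if (value < lowerEmergency) return 3;  // urgent need of treatment
    if (value < lowerDecision) return 2;   // to be monitored
    if (value < lowerReference) return 1;  // reported to the family doctor
    return 0;                              // no anemia indicated
}

// Seriousness of a pathology reached through several causes: the highest
// level among the contributing causes is kept.
int combinedGravity(const std::vector<int>& levels) {
    return levels.empty() ? 0 : *std::max_element(levels.begin(), levels.end());
}
```

Taking a woman aged 25-35 as an example, the limits from Tables 1 and 2 are 119, 105 and 45, so hemoglobin values of 130, 110, 100 and 40 would give levels 0, 1, 2 and 3 respectively.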
Starting at a marked node y, the principle of marker propagation is summarized in the following algorithm:

    for each passive arc l flowing from y {
        evaluate the conditions associated with l;
        if l is activated {
            let x be the node reached by l;
            if x is an AND node {
                if all arcs ending at x are active {
                    mark x;
                    do again for each passive arc flowing from x;
                    exit;
                }
            }
            if x is an OR node {
                if there exists at least one active arc ending at x {
                    mark x;
                    do again for each passive arc flowing from x;
                    exit;
                }
            }
        }
    }
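One possible C++ rendering of this algorithm is given below as a sketch. The Graph and Arc types are hypothetical, each arc carries a callable standing in for the condition evaluation, and nodes are identified by integer indices. Unlike the pseudocode, the sketch does not exit after marking a node: it returns from the recursion and continues with the remaining passive arcs of y, which is an interpretation rather than a confirmed detail of the original system.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

enum class NodeKind { And, Or };

struct Arc {
    int from;
    int to;
    std::function<bool()> condition;  // stands in for the condition evaluation
    bool active = false;              // all arcs start passive
};

struct Graph {
    std::vector<NodeKind> kind;  // one entry per node
    std::vector<bool> marked;    // all nodes start unmarked
    std::vector<Arc> arcs;

    int addNode(NodeKind k) {
        kind.push_back(k);
        marked.push_back(false);
        return static_cast<int>(kind.size()) - 1;
    }

    void addArc(int from, int to, std::function<bool()> cond) {
        const int n = static_cast<int>(kind.size());
        assert(from >= 0 && from < n && to >= 0 && to < n);
        arcs.push_back(Arc{from, to, std::move(cond)});
    }

    // AND node: every incoming arc is active. OR node: at least one is.
    bool shouldMark(int x) const {
        bool any = false;
        bool all = true;
        for (const Arc& a : arcs) {
            if (a.to != x) continue;
            any = any || a.active;
            all = all && a.active;
        }
        return kind[x] == NodeKind::And ? (any && all) : any;
    }

    // Depth-first propagation from a marked node y.
    void propagate(int y) {
        for (std::size_t i = 0; i < arcs.size(); ++i) {
            if (arcs[i].from != y || arcs[i].active) continue;  // passive arcs of y
            if (!arcs[i].condition()) continue;                 // arc stays passive
            arcs[i].active = true;
            const int x = arcs[i].to;
            if (!marked[x] && shouldMark(x)) {
                marked[x] = true;
                propagate(x);
            }
        }
    }

    // Marks a starting node (a known test result) and propagates from it.
    void start(int node) {
        marked[node] = true;
        propagate(node);
    }
};
```

As a usage example, a graph with arcs hemoglobin -> anemia and MCV -> macrocytosis, plus an AND node "macrocytic anemia" fed by anemia and macrocytosis, marks the AND node only after both starting nodes have been processed with true conditions.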


Fig. 5. A part of the marker propagation graph after inferring from a patient check-up.

4. Discussion

In a PMC, the physician sees a large number of people every day for their periodic visit. On seeing the patient for a medical check-up, he must be able to find abnormalities that would be confirmed in the clinical visit. The diagnosis of abnormalities is quite difficult because of the great number of laboratory tests, the variable thresholds under consideration (possibly thousands of tables of data) and also the time constraint. The physician also has to produce case summary reports destined for the patient's family doctor. In these reports, he must supply any information that is classified as crucial. In such conditions, the doctor's diagnostic accuracy is only 85% on average. Thus, a system for automatic interpretation of medical tests is strongly needed.

This system is developed as a self-sufficient system that could be used by trained nurses or other similarly qualified health personnel during preventive medical consultations. The aim of the development is to provide preliminary diagnostic support to the PMC physicians and to allow them the time to investigate serious cases in more detail and in a clinical visit of higher quality. Such a system also improves the service to the patients and encourages better resource usage and management at the PMCs. Thus, a source of knowledge that can be instantly accessed will provide continuous diagnostic support and will open the way for future studies on pathology distribution considering the population age, sex, living conditions, job conditions, and so forth, which will help towards the elaboration of national health plans.

The information is extracted from the MPG, which synthesizes the health state of the patient: marked nodes correspond to confirmed pathologies and active arcs represent the inference chain that the system has gone through. A pathology may be suspected for several reasons. One is that the inactive links leading to its node in the MPG correspond to laboratory tests that are not performed in the PMC. In such a case, patients will be asked for complementary tests. The knowledge base was intentionally built without uncertainty measures, as suspected cases will be examined in the clinical visit.

The format in which results have to be displayed was devised with a PMC physician. One important function performed at this stage is to determine what information can be left out. For example, if a macrocytic anemia is diagnosed (see Fig. 5), no separate information is needed about anemia and macrocytosis. The physician suggests a quick review of the highlights of the patient's condition (identification, past medical history, family antecedents), including the tests with the last inferred pathologies, their seriousness level and also suspected cases with recommended investigations. The system reads information from the MPG, which provides the raw material for this output, and organizes it in a tabular form (see Table 3).

Table 3
The system output

Patient identification       Identification number, visit date, name, age, weight
Past medical history         Past diagnosed diseases
Environment factors          Job conditions, living conditions, physiological conditions
Laboratory tests             Test name, value, ranges
Final medical conclusions    Confirmed pathology, severity
Recommendation               Suspected pathologies, future investigations

The system design and implementation in C++ [7] is complete. Particular attention should be given to the system extension possibilities. This is important when one realizes that such systems are going to spend most of their lives being changed, updated and improved [8].
This would not have been possible without a good appreciation of object-oriented features. A more technical description of this aspect can be found in Ref. [9]. Implementing the system in C++ makes it possible to combine it in the future with commercial graphical user interfaces, C++ development environments and other C++ object libraries containing methods supporting other paradigms.

5. Summary

An approach to the Automatic Interpretation of Biological Tests (AIBT) is described. The developed system is much needed in Preventive Medicine Centers (PMCs), where the physicians see a large number of people every day for their periodic visit. The system is designed as a self-sufficient system that could easily be used by trained nurses. The results of the system not only provide the PMC physicians with a preliminary diagnosis, but also allow them more time to focus on the serious cases and improve the quality of the clinical visits. The methodology adopted can be interpreted as a combination of the advantages of two main approaches adopted in current diagnostic systems: the production system approach and the object-oriented system approach. The innovative idea is to combine knowledge objects in a Marker Propagation Graph (MPG), which provides rule-based-like representation features. This approach preserves the best of both object orientation features and expert system functionality. Particular attention should be given to the system extension possibilities, which would not have been possible without a good appreciation of object-oriented features.

Acknowledgements

This work was initiated by a cooperation with the Centre de Médecine Préventive de Nancy and the Centre de Recherche en Informatique de Nancy (CRIN/CNRS). The author is particularly grateful to Karl Tombre for his constructive comments and Dr Pierre Leduc for his medical explanations.

References

[1] G. Siest and J. Henny, A savoir sur: les examens biologiques et leurs facteurs de variation. Sandoz Editions (1986).
[2] G. Siest, J. Henny and F. Schiele, Interprétation des examens de laboratoire. Karger Editions (1981).
[3] F.N. Teskey, Representation and reasoning in artificial intelligence. In Engineering, ed. G. Winstanley (1991).
[4] R. Agarwal, The entropy effect and sensitivity analysis in rule-based expert systems, Int. J. Expert Syst. 8 (4) (1995) 309-325.
[5] F. Benchika and Z. Boufriche-Boufaïda, Les systèmes à objets: représentation multi-points de vue et raisonnement par classification. In Proceedings of the 3rd International Symposium on Programming and Systems, Algiers, pp. 296-315 (1997).
[6] I. Graham, Object-oriented Methods. Addison-Wesley, Reading, MA (1991).
[7] S. Lippman, C++ Primer. Addison-Wesley, Reading, MA (1989).
[8] R. Davis, Expert systems: where are we? And where do we go from here? The Artificial Intelligence Magazine, Spring (1982) 3-22.
[9] Z. Boufriche-Boufaïda, Object cooperation for clinical test interpretation. In Proceedings of IEEE International Conference on Systems, Man and Cybernetics, San Antonio, Texas, pp. 806-811 (1994).

About the Author: Zizette Boufriche-Boufaïda has been an Assistant Professor in the Department of Computer Science at the University of Constantine, Algeria, since 1987. From 1983 to 1987, she was affiliated with the Center of Computer Science Research (CRIN) of Nancy, France, where she received her Doctorat 3ème Cycle. She is currently a member of the laboratory LIRE of the University of Constantine and pursues her collaboration with the ISA Group of Nancy. Her current research interests include knowledge representation and reasoning, and object orientation.