Effective retrieval in Hospital Information Systems: The use of context in answering queries to Patient Discharge Summaries

Artificial Intelligence in Medicine 6 (1994) 207-227


Brenda Nangle *, Mark T. Keane
Dept. of Computer Science, Trinity College, Dublin 2, Ireland

(Received October 1993; revised January 1994)

Abstract

The move towards the electronic storage of medical records in Hospital Information Systems (HISs) presents significant challenges for AI retrieval techniques. In this paper, we argue that adequate information retrieval in such systems will have to rely on the exploitation of the conceptual knowledge in those records rather than superficial string searches. However, this course of action is dependent on the development of natural language processing techniques and of retrieval systems that can exploit semantic/conceptual knowledge. We present a retrieval system which attempts to realise the second of these developments. This system, called CONIR [developed in the context of the European Community project MENELAS (AIM 2023)], operates in the domain of Patient Discharge Summaries on coronary illness. CONIR uses flexible retrieval techniques, which exploit conceptual context information, over a database of elaborated semantic records. In the course of the paper we outline the sorts of knowledge structures that are required to do this type of retrieval and indicate how they are constructed.

Key words: Patient discharge summaries; Coronary illness; Hospital Information Systems; Knowledge representation; Conceptual graphs; Temporal discourse structure; Conceptual context description; Conceptual information retrieval

1. Introduction

The move towards the electronic storage of medical records in Hospital Information Systems (HISs) presents significant challenges for AI retrieval techniques.

* Corresponding author.

0933-3657/94/$07.00 © 1994 Elsevier Science B.V. All rights reserved. SSDI 0933-3657(94)00002-A


The dream is that these techniques will afford us new and better ways of accessing information in HISs. The nightmare is that future HISs will be less accessible than current paper filing systems. In this paper, we hope to show that the dream is more probable than the nightmare, by proposing a system which provides for flexible retrieval of information from textual medical records.

Our focus is on the ubiquitous Patient Discharge Summary (PDS), a medical record which summarises the events of a patient’s hospital visit. Currently, PDSs are textual documents, stored in conventional paper filing systems. PDSs are typically accessed under the patient’s name and/or the name of the doctor who treated the patient. The current system clearly works, although the physical retrieval of records is cumbersome and the ‘indices’ for file access are limited.

If PDSs are stored in electronic form, access to them should be more flexible and efficient. Electronic PDSs should be accessible under a number of different headings or indices (patient’s name, patient’s illness, date of visit, doctor, nature of treatment given, some arbitrary patient characteristics like age). It should also be possible to query specific aspects of such PDSs and receive a direct answer to the query.

Furthermore, future systems should be able to meet the retrieval requirements of different user groups. Research on the MENELAS project (see AIM 1993) suggests that two main classes of users would use such a retrieval system: medical and managerial users. Medical users are interested in patient care, and hence will pose clinical queries, a class of queries which deals with demographic or medical criteria. For example, (i) “In the case of John Smith, in which vessels did the lesion occur?”, or (ii) “Were there any complications involved during the catheterisation performed on Mary Jones?”, or (iii) “What is the history associated with John Smith?”.
Managerial users are more interested in gathering hospital statistics to manage medical services and will pose statistical queries: for example, (i) “What is the mean age of patients who underwent angioplasty during the month of June 1993?”, or (ii) “How many patients were admitted into ward 11 during the month of December 1990?”, or (iii) “Provide a frequency distribution of the smoking habits of all male patients”.

This paper reports on research designed to allow flexible access to and retrieval from HISs which store large numbers of Patient Discharge Summaries. The system, called ‘CONIR’ (CONtext driven Information Retrieval), retrieves information from PDSs stored as semantic/conceptual representations after natural language processing (NLP). CONIR is part of MENELAS, a multi-lingual NLP system for processing, storing and querying PDSs, funded under the EC AIM Initiative (A2023). We hope CONIR is a first approximation to future HIS retrieval systems.

Before presenting the system, in Section 2 we will discuss relevant previous research on medical retrieval systems. In Section 3, we will outline the structure of CONIR. Sections 4 and 5 will introduce conceptual models of PDSs based on the contextual information found in the summaries. In Section 6, we will show how CONIR uses contextual information for efficient and robust retrieval from PDSs. Finally, in Section 7, we will consider the implications CONIR has for future HISs.

2. Previous research on information retrieval in HIS

Many Information Retrieval (IR) techniques provide effective access to large collections of objects stored in some information system. Typically, the objects involved tend to be in textual form, making up messages, journal articles, data listings or some other data entity. The core processes in IR are indexing (the representation of records stored in a database and of the user’s need) and retrieval (the comparison of these two representations in order to select records whose characteristics match those of the information need). Traditional IR systems use textual keywords or other statistical methods. In the 1970s and 1980s mathematical models were developed for document indexing and retrieval using vector-spaces and probability theory [10-12]. These techniques can be computationally efficient but are nearing the limits of their effectiveness, because they rely on a coarse-grained set of indices or superficial textual keywords [16,17]. For example, the same string in a text can mean many different things depending on the domain; the verb ‘to take’ can mean ‘to swallow medicine’, ‘to transport’, or ‘to rob’. One response to these limits is to use NLP to capture the actual meaning of the text in conceptual or semantic representations. More powerful conceptual IR systems can then exploit these elaborated representations. While NLP techniques are also limited, the language used in PDSs is particularly suitable for constrained uses of NLP.

2.1. Language style in medical texts

The language style of medical texts has the characteristics of a sub-language, which has a restricted and simplified structure compared to a language as a whole:

• Texts are written in telegraphic style. Verbs are often omitted and abbreviations, nominal sentences or simple word juxtapositions are frequently encountered. Sentences often consist of constituents without ‘syntactic glue’, making them difficult to parse at a syntactic level.
• Texts are written in a closed domain. The basic vocabulary is made up of specialised technical and medical terms used to describe examinations and observations and to issue diagnoses. Thus, much of the ambiguity in normal texts is eliminated. So, the verb ‘to take’ can now only mean ‘to swallow medicine’ in a report dealing with a medical prescription.
• Texts are written in a compact and concise style, as opposed to the complex literary styles that occur in ordinary texts; this often permits a rather simplified linguistic analysis.

Therefore, the NLP of medical texts is a more tractable task. Moreover, this processing should emphasise the semantic level, since syntactic level rules will not be strictly respected.


2.2. Existing HISs concerned with information retrieval

Although IR systems in the medical domain have recognised the sub-language features of medical texts, only a few approach the dream system. The RIME system makes use of minimal semantic representations, using Schank’s (1973) conceptual dependency theory. Like a traditional IR system, it retrieves entire documents rather than direct answers to specific questions. The LSP [9] system attempts to achieve conceptual IR by structuring the syntactic constituents of text in a relational database. However, the accuracy and robustness of retrieval may be at risk in LSP because, as we have seen, the syntactic structure of these texts is degraded. Furthermore, LSP will not find answers which are implicit in a text. InterMed [S] transforms natural language statements into a semantic network and will deductively add inferences implicit in the text. InterMed improves on the other systems because it uses an elaborated conceptual representation which goes beyond the analysed sentences. However, its retrieval may be inefficient because of the graph matching required. Finally, METEXA [13-15] is a conceptual retrieval system which employs Sowa’s conceptual graph theory [18] as its knowledge representation. In retrieval, the query is treated as a goal to be inferred directly or indirectly from stored texts. This retrieval method, although very powerful, can require unnecessary graph matching to select a suitable rule, and its inefficiency arises from the number of graph matches involved.

None of the above systems has the sort of capabilities which are necessary for an adequate medical retrieval system. No single system has (i) the flexibility to deal with the types of queries users want to pose in this area, (ii) the effectiveness to produce direct, relevant and precise answers to a variety of general and specific queries, and (iii) the efficiency to deliver the desired reply in a speedy fashion. These are the characteristics which we attempt to capture in the present CONIR system, to which we now turn.

3. The CONIR system: Introduction and overview

CONIR is the retrieval component of MENELAS. MENELAS carries out NLP on PDSs in the domain of cardiology and produces a conceptual representation of the text (using Sowa’s conceptual graphs). The original text of the PDS and this conceptual graph representation are then stored in a database, after a set of indices for the PDS have been extracted. CONIR is designed to use this database to answer both clinical and statistical queries. The answers CONIR provides may be specific answers (e.g. yes or no to a question), some form of summary data (e.g. the mean age of patients with a particular illness is 47) or a PDS record within which relevant sentences satisfying the query may be highlighted. In CONIR the indices for PDSs are encoded in an annotated conceptual hierarchy and the queries and PDSs are elaborated conceptual representations. It is the nature of these representations that allows CONIR to search for relevant and precise answers.

The CONIR retrieval algorithm deals with queries in a two-step fashion (see Fig. 1), which we term PDS Retrieval and Answer Refinement. In the PDS Retrieval stage, a first crude answer to the query is found in the form of a list of candidate PDSs. In Answer Refinement, a specific answer is found from the PDSs in this candidate set. A key novelty in the latter stage is the use of contextual aspects of the query to make search more relevant and efficient. Subsequent sections are devoted to the key aspects of the CONIR system.

[Figure 1: block diagram with the labels Query, Answer, Retrieval Component (PDS Retrieval, Answer Refinement), Knowledge Hierarchy of Indices, and Database of PDS Texts and their Conceptual Representations.]

Fig. 1. The architecture of CONIR.

4. The representations of PDSs and queries in CONIR

All of the processing carried out by the CONIR algorithm, on queries and PDSs, is carried out on conceptual graph representations. The conceptual graph formalism is a knowledge representation language initially designed to capture the meaning of natural language [18]. A conceptual graph may be defined as a finite, connected bipartite graph and illustrates the relationships between persons, things, attributes and events in a particular world; for our purposes the ‘medical world’. Conceptual graphs have been used in a number of natural language understanding projects [4], as well as in medical informatics [2,13-15].

4.1 Conceptual graphs of PDSs

The conceptual graphs for PDSs produced in MENELAS are slightly more than a direct conceptual representation of the text. PDSs appear to have an invariant higher-order structure which can be used to generate a much more useful representation. The PDSs in a large corpus from the cardiology domain have a definite discourse structure. The PDS begins with administrative data on the patient (e.g. patient’s age or sex); the rest discusses the motive and date of admission, the patient’s history, notable events during hospitalisation and finally details on future planned treatments. Fig. 2 shows the typical sequence of information in a PDS. This information is represented as various temporal and topic contexts within the PDS. This use of contextual knowledge is one of the key features of the CONIR retrieval system.

PDS
1. ADMINISTRATIVE DATA ON PATIENT (e.g., name, age, sex)
2. ADMISSION DETAILS (e.g., motive and date of admission)
3. PATIENT’S HISTORY (e.g., past hospitalisations, lifestyle (e.g., smoker), family illnesses)
4. HOSPITALISATION EVENTS (e.g., details of any treatments, examinations, operations, diagnosis, signs, findings etc.)
5. DISCHARGE EVENTS (e.g., date, medication and review information)

Fig. 2. Typical sequence of information in a patient discharge summary.

Every PDS sentence can be categorised under a specific PDS temporal and topic context. There are three types of PDS temporal contexts: (i) a past admission context, (ii) a present admission context, and finally, (iii) a future admission context,


each one of which encapsulates temporal information localising a sentence to a past, present or future admission time respectively. There are five types of PDS topic contexts: (i) an administration context, (ii) an admission context, (iii) a history context, (iv) a hospitalisation context, and finally, (v) a discharge context, each of which encloses non-temporal information discussing patient data, admission, history, hospitalisation and discharge topics respectively. Unfortunately for this analysis, the representation of temporal and other contextual information in conceptual graphs is still an issue under research.

4.1.1 Representing contexts in conceptual graphs

Sowa’s Conceptual Graph (CG) theory provides an elegant and powerful language for expressing the non-temporal content of a sentence, but various problems have been encountered in applying the theory to the modelling of temporal information [S]. Sowa proposes the use of points in time to represent temporal information (using the conceptual relation PTIM). However, this will not capture temporal expressions which refer to time intervals (e.g. ‘during 1991’, ‘in the afternoon’). Moulin [6] has extended CG theory to deal with such problems by using three temporal knowledge structures to represent the temporal information of a discourse: perspective, localisation and situation.

The PDSs in the MENELAS system have a variety of temporal and topic contexts. We use both Sowa’s theory and a variant of Moulin’s theory to represent these contexts. So, the natural language sentence, ‘Last year, Mary took Pill X every day for two months’, can be conceptually modelled in terms of its contexts by means of a two-fold structure called the Conceptual Context Description (CCD; see Fig. 3). This CCD has an outer temporal context structure which represents conceptually the temporal localisation information expressed by the sentence being modelled (e.g.
‘last year’, ‘two months’, ‘every day’), and an inner topic context structure that represents the raw propositional content of the situation being described (e.g. ‘Mary took Pill X’). Specifically, in CONIR’s representations of PDS contexts, a temporal context structure is a pair; (Temporal Context Description; Temporal Time Attributes) or more concisely (TempCD; TempTA). The Temporal Context Description represents one of the three temporal contexts (i.e. past, present or future temporal admission context) depending on the temporal localisation information in the sentence being modelled. The Temporal Time Attribute parameter models the time interval, the duration of the interval and/or the frequency over which the localisation specified by the sentence is valid. It is a 7-tuple of the form:
<BeginTime, EndTime, TimeScale, Duration, DurationScale, Frequency, FrequencyScale>
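The two-fold CCD structure described above can be sketched as a pair of nested records. This is only an illustration, not the MENELAS implementation: the class and field names are hypothetical, the attribute order in the 7-tuple is assumed, and the way the sample sentence fills the time attributes (and its assignment to the past admission context) is our own reading of the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TempTA:
    """Temporal Time Attributes (the 7-tuple); unused slots stay None."""
    begin_time: Optional[str] = None
    end_time: Optional[str] = None
    time_scale: Optional[str] = None
    duration: Optional[int] = None
    duration_scale: Optional[str] = None
    frequency: Optional[int] = None
    frequency_scale: Optional[str] = None


@dataclass
class TemporalContext:
    temp_cd: str      # 'past admission', 'present admission' or 'future admission'
    temp_ta: TempTA


@dataclass
class TopicContext:
    top_cd: str            # one of the five PDS topics
    conceptual_graph: str  # linear form of the propositional content


@dataclass
class CCD:
    temporal: TemporalContext  # outer structure
    topic: TopicContext        # inner structure


# 'Last year, Mary took Pill X every day for two months' (assumed encoding).
ccd = CCD(
    temporal=TemporalContext(
        temp_cd="past admission",
        temp_ta=TempTA(begin_time="last year", time_scale="year",
                       duration=2, duration_scale="month",
                       frequency=1, frequency_scale="day"),
    ),
    topic=TopicContext(
        top_cd="history",
        conceptual_graph="[person:'Mary'] <- (agnt) <- [take] -> (obj) -> [pill:'X']",
    ),
)
print(ccd.temporal.temp_ta)
```

Keeping the temporal and topic structures as separate records mirrors the outer/inner split of the CCD, so either context can be inspected without touching the conceptual graph itself.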


The topic context structure is also a pair: (TopCD; CONCEPTUAL GRAPH). The Topic Context Description (TopCD) will state one of the five topics which the propositional content of the sentence being modelled discusses (i.e. administration, admission, history, hospitalisation, discharge). The CONCEPTUAL GRAPH parameter is the conceptual graph representation of the propositional content. These representational extensions give us all the expressive power we require to model PDSs in conceptual graphs. However, we do need to outline how these contexts can be derived from ‘raw conceptual graphs’ of a PDS.

TEMPORAL PERSPECTIVE: “now”
TEMPORAL CONTEXT:
TOPIC CONTEXT: patient history topic;
“Last year, Mary took pill X every day for two months”
[person:‘Mary’] <- (agnt) <- [take] -> (obj) -> [pill:‘X’]

Fig. 3. Conceptual context description for a sample sentence.

4.1.2. Establishing contexts in PDSs

The above contexts can be automatically derived from the raw CGs produced by MENELAS’ NL processor. Each context is considered in turn.

Establishment of the temporal context description parameter. MENELAS produces a conceptual graph representation of the content of each PDS sentence. During the construction of these representations, the time parameter of the temporal context structure (TempTA) for individual sentences may be instantiated simply by extracting the explicit temporal information present. Then, knowing this parameter, the temporal context to which a sentence belongs may be established. For any PDS there is a temporal localisation axis for the times of events in the PDS. The origin of the axis is the present admission date of the PDS in question. The present admission temporal context will encapsulate all sentences in the PDS whose temporal information localises them to a time point or interval which lies between the present admission date and the discharge date inclusive. All other temporal information mentioned in the PDS will localise sentences within the left and right open time intervals of the past or future admission temporal contexts. Consider the following sample portion of a PDS with the temporal information italicised:

“... This 58 year old man was admitted on the 21.1.92 for angioplasty to a totally blocked LAD. He had a previous catheterisation in January 1992. He had an infarct in December 1991 treated with streptokinase and further changes associated with ischaemic heart pain treated with tPA in January 1992. He came to angioplasty on the 22.1.92. ...”

The natural language analysis of the first sentence will have identified various timepoints in the sentence; specifically, an outer temporal context structure, (?; BeginTime: 21 Jan 1992, TimeScale: date), and an inner topic context structure, (?; CONCEPTUAL_GRAPH). Since the first sentence of a PDS always states the motive and date of admission, it can be located at the origin of the localisation axis for the PDS and will belong to the present admission context. The temporal context structure for the first sentence will thus be (‘present admission temporal context’; BT: 21 Jan 1992, TS: date). Establishing the temporal contexts of the remaining PDS sentences will simply involve locating them on the localisation time axis with respect to the origin. For example, the temporal adjective ‘previous’ in the second sentence will explicitly indicate a new localisation to the left of the origin (the past admission temporal context). The dates stated in the third and fourth sentences will indicate a past admission temporal context. The last sentence is localised between the present admission and discharge dates and hence belongs to the present admission temporal context.

Establishment of the topic context description parameter. Once we have established the temporal context for a sentence, the topic context is easier to establish because each individual temporal context will always contain certain topic contexts (see Fig. 4). So, having determined that a sentence belongs to the past admission temporal context, we can immediately assume that it must belong to the history topic context and exclude all other possibilities.

Temporal context      Topic contexts
past admission        patient history
present admission     patient attribute; admission; procedure; discharge (date)
future admission      discharge (medication, review)

Fig. 4. The constraining relationship between PDS temporal and topic contexts.

The concepts mentioned in a sentence are the second source of constraint on the topic context to which that sentence adheres. Recall that a PDS sentence may discuss one of five possible topics: administration, admission, history, hospitalisation, or discharge. Furthermore, each of these topics has topic-related concepts and an associated topic schema in the PDS script (see Fig. 5). A topic-related concept belongs to the semantic field of that topic. The semantic field of a concept is the set of words or concepts which we can expect to appear within its context. A topic schema is a very general ‘skeleton-like’ conceptual pattern of a typical PDS proposition discussing this topic. Classificatory knowledge structures which correspond to these different divisions - of topic, topic-related concepts and topic schemas - can be used to

establish the topic context of a conceptual graph. The algorithm which uses these knowledge structures is simple, as we will illustrate using the conceptual graph representations of sample PDS sentences shown in Table 1. It begins by generalising all the concepts of a conceptual graph to ‘schema concepts’. These are highly general concepts used in the topic schema structures. For example, the more specialised concept, ‘angioplasty’, is generalised to the schema concept, ‘timed’, in CG1 (see Table 1). Next, we attempt to ‘slot’ the schema generalised version of the original conceptual graph within each of the five topic schemas. Since a successful ‘slotting’ arises between CG1 and the hospitalisation topic schema in Fig. 5, this leads us to the conclusion that it belongs to the hospitalisation topic context. If, however, there is no success with matching the schema generalised form of a conceptual graph (such as in the case of CG2 in Table 1) against each of the five topic schemas, then it is likely to express the same contextual information as the previous sentence. This will be the case if both the previous and present sentences contain concepts belonging to the semantic field of the same topic. For example, the concepts present in the original CG2 are subtypes or types of the concepts abnormality, action, medical-tool, and number. Matching this list directly against the semantic field of the hospitalisation topic context, to which the previous sentence belongs, is successful. Therefore, CG2 also belongs to the hospitalisation topic context.

[Figure 5: three-column diagram giving, for each PDS topic, its topic-related concepts and its topic schema in the PDS script. The topics and their topic-related concepts are:
ADMINISTRATION: surname, forename, unit, sex, address, ward, ... (etc.)
ADMISSION: procedure, consultation, admission, ... (etc.)
HISTORY: relative, illness, procedure, lifestyle, state, history, ... (etc.)
HOSPITALISATION: action, abnormality, signs, symptom, in-body object, medical-tool, procedure, diagnosis, hospitalisation, ... (etc.)
DISCHARGE: medication, review, advice]

Fig. 5. The three levels of structures used for topic establishment.

Table 1
Conceptual graph representations for sample PDS sentences

CG1 ‘He came to angioplasty on the 22-1-92’
[patient: *a]<-(ptnt)<-[angioplasty]

CG2 ‘The LAD occlusion was crossed by a magnum wire and two inflations performed using magnarail balloon’
[occlusion]/(loc.int)->[lad]%
  (obj)<-[cross]->(ins)->[magnum_wire]%
  (obj)<-[inflation]/(qty)->[number:‘2’]%
  (ins)->[magnarail_balloon]-/-/.

Schema generalised CG1
[patient]<-(ptnt)<-[timed].

Schema generalised CG2
[abnormality]/(loc.int)->[lad]%
  (obj)<-[timed]->(ins)->[medical_tool]%
  (obj)<-[timed]/(qty)->[number:‘2’]%
  (ins)->[medical_tool]-/-/.
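The topic establishment steps described above (schema generalisation, ‘slotting’, the previous-sentence check, and the fall-back to the most similar semantic field) can be sketched as follows. This is a rough illustration under several assumptions of ours: conceptual graphs are flattened to sets of (source, relation, target) triples, ‘slotting’ is approximated by a subset test, the previous-sentence check succeeds on any overlap with the semantic field, and the schemas, type mappings and semantic fields are hypothetical, simplified stand-ins for the structures of Fig. 5.

```python
# Hypothetical generalisation tables (specific concept -> general concept).
SCHEMA_CONCEPT = {"angioplasty": "timed", "cross": "timed", "inflation": "timed",
                  "occlusion": "abnormality", "magnum_wire": "medical_tool",
                  "magnarail_balloon": "medical_tool"}
FIELD_CONCEPT = {"occlusion": "abnormality", "cross": "action", "inflation": "action",
                 "magnum_wire": "medical_tool", "magnarail_balloon": "medical_tool"}

# Hypothetical, simplified topic schemas as sets of triples.
TOPIC_SCHEMAS = {
    "administration": {("patient", "attr", "surname"), ("patient", "attr", "forename"),
                       ("patient", "attr", "ward")},
    "admission": {("admission", "ptnt", "patient"), ("admission", "cause", "timed")},
    "history": {("history", "chrc", "patient"), ("history", "chrc", "lifestyle")},
    "hospitalisation": {("hospitalisation", "chrc", "timed"),
                        ("timed", "agnt", "doctor"), ("timed", "ptnt", "patient")},
    "discharge": {("discharge", "ptnt", "patient"), ("discharge", "chrc", "medication"),
                  ("discharge", "chrc", "review")},
}

# Topic-related concepts (semantic fields), abridged.
SEMANTIC_FIELDS = {
    "administration": {"surname", "forename", "unit", "sex", "address", "ward"},
    "admission": {"procedure", "consultation", "admission"},
    "history": {"relative", "illness", "procedure", "lifestyle", "state", "history"},
    "hospitalisation": {"action", "abnormality", "symptom", "medical_tool",
                        "procedure", "diagnosis", "hospitalisation"},
    "discharge": {"medication", "review", "advice"},
}


def generalise(graph, mapping):
    """Replace every concept in the graph by its general concept, if it has one."""
    return {(mapping.get(s, s), r, mapping.get(t, t)) for s, r, t in graph}


def establish_topic(graph, previous_topic=None):
    # Step 1: generalise to schema concepts and try to slot into a topic schema.
    general = generalise(graph, SCHEMA_CONCEPT)
    for topic, schema in TOPIC_SCHEMAS.items():
        if general <= schema:
            return topic
    # Step 2: check the semantic field of the previous sentence's topic.
    field = {FIELD_CONCEPT.get(c, c) for s, _, t in graph for c in (s, t)}
    if previous_topic and field & SEMANTIC_FIELDS[previous_topic]:
        return previous_topic
    # Step 3: fall back to the topic whose semantic field is most similar.
    return max(SEMANTIC_FIELDS, key=lambda t: len(field & SEMANTIC_FIELDS[t]))


CG1 = {("angioplasty", "ptnt", "patient")}
CG2 = {("occlusion", "loc.int", "lad"), ("cross", "obj", "occlusion"),
       ("cross", "ins", "magnum_wire"), ("inflation", "obj", "occlusion"),
       ("inflation", "qty", "number"), ("inflation", "ins", "magnarail_balloon")}

topic1 = establish_topic(CG1)
topic2 = establish_topic(CG2, previous_topic=topic1)
print(topic1, topic2)
```

In this sketch CG1 slots into the hospitalisation schema, while CG2 fails every schema and inherits the topic of the previous sentence, as in the worked example; a real matcher would use graph projection rather than a subset test.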




If this previous sentence check fails, then the concept list will be matched against each of the remaining topic-related concept lists to examine which is the most similar, thereby establishing the topic context to which the sentence belongs.

4.2 Conceptual graphs of queries

Queries directed towards PDSs can also be represented in terms of the PDS contexts described in the previous section, although they are established in a slightly different way.

4.2.1 Representing contexts in queries

Since queries require information mentioned in PDS sentences, they can be conceptually described in terms of the temporal and topic context structures introduced in Section 4.1.1. The only representational difference between the conceptual context description of a query and a PDS sentence is the presence of ‘question time attributes’ and ‘question concepts’ in the former, which express required temporal and non-temporal information respectively (see Table 2).

Table 2
Conceptual context descriptions for sample clinical queries

(1) ‘On what date was John Smith admitted?’
Topic context structure:
(admission;
[admission]/(ptnt)->[patient]/
  ->(attr)->[firstname]->[john]%
  ->(attr)->[surname]->[smith]-/.)
Temporal context structure: (present admission; BT: ?, TS: date)

(2) ‘What is the history associated with John Smith?’
Topic context structure:
(history;
[history:?]<-(chrc)<-[patient]/
  ->(attr)->[firstname]->(val)->[john]%
  ->(attr)->[surname]->(val)->[smith]-/)
Temporal context structure: (past admission; null)

(3) ‘Were there any complications involved during the catheterisation performed on John Smith?’
Topic context structure:
(hospitalisation;
[catheterisation]/
  ->(chrc)->[complication:?]%
  ->(ptnt)->[patient]/
    ->(attr)->[firstname]->(val)->[john]%
    ->(attr)->[surname]->(val)->[smith]-/.)
Temporal context structure: (present admission; BT: ?, TS: date)


Requested temporal information (such as ‘date’ in sample query (1)) will be indicated by a question mark accompanying the appropriate time attribute in the temporal context structure. Similarly, a question concept, present in the conceptual graph parameter of the topic context structure and also identified by a question mark referent, will signify required non-temporal information (such as ‘history’ in sample query (2)).

4.2.2 Establishing contexts in queries

Effectively, a PDS localisation axis will explicate the temporal relationship a sentence has with all other sentences appearing in the PDS. The localisation axis for a query will be that associated with the PDS(s) it targets, which, as yet, is not known. Therefore, a query cannot be localised immediately to a temporal context. However, since it targets specific conceptual PDS information, we can rely on the precision of these ‘targeting concepts’ for establishing the topic context to which it belongs. This is not the case for PDS sentences, since their concepts may be misleading. Hence, in the case of a query, establishment of its topic context should occur before its temporal context. Tables 2 and 3 exhibit examples of the different types of queries used in CONIR, namely clinical and statistical.

For instance, consider a PDS sentence, ‘She was admitted for a previous catheterisation in January 1992’, which belongs to a history topic context. Using the algorithm for PDS sentence topic context establishment, schema generalisation followed by ‘slotting’ against the admission topic schema results in establishment of an admission context. This mistake arises due to the presence of the misleading concept ‘admission’ and can be avoided by first establishing a past admission temporal context. On the other hand, consider the query, ‘On what date was Ann Jones admitted for catheterisation?’. Unlike the above PDS sentence, we can now rely on the concepts present in the query as ones which indicate the correct topic context, namely, admission. We can then easily determine its temporal context to be present admission (see Fig. 4). In conclusion, the context to which a query belongs can be established via a mechanism similar to that described for PDS sentences, but one which operates in reverse.

Table 3
Conceptual context descriptions for sample statistical queries

(1) ‘Count the number of patients presently admitted into this hospital and calculate their mean age.’
Topic context structure:
(administration;
[patient]/(attr)->[surname]->(val)->[surname:?]%
  (attr)->[age]->(val)->[age:?]-/)
Temporal context structure: (present admission; null)

(2) ‘How many patients were admitted into ward 11 during the month of December 1990?’
Topic context structure:
(administration;
[patient]/(attr)->[surname]->(val)->[surname:?]%
  (attr)->[ward]->(val)->[ward_11]-/)
Temporal context structure: (present admission; BT: 1 Dec 1990, TS: date, ET: 31 Dec 1990, TS: date)
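The two directions of context establishment described in Sections 4.1.2 and 4.2.2 can be sketched together: a PDS sentence is first localised on the time axis and the temporal context then constrains its topic, whereas a query goes from topic to temporal context. The sketch below is illustrative only. The mapping is our reading of Fig. 4 onto the five topic names, the function names are hypothetical, and the discharge date is invented for the example since the sample PDS does not state it.

```python
from datetime import date

PAST, PRESENT, FUTURE = "past admission", "present admission", "future admission"

# Assumed reading of the constraint in Fig. 4 (temporal context -> topic contexts).
TOPICS_BY_TEMPORAL = {
    PAST: {"history"},
    PRESENT: {"administration", "admission", "hospitalisation", "discharge"},
    FUTURE: {"discharge"},
}


def localise(begin, end, admission, discharge):
    """Place a time point or interval on the PDS localisation axis."""
    if admission <= begin and end <= discharge:
        return PRESENT  # between admission and discharge dates, inclusive
    if end < admission:
        return PAST     # left open interval
    if begin > discharge:
        return FUTURE   # right open interval
    # Intervals such as 'January 1992' overlap the admission period; the text
    # resolves these with other cues (e.g. the adjective 'previous').
    raise ValueError("interval overlaps the admission period")


def candidate_topics(temporal_context):
    """PDS sentence direction: temporal context constrains the topic."""
    return TOPICS_BY_TEMPORAL[temporal_context]


def temporal_contexts_for_topic(topic):
    """Query direction: topic context constrains the temporal context."""
    return sorted(t for t, topics in TOPICS_BY_TEMPORAL.items() if topic in topics)


admission_date = date(1992, 1, 21)
discharge_date = date(1992, 1, 25)  # hypothetical; not given in the sample PDS

infarct = localise(date(1991, 12, 1), date(1991, 12, 31), admission_date, discharge_date)
angioplasty = localise(date(1992, 1, 22), date(1992, 1, 22), admission_date, discharge_date)
print(infarct, candidate_topics(infarct))
print(angioplasty)
print(temporal_contexts_for_topic("admission"))
```

A single table serves both directions, which is one way to read the remark that query context establishment is the PDS sentence mechanism operating in reverse.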

5. Indexing and the knowledge hierarchy in CONIR

MENELAS also indexes processed PDSs before storing them with their associated text in a database. Indexing is carried out with respect to a knowledge hierarchy of medical concepts, and the hierarchy is annotated with particular PDS IDs. This knowledge hierarchy is a Concept Type Lattice of medical concepts, organised around various medical nomenclature categories and concepts typically found in PDSs, under which PDS identification codes (PDS IDs) are categorised (see Fig. 6). At the topmost levels of the hierarchy one has general categories for object, timed and attribute entities. Below this, one has subordinates of these categories (e.g. physical object category, or alphabetic name groups of patients) and instances of these categories (e.g. types of medical tools, diseases, treatments and individual patients). Each of these will further point to the IDs of the PDSs in which they occur. It is not always necessary for the IDs of PDSs to be at the bottom of the hierarchy, as shown in Fig. 6, because they constitute annotations on the concepts in the hierarchy, rather than being explicit entities grouped in it.
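As a rough illustration of an annotated hierarchy of this kind, the sketch below stores PDS IDs as annotations on concepts rather than as nodes. It is not the MENELAS data structure: the class and method names are hypothetical, the lattice is simplified to parent-to-subtype links, and the placement of the example concepts under ‘timed’ and ‘object’ is assumed.

```python
from collections import defaultdict


class KnowledgeHierarchy:
    def __init__(self):
        self.subtypes = defaultdict(set)  # concept -> its direct subtypes
        self.pds_ids = defaultdict(set)   # concept -> PDS IDs annotated on it

    def add_subtype(self, parent, child):
        self.subtypes[parent].add(child)

    def index(self, pds_id, concepts):
        """Annotate each concept occurring in a PDS with that PDS's ID."""
        for concept in concepts:
            self.pds_ids[concept].add(pds_id)

    def lookup(self, concept):
        """PDS IDs annotated on the concept or on any concept below it."""
        found, stack, seen = set(), [concept], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            found |= self.pds_ids.get(current, set())
            stack.extend(self.subtypes.get(current, ()))
        return found


hierarchy = KnowledgeHierarchy()
hierarchy.add_subtype("timed", "procedure")          # assumed placement
hierarchy.add_subtype("procedure", "angiography")
hierarchy.add_subtype("object", "patient")           # assumed placement
hierarchy.add_subtype("patient", "john_smith")

# PDS 23 from the example accompanying the hierarchy figure.
hierarchy.index(23, ["john_smith", "angiography", "angina", "ischaemic_pain", "heparin"])
print(hierarchy.lookup("angiography"), hierarchy.lookup("timed"))
```

Because IDs are annotations, a PDS can be attached at any level of the hierarchy, and a general concept reaches its PDSs only by walking down through its subtypes.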

6. The CONIR retrieval algorithm

In this section, we will review the algorithm by describing the two steps in the retrieval process: PDS Retrieval and Answer Refinement. Later, we will illustrate the algorithm by showing how it deals with a number of specific examples.

6.1. PDS Retrieval

This first stage of retrieval in CONIR is really just a first pass at extracting an appropriate answer. It takes as its input the conceptual graph of the query and outputs an answer in the form of a set of PDS IDs that are relevant to the query. This candidate set of PDSs is, thus, a coarse reply for which refinement is necessary in order to mould it into a final answer. PDS Retrieval first constructs an index set to summarise the contents of the query. This is achieved by filtering the core concepts in the representation of the query into three groups: question, direct and indirect-concept groups. Question


concepts will be retained for the answer refinement stage. The direct and indirect concepts will be matched against the knowledge hierarchy, thereby finding a set of PDS IDs. Direct concepts (e.g. ‘john’, ‘ischaemic-pain’) will immediately point to a PDS ID, usually appearing near the bottom of the hierarchy. Indirect concepts, on the other hand (e.g. ‘person’, ‘treatment’), will not point directly to PDS IDs. These appear further up in the hierarchy and have a variety of subtypes separating them from PDS IDs. Only direct concepts will be used as the index set for the query, since indirect concepts will be too general; they will occur in a large number (if not all) of the PDSs and hence may pinpoint a huge number of completely irrelevant PDSs. Indirect concepts are only used when a query does not possess any direct concepts.
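The grouping and matching steps of PDS Retrieval can be sketched as follows. This is a Python illustration under stated assumptions, not the authors' code: the function names are hypothetical, the set of question concepts is passed in by the caller (the text does not say how it is detected), a concept is treated as direct when it carries PDS ID annotations, and candidate PDSs are taken to be those indexed by every concept in the index set.

```python
def group_concepts(query_concepts, question_concepts, pds_index):
    """Filter query concepts into question, direct and indirect groups."""
    groups = {"question": [], "direct": [], "indirect": []}
    for concept in query_concepts:
        if concept in question_concepts:
            groups["question"].append(concept)
        elif pds_index.get(concept):          # annotated with PDS IDs
            groups["direct"].append(concept)
        else:                                  # reaches IDs only via subtypes
            groups["indirect"].append(concept)
    return groups


def ids_below(concept, subtypes, pds_index):
    """All PDS IDs annotated on a concept or on any of its subtypes."""
    ids = set(pds_index.get(concept, ()))
    for child in subtypes.get(concept, ()):
        ids |= ids_below(child, subtypes, pds_index)
    return ids


def retrieve_pds_ids(groups, subtypes, pds_index):
    """Candidate PDS IDs: direct concepts first, indirect ones as a fallback."""
    index_set = groups["direct"] or groups["indirect"]
    id_sets = [ids_below(c, subtypes, pds_index) for c in index_set]
    # Assumption: a candidate PDS must be indexed by every concept in the set.
    return set.intersection(*id_sets) if id_sets else set()


# Hypothetical index: PDS 40 and 51 are invented to show the narrowing effect.
pds_index = {"john": {23, 40}, "smith": {23, 51}}
subtypes = {"firstname": ["john"], "surname": ["smith"]}

groups = group_concepts(
    ["history", "firstname", "john", "surname", "smith"], {"history"}, pds_index
)
print(groups["direct"])
print(retrieve_pds_ids(groups, subtypes, pds_index))
```

The fallback on the last line of `retrieve_pds_ids` mirrors the rule that indirect concepts are consulted only when the query has no direct concepts.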

Fig. 6. The knowledge hierarchy for indexing. The top-level categories are Object, Timed and Attribute. PDS 23 might be about a patient called “John Smith”, on whom angiography has been performed, who has suffered from angina, experienced ischaemic pain and has been treated with Heparin.


For example, assume that CONIR receives a natural language query of the form, ‘What is the history associated with John Smith?’. After natural language processing and the establishment of a conceptual context description of the query, it will be represented in the form shown at the top of Fig. 7. Its raw conceptual graph representation will be passed into the PDS retrieval stage, at which point the concept ‘history’ will be placed in the question-concept group, the concepts ‘john’ and ‘smith’ will form the direct-concepts group, while ‘firstname’ and ‘surname’ will make up the indirect-concepts group. Therefore, the concepts ‘john’ and ‘smith’ would be extracted for searching purposes. Matching then takes place with the index set [‘john’, ‘smith’] in the knowledge hierarchy (see Fig. 6). Since PDS 23 is the only ID indexed by these direct concepts, a list containing this single PDS ID is passed to the answer refinement stage.

6.2. Answer Refinement

While the PDS retrieval stage involves relatively simple graph traversal, the answer refinement stage is much more complicated. This stage takes the identification codes of the PDSs found by the first stage and uses these to determine a direct answer to the query, which is output to the user. Answer refinement does this by matching the conceptual context descriptions of the query against those associated with this candidate set of PDSs. Two specific sub-steps are carried out in answer refinement: a context satisfiability check and conceptual graph matching. A context satisfiability check is performed on the conceptual context descriptions of each PDS sentence, to see if their conceptual graph representations are enclosed within the same topic and temporal contexts as that of the query. The assumption here is that a PDS sentence should be considered a candidate for answering a query if the query asks about the same topic that the sentence discusses and is localised within the same temporal context as the sentence.
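A minimal Python sketch of the context satisfiability check is given below. It assumes that each conceptual context description can be reduced to a topic label, a temporal label and a raw graph, and that the check is a plain equality test on the two labels; the type and function names are hypothetical, and temporal bounds (BT, ET) are ignored for brevity. The sentence data are those recoverable from Fig. 7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextDescription:
    topic: str      # topic context, e.g. "history"
    temporal: str   # temporal context, e.g. "past admission"
    graph: str      # raw conceptual graph, kept as text in this sketch


def satisfies_context(query, sentence):
    """A sentence is a candidate only if both of its contexts match the query's."""
    return query.topic == sentence.topic and query.temporal == sentence.temporal


def candidate_sentences(query, pds_sentences):
    """Return the numbers of the sentences that pass the context check."""
    return [n for n, s in pds_sentences.items() if satisfies_context(query, s)]


query = ContextDescription(
    "history", "past admission",
    "(attr)->[firstname]->(val)->[john]% (attr)->[surname]->(val)->[smith]",
)
# Sentences 2 and 3 of the PDS 23 fragment, keyed by sentence number.
pds_23 = {
    2: ContextDescription("history", "past admission",
                          "[catheterisation]->(ptnt)->[man:*m]"),
    3: ContextDescription("hospitalisation", "present admission",
                          "[angioplasty]->(ptnt)->[man:*m]"),
}
print(candidate_sentences(query, pds_23))
```

Only sentences that pass this inexpensive test are handed on to graph matching, which is where the reduction in search effort comes from.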
Once a PDS sentence satisfies this context check, its conceptual graph representation is matched against that of the query. The conceptual graph matching operation we use is projection [3,18]. Informally, if Q is a query graph and {S1, ..., SN} a sequence of conceptual graphs corresponding to a PDS, then a target graph S in this sequence is said to answer the question Q if there is a conceptual graph, q, called the projection of Q in S (see Table 4). S is a specialisation of Q (i.e. it expresses something more detailed). The concept [coronary-artery] in the answer sentence, q, is a restriction of the concept [vessel:?] in the question Q. The result of these two sub-steps will be the final answer to the query. The power of this form of matching may be hard to appreciate at first; it arises from the constraints imposed by the representations used for queries and PDSs in CONIR. With respect to the example used in the section on PDS retrieval (Section 6.1), let us assume that the following text constitutes a portion of PDS 23, which has been extracted as a first reply to the query in Fig. 7.


Fig. 7. Conceptual context description of sample text.

Query description (top of figure):
TEMPORAL CONTEXT: “past admission”
(attr) -> [firstname] -> (val) -> [john]% (attr) -> [surname] -> (val) -> [smith]-/.

Text description:
TEMPORAL PERSPECTIVE: “now”
Sentence 1. TEMPORAL CONTEXT: “present admission”
(goal) -> [angioplasty] -> (obj) -> [occlusion] -> (loc.int) -> [lad]% (ptnt) -> [man:*m] -> (attr) -> [age:58 @ ‘year’]-/.
Sentence 2. TEMPORAL CONTEXT: “past admission”; BT: Jan 1992, TS: date. TOPIC CONTEXT: “history”
[catheterisation] -> (ptnt) -> [man:*m]
Sentence 3. TEMPORAL CONTEXT: “present admission”; BT: 22 Jan 1992, TS: date. TOPIC CONTEXT: “hospitalisation”
[angioplasty] -> (ptnt) -> [man:*m]


Table 4
The projection operation

Q: ‘During the procedure performed on Joe Bloggs, in which vessels was it found that the lesion occurred?’
[patient]/(attr)->[firstname]->(val)->[joe]%
(attr)->[surname]->(val)->[bloggs]%
(ptnt)->[procedure]->(rslt)->[findings]->
(chrc)->[lesion]->(loc.int)->[vessel:?]-/.

S: ‘Joe Bloggs underwent coronary arteriography which showed a moderate lesion in the right coronary artery’
[patient]/(attr)->[firstname]->(val)->[joe]%
(attr)->[surname]->(val)->[bloggs]%
(ptnt)->[coronary-arteriography]->(rslt)->[findings]->
(chrc)->[lesion]/(attr)->[moderate]%
(loc.int)->[coronary-artery]-/.

u: ‘Joe Bloggs underwent coronary arteriography which showed a lesion in the right coronary artery’
[patient]/(attr)->[firstname]->(val)->[joe]%
(attr)->[surname]->(val)->[bloggs]%
(ptnt)->[coronary-arteriography]->(rslt)->[findings]->
(chrc)->[lesion]->(loc.int)->[coronary-artery].
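The projection operation of Table 4 can be sketched in Python as a small backtracking search. This is an illustrative simplification and not the authors' Prolog implementation: graphs are flattened to (source, relation, destination) triples read left to right from the linear notation, each concept type is assumed to occur at most once per graph so that a type label can double as a node identifier, and the two entries of the type hierarchy are the only ones modelled. All function names are hypothetical.

```python
def build_graph(edges):
    """Build a graph whose node identifiers are simply the concept type labels."""
    concepts = {}
    for src, _, dst in edges:
        concepts[src] = src
        concepts[dst] = dst
    return {"concepts": concepts, "edges": list(edges)}


def is_subtype(t, u, supertype):
    """True if type t equals u or lies below u in the type hierarchy."""
    while t is not None:
        if t == u:
            return True
        t = supertype.get(t)
    return False


def project(query, target, supertype):
    """Map every query node to a target node of the same or a more specific
    type while preserving all relations; return None if no mapping exists."""
    q_nodes = list(query["concepts"])
    t_edges = set(target["edges"])

    def consistent(mapping):
        return all((mapping[a], r, mapping[b]) in t_edges
                   for a, r, b in query["edges"]
                   if a in mapping and b in mapping)

    def extend(i, mapping):
        if i == len(q_nodes):
            return dict(mapping)
        q = q_nodes[i]
        for t, t_type in target["concepts"].items():
            if is_subtype(t_type, query["concepts"][q], supertype):
                mapping[q] = t
                if consistent(mapping):
                    found = extend(i + 1, mapping)
                    if found is not None:
                        return found
                del mapping[q]   # undo and try the next candidate
        return None

    return extend(0, {})


supertype = {"coronary-arteriography": "procedure", "coronary-artery": "vessel"}
common = [("patient", "attr", "firstname"), ("firstname", "val", "joe"),
          ("patient", "attr", "surname"), ("surname", "val", "bloggs")]
Q = build_graph(common + [("patient", "ptnt", "procedure"),
                          ("procedure", "rslt", "findings"),
                          ("findings", "chrc", "lesion"),
                          ("lesion", "loc.int", "vessel")])
S = build_graph(common + [("patient", "ptnt", "coronary-arteriography"),
                          ("coronary-arteriography", "rslt", "findings"),
                          ("findings", "chrc", "lesion"),
                          ("lesion", "attr", "moderate"),
                          ("lesion", "loc.int", "coronary-artery")])

mapping = project(Q, S, supertype)
print(mapping["vessel"])   # the restriction of [vessel:?] found in S
```

The extra (lesion, attr, moderate) relation in S does not block the match, because projection only requires that everything in Q has a counterpart in S, not the reverse.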

‘... This 58 year old man was admitted on the 21.1.92 for angioplasty to a totally blocked LAD. He had a previous catheterisation in January 1992. He came to angioplasty on the 22.1.92 ...’

The second conceptual context description in Fig. 7 belongs to this text fragment. During the first stage of answer refinement, the context satisfiability check, the second sentence of PDS 23 is selected as containing an answer to the query. This is because it belongs to the same temporal and topic contexts as the query, namely, ‘past admission’ and ‘history’ respectively. Then, in the graph matching stage, the raw conceptual graph representations of sentence two in PDS 23 and the query are matched. Since the conceptual graphs are not isomorphic, no projection is possible and so a specific ‘yes/no’ or restricted concept (e.g. ‘catheterisation’) reply is not attainable. Hence the final answer from CONIR is a direct, pertinent and precise answer, namely, the entire PDS 23 narrative with its second sentence highlighted, describing in detail the history associated with the patient. This will be presented to the user by the answer presentation component of MENELAS.

6.3. Meeting the user class requirements

CONIR should service the queries of medical and managerial users. In the previous section we dealt with the clinical queries of medical users. CONIR also


answers queries asked by managerial users. By applying CONIR’s retrieval algorithm, in a similar fashion to the above, to the second statistical sample query shown in Table 3, ‘How many patients were admitted into ward 11 during the month of December 1990?’, the identities of all patients admitted into this ward during this temporal interval will be retrieved as direct answers. MENELAS will then display a final reply in a form specified by the user: a number, a matrix or a graph. In this case, the user would possibly require a list (i.e. a matrix with one row) of all patient identities as well as the calculated total number of these patients.
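As a rough illustration of how such a statistical query could be answered once the relevant admissions have been retrieved, the Python sketch below filters admission records by ward and by the temporal interval bounded by BT and ET, then returns both the list of identities and the total. The record layout, the function name and the sample patients are invented for the example; CONIR itself works on conceptual context descriptions rather than flat records.

```python
from datetime import date


def admitted_in_interval(admissions, ward, bt, et):
    """Return the sorted identities of patients admitted to `ward`
    on a date d with bt <= d <= et (BT and ET of the temporal context)."""
    return sorted({a["patient"] for a in admissions
                   if a["ward"] == ward and bt <= a["date"] <= et})


# Hypothetical admission records extracted from a candidate set of PDSs.
admissions = [
    {"patient": "Patient A", "ward": "ward_11", "date": date(1990, 12, 3)},
    {"patient": "Patient B", "ward": "ward_7", "date": date(1990, 12, 9)},
    {"patient": "Patient C", "ward": "ward_11", "date": date(1990, 12, 31)},
    {"patient": "Patient D", "ward": "ward_11", "date": date(1991, 1, 2)},
]

patients = admitted_in_interval(
    admissions, "ward_11", bt=date(1990, 12, 1), et=date(1990, 12, 31)
)
print(patients, len(patients))   # one-row matrix of identities plus the total
```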

7. Future directions

The three knowledge structures used during topic context establishment can also form the basis of a multi-layered prediction mechanism for PDSs. This idea is based on the METEXA system [13]. At present, PDSs are first dictated by a physician and then typed. We could, however, present the physician with the structure of the PDS, and hence of the next expected sentence, and prompt him/her with the concepts, words and plausible phrases for this sentence. In doing so, the process of PDS generation would be automated and drastically speeded up. Sentence prediction can be achieved by exploiting the underlying predictive structure of a PDS, described in the PDS schema knowledge structure, and by exploiting knowledge of the typical topics (concepts) found in PDSs and the semantic field of these concepts, as can be found in the topic and topic related concepts knowledge structures used for context establishment. Moreover, not only can we predict text to speed up PDS construction, but we could also aid the doctor by predicting possible treatments, diagnoses or the best possible date for review.

Furthermore, context based information retrieval need not be constrained to PDSs alone. In fact, technical documents in other medical and non-medical domains also possess an invariant higher order structure, from which a contextual description of the text can be extracted. As is the case with PDSs, this contextual information can be used for the purpose of achieving effective conceptual retrieval.

8. Conclusions

In this paper, we have introduced a Hospital Information System for Patient Discharge Summaries which we call CONIR (CONtext Driven IR). CONIR achieves flexible, effective and efficient IR for PDSs which far advances the capabilities of paper filing systems or indeed existing medical IR systems. IR in CONIR is flexible because it can cater for both medical and managerial users and allow them to be precise in their queries. Due to the predictive structure and content characteristics of PDS narratives, we have seen how a PDS sentence (or one querying a PDS) can be described in terms of specific temporal and topic contexts. Accordingly, we have introduced a semantic model which conceptually


describes the query and PDS sentences in terms of the PDS contexts to which they belong and which accounts for the efficiency and effectiveness of IR in CONIR. Efficiency is achieved due to a ‘two-step-answer-reduction’ mechanism and the constraining power of the conceptual context and knowledge structures, which reduce the time required to search for an answer. This search will merely involve finding sentences belonging to the same temporal and topic contexts as the query. Effectiveness is also attained because the retrieved answer is direct (as opposed to a list of documents through which the user must wade) and, more importantly, relevant and precise. Exploiting the contexts of PDSs not only improves flexibility, efficiency and effectiveness in IR for PDSs, but it also allows us to predict the conceptual structure of the next expected PDS sentence, and its possible concepts, thereby both speeding up PDS generation and aiding the doctor in predicting vital medical information. Finally, context driven retrieval can also be employed for technical documents in domains where the text possesses an invariant structure. CONIR has been implemented in ProLog (ProLog by BIM) and runs on a SUN SparcStation.

References

[1] C. Berrut, Indexing medical reports: The RIME approach, Informat. Processing & Management 26 (1) (1990) 93-109.
[2] K. Campbell and M. Musen, Representation of clinical data using SNOMED III and conceptual graphs, SCAMC - Symp. on Computer Applications in Medical Care (1992) 354-358.
[3] J. Fargues, Conceptual Graph Information Retrieval using linear resolution, generalisation and graph splitting, Proc. Fourth Annual Workshop on Conceptual Structures (AAAI-89) (J. Nagle, T. Nagle, 1989).
[4] J. Fargues, M-C. Landau, A. Dugourd and L. Catach, Conceptual graphs for semantics and knowledge processing, IBM J. Res. Development 30 (1) (Jan. 1986).
[5] C. Mery, B. Normier and A. Ogonowski, INTERMED: A medical language interface, European Conf. on Artificial Intelligence in Medicine (AIME-87) (J. Fox, M. Fieschi, R. Engelbrecht, 1987).
[6] B. Moulin, Conceptual graph approach for the representation of temporal information in discourse, Knowledge-based Syst. 5 (3) (Sep. 1992).
[7] B. Moulin, D. Rousseau and D. Vanderveken, Speech acts in a connected discourse, a computational representation based on conceptual graph theory, Proc. 6th Annual Workshop on Conceptual Structures (E. Way, July 1991).
[8] B. Moulin and D. Côté, Extending the conceptual graph model for differentiating temporal and non-temporal knowledge, Proc. 5th Annual Workshop on Conceptual Structures (P. Eklund, L. Gerholz, July 1990).
[9] M. Lyman, N. Sager, E.C. Chi, L.J. Tick, N.T. Nhan, Y. Su, F. Borst and J-R. Scherrer, Medical language processing for knowledge representation and retrievals, 13th Annual Symp. on Computer Applications in Medical Care (SCAMC-89) (IEEE Computer Society Press, Nov. 1989).
[10] C.J. van Rijsbergen, Information Retrieval (Butterworths, 1979).
[11] S.E. Robertson, The probability ranking principle in IR, J. Documentation 33 (4) (Dec. 1977) 294-304.
[12] G. Salton and M. McGill, Introduction to Modern Information Retrieval (McGraw-Hill, Englewood Cliffs, NJ, 1983).
[13] M. Schröder, Supporting speech processing by expectations: A conceptual model of radiological reports to guide the selection of word hypotheses, Konferenz “Verarbeitung natürlicher Sprache” (KONVENS 92) (1) (G. Görz, Oct. 1992).
[14] M. Schröder, Knowledge based analysis of radiology reports using conceptual graphs, Proc. 7th Annual Workshop on Conceptual Graphs (H.D. Pfeiffer, July 1992).
[15] M. Schröder, Knowledge based processing of medical language: A language engineering approach, German Conf. on Artificial Intelligence (GWAI-92) (H-J. Ohlbach, July 1992) 190-199.
[16] A.F. Smeaton, Progress in the application of natural language processing to information retrieval tasks, Comput. J. 35 (3) (1992).
[17] A.F. Smeaton and P. Sheridan, The application of morpho-syntactic language processing to effective phrase matching, Informat. Processing & Management 28 (3) (1992) 349-369.
[18] J.F. Sowa, Conceptual Structures: Information Processing in Mind and Machine (Addison Wesley, IBM Systems Research Institute, 1984).
[19] F. Volot, P. Zweigenbaum, B. Bachimont, M. Ben Said, J. Bouaud, M. Fieschi and J.F. Boisvieux, Structuration and acquisition of medical knowledge using UMLS in the conceptual graph formalism, Symp. Computer Applications in Medical Care (SCAMC 1993) (Washington DC, Oct-Nov 1993).