Finding temporal patterns — A set-based approach


ELSEVIER

Artificial Intelligence in Medicine 6 (1994) 263-271


Ted D. Wade *, Patricia J. Byrns, John F. Steiner, Jessica Bondy

Department of Preventive Medicine and Biometrics and Center for Health Services Research, University of Colorado Health Sciences Center, Denver, CO 80262, USA

(Received August 1993; revised February 1994)

Abstract

We created an inference engine and query language for expressing temporal patterns in data. The patterns are represented by using temporally-ordered sets of data objects. Patterns are elaborated by reference to new objects inferred from original data, and by interlocking temporal and other relationships among sets of these objects. We found the tools well-suited to define scenarios of events that are evidence of inappropriate use of prescription drugs, using Medicaid administrative data that describe medical events. The tools’ usefulness in research might be considerably more general.

Key words: Temporal pattern; Knowledge representation; Drug treatment; Prolog; Medicaid; Inference engine; Set

1. Introduction

In 1988 we began designing tools to define and discover temporal patterns in data. The tools were to be used in a project to create quality review software which would identify inappropriate use of prescription drugs, and find instances of such use in a database derived from Medicaid billing data. We realized that in order to avoid false positives (inferring problems where none existed) we had to be very specific about temporal relationships in the data. For example, two drugs could not interact if they were not taken over the same period of time. If the patient was given tests that would monitor for adverse effects, this would be evidence that the physician was watching for such effects, but only if the tests were ordered at an appropriate time.

* Corresponding author.

0933-3657/94/$07.00 © 1994 Elsevier Science B.V. All rights reserved. SSDI 0933-3657(94)00005-D


Tools for finding temporal patterns were rare when our project started. Relevant studies in artificial intelligence had concentrated on the relatively hard problem of reasoning about hypothetical times, which are known only relative to other events, and which may only be partially ordered [1]. We needed tools which dealt with instantiated times, could be embedded in a conventional expert system, and were efficient enough to allow us to process thousands of case histories each day.

2. Rationale

As our design evolved we found three issues that seemed especially important to solve. One was related to the debate about points versus intervals in temporal representation [1,3] - how do we respect our precision of time measurement by not over-stating the precision of our temporal inferences? The second issue was how to specify temporal patterns with a sufficient degree of ‘looseness’ that would allow them to model real patient histories. The third issue was how to implement the temporal ordering that underlies inferences about temporal relationships.

2.1. Precision of measurement, simultaneity, and the chronon

Every measuring ‘instrument’ has its own temporal resolution. In our database the claims for medical services have a date of service - the service in question occurred some time during a particular calendar date. For two services that occurred on the same date, we cannot in general determine if one was before the other. McKenzie and Snodgrass [7] said that intervals in temporal databases should be built up from the concatenation of primitive, nondecomposable intervals called chronons, and that the most natural duration of the chronon would be the time resolution of the data. Larger chronons might also be used, but ideally these would be a multiple of the duration of the smaller chronon.

We adopted the chronon model with a calendar day as its duration. In this way we would always be able to decide the beginning chronon and ending chronon of some clinically-relevant event. Suppose a patient is admitted to a hospital on day M and discharged to a nursing home on day N (where N > M), and that the patient is admitted to the nursing home on day N. Where was the patient on chronon N? We do not want the patient to be in two places at once, yet, sometime on day N, the patient traveled from the hospital to the nursing home. We give primacy to the admitting date, and say that the patient was in the nursing home on N (all 24 hours’ worth), but was in the hospital on N - 1. In our database it is unlikely that M = N, but we would have to deal with that possibility by giving chronon N to the hospital, and starting the nursing home stay at N + 1. This is as precise a model as our time resolution will allow.

With the chronon model simultaneity is also unambiguously defined to the limit of our time resolution. Events overlap in time if and only if they span any of the same chronons. A patient may have had a certain procedure on day N, and then been given a particular diagnosis on day N, which would indicate that the procedure was not appropriate. With the day as our smallest chronon, we simply say that the procedure and diagnosis were simultaneous. If we wanted to infer a mistake in treatment we would have to find the diagnosis to be in a chronon prior to the procedure.

2.2. Temporal pattern generality

Allen [1] originally suggested that the time intervals of a problem be grouped into broader spans called reference intervals to make reasoning about temporal patterns more manageable. Recent work on treatment planning [9,12] structured temporal databases by using context intervals that were known a priori to be expected phases of the treatment plan. This can be a powerful organizing principle. In terms of the search for temporal patterns, however, hierarchical abstraction is a much better objective than it is a means. For example, we could make very few assumptions about how the course of treatment was organized in our patient population. Even if Medicaid regulations only allow filling a certain prescription every 7 days, it would be risky to have the proper operation of your software depend on finding only that interval in your database. Time and again we found that ‘typical’, or even regulated, courses of treatment were too simple to describe our data, and we had to broaden our definitions to capture reality.

In any query system the primary task is to reduce the universe of data to something much more specific. The primary logical tool for making a set more specific is to give a conjunction of constraints on the set members. We also found it necessary to have some forms of disjunction to decrease the specificity of a statement. Because we were working with sets, negation was useful in making more succinct descriptions of patterns.
We called our loosely constrained temporal patterns scenarios because they could ‘play out’ in multiple ways.

2.3. Using temporal indices

When retrieving a temporal object from a database, it is usually sufficient to index the object’s non-temporal attributes by exact value. Temporal attributes, however, are most often needed to satisfy relations based on inequalities, such as ‘next’, ‘during’ or ‘before’. In fact we rarely know beforehand the exact value of time to retrieve. Instead we typically want to find all objects whose time is greater than (or less than) some currently known value. Most databases support finding all values of a key which sort after a given value. One approach to temporal object retrieval would be to maintain such ordered indices for each type of object. We did not choose to do so because many comparisons of times might go across object types, and as a pattern search evolved many new objects would be inferred. As we show in the Methods section, we delayed computing temporal order until it was needed for inferences.
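The chronon conventions of Section 2.1 can be illustrated with a short Python sketch. It is not part of OOIE (which is written in Prolog); the names `Span` and `consecutive_stays` are hypothetical, and the code assumes only what the text states: day-long chronons, primacy of the admitting date, and simultaneity defined as sharing at least one chronon.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Span:
    """A run of day-long chronons, from `first` to `last` inclusive."""
    first: date
    last: date

    def chronons(self):
        """Return the set of calendar days (chronons) covered by the span."""
        days = (self.last - self.first).days
        return {self.first + timedelta(days=i) for i in range(days + 1)}

    def overlaps(self, other):
        # Two events are simultaneous iff they share at least one chronon;
        # for closed day ranges this reduces to an endpoint comparison.
        return self.first <= other.last and other.first <= self.last


def consecutive_stays(admit_day, transfer_day):
    """Assign the transfer day to exactly one of two back-to-back stays.

    Returns (span of the first stay, first chronon of the second stay).
    The admitting date has primacy, so the second stay owns the transfer
    day. If the first stay was itself admitted on that day (M = N), the
    first stay keeps the chronon and the second starts one day later.
    """
    one_day = timedelta(days=1)
    if admit_day == transfer_day:
        return Span(admit_day, transfer_day), transfer_day + one_day
    return Span(admit_day, transfer_day - one_day), transfer_day
```

For a hospital admission on 1 March and a transfer on 5 March, this sketch gives the hospital chronons 1 to 4 March and starts the nursing home stay on 5 March, so the two stays never overlap.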

3. Methods

3.1. Object-Oriented Inference Engine (OOIE) and rule language

OOIE is written in the Quintus Prolog language, and runs on a VAX/VMS platform. OOIE is an augmented version of a simple object programming system written in Prolog by Stabler [11]. We chose Prolog because of two concerns: programming flexibility and speed.

In the OOIE system a class defines a kind of program object with certain kinds of data attributes. The class has a set of methods, which are procedures that return a value when sent as a message to an object, either to access the attribute values of the object, or to calculate some other property of the object from those values. Other messages are sent to the class, and are used to deduce the existence of an object or set of objects belonging to the class. Methods in OOIE can be written as Prolog predicates or as OOIE rules. Objects can also be stored in an external relational database table. The time-related aspects of the OOIE language are not implemented as specific syntactic forms, but as features of some classes.

The critical deductions in our drug use review task are done by rules which describe problem objects - scenarios of potentially incorrect treatment. In our application OOIE starts with the medically relevant facts from a Medicaid recipient’s claims history, which include objects representing demographic information, drug claims, diagnoses and procedures from doctor visits or inpatient stays, and the dates and types of inpatient and long-term-care stays. The program also has access to a database of facts about drugs and providers. It deduces the existence and particulars of any drug hazard problems or inappropriately costly drug use for the case. The deductions are stored on files for use by summary and profiling programs.

3.2. Example rules

Our method for discovering temporal patterns depends in numerous ways on the use of ordered sets as constructs in our OOIE rule language. The basic temporal question is something like: ‘Of all the events of a certain type, do any have a certain temporal relation to another certain event?’. To get ‘all the events of a certain type’, we create an ordered set of objects that represent those events. The set is ordered by time, so further inferences can take advantage of the fact that, for any event in the set, we know the immediately preceding and immediately following event. The conditions tested in a rule use sets as variables, with various ways to: create sets, iterate through the members of a set, and test logical relations on members of a set.
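As a hedged illustration of a time-ordered set whose ordering is computed only when a temporal question is asked, consider the Python sketch below. The paper does not describe OOIE's internal data structures, so the class `OrderedEventSet`, its method names, and the use of integer day numbers as chronons are all assumptions made for the example.

```python
from bisect import bisect_left, bisect_right


class OrderedEventSet:
    """A set of (day, label) events that is sorted lazily, on first use."""

    def __init__(self, events=()):
        self._events = list(events)  # day is an integer chronon number
        self._days = []              # sorted day keys, rebuilt on demand
        self._sorted = False

    def add(self, day, label):
        """Add a newly inferred event; the ordering becomes stale."""
        self._events.append((day, label))
        self._sorted = False

    def _ensure_sorted(self):
        # Temporal order is computed only when an inference needs it.
        if not self._sorted:
            self._events.sort(key=lambda event: event[0])
            self._days = [day for day, _ in self._events]
            self._sorted = True

    def previous(self, day):
        """Return the latest event strictly before `day`, or None."""
        self._ensure_sorted()
        i = bisect_left(self._days, day)
        return self._events[i - 1] if i > 0 else None

    def following(self, day):
        """Return the earliest event strictly after `day`, or None."""
        self._ensure_sorted()
        i = bisect_right(self._days, day)
        return self._events[i] if i < len(self._events) else None
```

Because events on the same day share a chronon, `previous` and `following` skip over them; same-day events are treated as simultaneous rather than ordered.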


The following fragment of a rule shows how checking the members of a set for a temporal relation can prove the existence of a temporal pattern.

    if
        ...
        Cond1 is_a condition having (condit('Antepartum risk') and span(Preg_span)) and
        ...
        there_is_no condition having condit('Manic-depressive disease') and
        ...
        O_treats is_the condition_date_set of 'TCA or neuroleptic trt' and
        any O_treats <> Preg_span
        ...
    then
        ...

This rule identifies that lithium (carbonate) treatment is unwarranted during pregnancy unless the patient is manic-depressive - which condition, if untreated, is a worse hazard. Because diagnoses are the least reliable codes in administrative databases such as ours, the pattern of lithium prescribing is used to infer its clinical purpose. When lithium is given with a tricyclic antidepressant (TCA), its use is intended to increase the blood level of the antidepressant. This adjunct use of lithium is not justified in pregnancy. However, when lithium is given alone, its use is probably indicative of manic-depressive disease. Thus lithium is justified during pregnancy (and our problem rule thus fails to ‘fire’), unless the drug is given concurrently with the TCA. This reasoning has a clear and simple implementation in the rule, even though it includes appropriate temporal logic, which is merely implicit in our English explanation.

The following rule fragment illustrates several more set-related functions. The rule creates an ‘Antepartum risk’ condition from another condition, ‘Live birth’, which is defined by diagnostic and procedure codes. During testing we found cases which seemed to have invalid live birth codes because the date of those codes did not fall within an inpatient hospital stay. The rule was modified so that it used only that subset of the live birth codes which fell within the span of a hospital stay:

    (1) if  Births is_a condition having (condit('Live birth') and span(Lspan) and dates(Ldates))
    (2) and Stays is_all hosp_stay
    (3) and Hstays is_the property_subset of (Stays and pos(3))
    (4) and Hspans is_the property_vals_set of (Hstays and span)
    (5) and Bdates is_the set of (Ldates <<during_any>> Hspans)
    (6) and Bspan is_the date_span of Bdates
        ...
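A rough Python analogue of subgoals 2 through 6 may help in reading the rule and the explanation that follows. It is a sketch only: the function names are borrowed from the rule for readability, the dictionary layout of a stay is a hypothetical stand-in for OOIE objects, and returning `None` stands in for the Prolog subgoal failing.

```python
from datetime import date


def property_subset(objects, attr, value):
    """Keep the objects whose attribute `attr` equals `value`."""
    return [obj for obj in objects if obj.get(attr) == value]


def property_vals_set(objects, attr):
    """Collect the values of attribute `attr` across the objects."""
    return [obj[attr] for obj in objects]


def during_any(day, spans):
    """True if `day` falls within at least one (begin, end) span."""
    return any(begin <= day <= end for begin, end in spans)


def date_span(days):
    """Span from the earliest to the latest day; None for an empty set."""
    if not days:
        return None
    return (min(days), max(days))


def confirmed_birth_span(ldates, stays):
    """Span of live-birth dates that fall inside an inpatient stay."""
    hstays = property_subset(stays, "pos", 3)      # pos 3 = inpatient stay
    hspans = property_vals_set(hstays, "span")
    bdates = [d for d in ldates if during_any(d, hspans)]
    return date_span(bdates)
```

The list comprehension over `ldates` combined with `during_any` is the double iteration that subgoal 5 expresses in a single line of the rule.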

Subgoal 1 binds to Ldates those days on which occurred a diagnosis or procedure indicative of live birth. Codes might occur on several consecutive days for a given birth, and more than one birth (for a single patient) could occur in the 14 months covered by our database. Subgoal 2 uses the is_all predicate to obtain all the hosp_stay objects. Subgoal 3 uses a function called property_subset, which takes the set, Stays, and binds to Hstays that subset which have a pos attribute equal to 3 (indicating an inpatient stay). Subgoal 4 uses another function, called property_vals_set, to bind to Hspans the span attributes of the inpatient stays.

The next subgoal has two parts. The function called set binds to Bdates the subset of the Ldate (date of live birth) objects which qualify for the predicate within the parentheses. The predicate, <<during_any>>, is a binary relation which tests if its first argument (here, implicitly, each one of the dates in Ldates) meets the test of being ‘during any’ of the spans in its second argument, Hspans. So subgoal 5 implements, in a single line of a rule, a double iteration: each member of Ldates is tested to see if it is during any one of the members of Hspans. Bdates is now the live birth dates which are confirmed by having occurred during an inpatient stay.

In subgoal 6 the function date_span binds to Bspan a span object whose begin_date is the earliest date in Bdates, and whose end_date is the most recent date in Bdates. If Bdates has only one date, then begin_date and end_date are the same. In either case the rest of the subgoals in the rule will fire, creating an ‘Antepartum risk’ object. However, if Bdates is the empty set, meaning that there were no confirmed birth dates, then the date_span function subgoal is false, so the inference engine will try to backtrack. Since there are no alternative answers to any of the subgoals 1 through 5, the rule itself will fail to be true; no ‘Antepartum risk’ object will be created.

Our last example finds a set by two consecutive temporal subsetting operations, but also applies set logic to several non-temporal attributes. The problem was to find patients who were treated with clozapine in a maintenance fashion (stable dose ≥ 300 mg/day for extended time periods) but also were seen as outpatients by doctors who did not prescribe the clozapine, during a time of clozapine continuous use. This would be a simpler inference were there not the possibilities of multiple prescribers and breaks in treatment. We handle these multiple possibilities, as usual, by extensive use of sets. Here is the rule, followed by explanations keyed to the subgoal numbers.

    (1) if ... Ccond is_a condition having (condit('Clozapine maintenance') and span(Cspan))
        and Ccuss is_all cus having (drugname(clzapn) ... and cus_dose(Md))
        and ClzCuss is_the set of (Ccuss <> Cspan)
    (2) and Ctreat is_a treatment having (drugname(clzapn) and fills(Cfills))
        and Clzrxers is_the property_vals_set of (Cfills and provid)
    (3) and OPviss is_all outpatient_visit
        and OPdox is_the property_vals_set of (OPviss and provid)
    (4) and Otherdox is_the set_difference of (OPdox and Clzrxers)
    (5) and OtherViss is_all_of OPviss having_any Otherdox for provid
    (6) and TheViss is_the set of (OtherViss <<during_any>> ClzCuss)
    (7) and Date is the first date of TheViss
    then
        X is_a problem having criterion('PRIMARY CARE NOTIFICATION') and
        occurred date(Date) and
        ...

(1) Three subgoals prove that clozapine maintenance occurred and find ClzCuss equal to the set of all clozapine cus’s (continuous use spans - when a drug was continually consumed) having at least dose Md during the maintenance period, Cspan.
(2) Two subgoals prove that Clzrxers is all of the doctors (i.e. provid’s) prescribing clozapine.
(3) Two subgoals prove that OPdox is the set of provid’s of all outpatient visits.
(4) Otherdox is the set of non-clozapine-prescribing doctors.
(5) OtherViss is the set of the patient’s visits to the Otherdox.
(6) TheViss is visits to Otherdox during any clozapine maintenance cus.
(7) This subgoal is true only if TheViss is a non-empty set, i.e. there were visits to Otherdox during clozapine maintenance.
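The set logic of subgoals 2 through 7 can be approximated in Python as follows. This is a sketch under stated assumptions, not the OOIE implementation: the function name, the dictionary keys `provid` and `day`, and the use of integer day numbers are hypothetical, and the clozapine continuous use spans found by subgoal 1 are taken as an input.

```python
def first_other_doctor_visit(clz_fills, outpatient_visits, clz_cus_spans):
    """Day of the first visit to a non-prescribing doctor during clozapine use.

    clz_fills         -- clozapine prescription fills, each with a 'provid'
    outpatient_visits -- outpatient visits, each with a 'provid' and a 'day'
    clz_cus_spans     -- (begin, end) day pairs of maintenance-dose use
    Returns None when there is no such visit (the rule would not fire).
    """
    # (2) all doctors who prescribed clozapine
    clzrxers = {fill["provid"] for fill in clz_fills}
    # (3) all doctors seen in outpatient visits
    opdox = {visit["provid"] for visit in outpatient_visits}
    # (4) doctors seen who did not prescribe clozapine
    otherdox = opdox - clzrxers
    # (5) visits to those other doctors
    other_viss = [v for v in outpatient_visits if v["provid"] in otherdox]
    # (6) of those, visits during any clozapine continuous use span
    the_viss = [
        v for v in other_viss
        if any(begin <= v["day"] <= end for begin, end in clz_cus_spans)
    ]
    # (7) first date of the qualifying visits; fails on the empty set
    if not the_viss:
        return None
    return min(v["day"] for v in the_viss)
```

Working with whole sets of prescribers and spans, rather than single values, is what lets the same few lines cover multiple prescribers and breaks in treatment.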

4. Results

At this writing our Colorado Medicaid Drug Utilization Review system, constructed using OOIE and the principles in this paper, has been in operation for over three years. Each month we review from several hundred to over ten thousand patient data sets for prescribing problems. Processing speed is adequate for operations. For an average patient, with around 150 records, processing takes only a few seconds. In fact the database maintenance, rather than the speed of inference, is our biggest operational problem.

While physicians on the project staff do not write the rules, they can understand and critique them. This facilitates knowledge base development and its ‘accuracy’ in terms of expert agreement. We have 146 scenario rules and 70 class definitions to provide the rules’ vocabulary. The scenario rules are frequently modified, and we have added several new groups of scenarios over the operational period.

When we present cases identified by the computer as having problems, a peer review panel of pharmacists and physicians agrees with the computer an average of 69% of the time, with a range from 45% to 95% for different subsets of the scenario rules. We have also used the identified problems in a randomized trial showing that prescribing behavior could be changed using feedback from our program [2].

5. Discussion

Our approach to temporal pattern search has clearly been effective in our domain of health-care quality assurance. The tools can also be used to find patterns which can be used as endpoints in research studies. For example, we have started a study which uses our database and techniques to compare various scenarios for treatment of otitis media and the outcomes of these scenarios.


OOIE has apparent advantages and disadvantages compared with work which was published later. The set-based terminology can appear to be more procedural than declarative when it is compared to approaches that try to use more commonsense language [f&8]. However, OOIE rules may have an expressiveness advantage because more variables and temporal relations can be put into a single rule. OOIE and other approaches which allow multiple types of temporal relations are more expressive than languages which enforce a strict temporal hierarchy by using only includes or during relations [4].

OOIE sidesteps the issue of intervals versus points [1] by using chronons, which represent both types of data. Others [3,6] have advocated use of both intervals and points.

OOIE implements all of its temporal capability by extending its vocabulary with appropriate classes and methods. This approach gives the most flexibility when building a system for real-life use, but it means that the semantics will evolve over time. For example, we use object definitions to build temporal persistence or interpolation over indeterminate intervals. Others [4,5,10] deal with these in the base language.

There are uses for representing qualitative time trends (e.g. ‘increasing monotonically over some period’) in quantitative variables [5,8,10,13]. We did not specifically use such features in our application, but we did use recursive estimates of drug dose and duration, showing that numeric analyses can be integrated in our framework.

Finally, it is clear that medical data have various time scales, and some applications must deal with this [3,5]. We had only one time scale, but the chronon approach inherently can support queries about intervals on different scales [7].

Acknowledgements

The development of the software described herein was supported by the Colorado Department of Social Services and the United States Health Care Financing Administration. Other support came from the Colorado Advanced Software Institute.

References

[1] J.F. Allen, Maintaining knowledge about temporal intervals, Commun. ACM 26 (11) (1983) 832-843.
[2] P.J. Byrns, D.C. Lezotte and J. Bondy, Influencing the cost-effectiveness of prescribing using claims-based information: a randomized trial, in preparation.
[3] M.G. Kahn, Modeling time in medical decision-support programs, Med. Decision Making 11 (4) (1991) 249-264.
[4] M.G. Kahn, L.M. Fagan and S. Tu, Extensions to the time-oriented database model to support temporal reasoning in medical expert systems, Methods Inform. in Med. 30 (1991) 4-14.
[5] E.T. Keravnou and J. Washbrook, A temporal reasoning framework used in the diagnosis of skeletal dysplasias, Artificial Intelligence in Med. 2 (1990) 239-265.
[6] C. Larizza, A. Moglia and M. Stefanelli, M-HTP: a system for monitoring heart transplant patients, Artificial Intelligence in Med. 4 (1992) 111-126.
[7] L.E. McKenzie and R.T. Snodgrass, Evaluation of relational algebras incorporating the time dimension in databases, ACM Comput. Surveys 23 (1991) 501-543.
[8] W.A. Perkins and A. Austin, Adding temporal reasoning to expert-system-building environments, IEEE Expert 5 (1) 23-30.
[9] D.W. Rucker, D.J. Maron and E.H. Shortliffe, Temporal representation of clinical algorithms using expert-system and database tools, Comput. Biomed. Res. 23 (3) (1990) 222-239.
[10] Y. Shahar and M.A. Musen, RÉSUMÉ: a temporal-abstraction system for patient monitoring, Comput. Biomed. Res. 26 (1993) 255-273.
[11] E. Stabler, Object-oriented programming in Prolog, AI Expert (Oct. 1986) 46-57.
[12] S.W. Tu, M.G. Kahn, M.A. Musen, J.C. Ferguson, E.H. Shortliffe and L.M. Fagan, Episodic skeletal-plan refinement based on temporal data, Commun. ACM 32 (12) (1989) 1439-1455.
[13] B.C. Williams, Doing time: putting qualitative reasoning on firmer ground, Proc. Amer. Assoc. Artificial Intelligence, Philadelphia, PA (1986) 105-113.