Artificial Intelligence in Medicine 6 (1994) 263-271
Finding temporal patterns - A set-based approach
Ted D. Wade *, Patricia J. Byrns, John F. Steiner, Jessica Bondy
Department of Preventive Medicine and Biometrics and Center for Health Services Research, University of Colorado Health Sciences Center, Denver, CO 80262, USA
(Received August 1993; revised February 1994)
Abstract
We created an inference engine and query language for expressing temporal patterns in data. The patterns are represented by temporally ordered sets of data objects. Patterns are elaborated by reference to new objects inferred from the original data, and by interlocking temporal and other relationships among sets of these objects. We found the tools well suited to defining scenarios of events that are evidence of inappropriate use of prescription drugs, using Medicaid administrative data that describe medical events. The tools' usefulness in research might be considerably more general.

Key words: Temporal pattern; Knowledge representation; Drug treatment; Prolog; Medicaid; Inference engine; Set
1. Introduction
In 1988 we began designing tools to define and discover temporal patterns in data. The tools were to be used in a project to create quality review software which would identify inappropriate use of prescription drugs, and find instances of such use in a database derived from Medicaid billing data. We realized that in order to avoid false positives (inferring problems where none existed) we had to be very specific about temporal relationships in the data. For example, two drugs could not interact if they were not taken over the same period of time. If the patient was given tests that would monitor for adverse effects, this would be evidence that the physician was watching for such effects, but only if the tests were ordered at an appropriate time.
* Corresponding author.
0933-3657/94/$07.00 © 1994 Elsevier Science B.V. All rights reserved
SSDI 0933-3657(94)00005-D
Tools for finding temporal patterns were rare when our project started. Relevant studies in artificial intelligence had concentrated on the relatively hard problem of reasoning about hypothetical times, which are known only relative to other events and which may only be partially ordered [1]. We needed tools that dealt with instantiated times, could be embedded in a conventional expert system, and were efficient enough to allow us to process thousands of case histories each day.
2. Rationale
As our design evolved, we found three issues that seemed especially important to solve. The first was related to the debate about points versus intervals in temporal representation [1,3]: how do we respect the precision of our time measurements without overstating the precision of our temporal inferences? The second was how to specify temporal patterns with enough 'looseness' to model real patient histories. The third was how to implement the temporal ordering that underlies inferences about temporal relationships.

2.1. Precision of measurement, simultaneity, and the chronon
Every measuring 'instrument' has its own temporal resolution. In our database each claim for a medical service has a date of service: the service in question occurred some time during a particular calendar date. For two services that occurred on the same date, we cannot in general determine which came first. McKenzie and Snodgrass [7] said that intervals in temporal databases should be built up by concatenating primitive, nondecomposable intervals called chronons, and that the most natural duration of a chronon is the time resolution of the data. Larger chronons might also be used, but ideally their duration would be an integer multiple of the duration of the smallest chronon.

We adopted the chronon model with a calendar day as its duration. In this way we can always decide the beginning chronon and the ending chronon of any clinically relevant event. Suppose a patient is admitted to a hospital on day M and discharged to a nursing home on day N (where N > M), and that the patient is admitted to the nursing home on day N. Where was the patient on chronon N? We do not want the patient to be in two places at once, yet sometime on day N the patient traveled from the hospital to the nursing home.
We assign primacy to the admitting date, and say that the patient was in the nursing home on N (all 24 hours of it) but in the hospital on N - 1. In our database it is unlikely that M = N, but we would have to deal with that possibility by giving chronon N to the hospital and starting the nursing home stay at N + 1. This is as precise a model as our time resolution will allow.

With the chronon model, simultaneity is also unambiguously defined, to the limit of our time resolution: events overlap in time if and only if they span any of the same chronons. Suppose a patient had a certain procedure on day N and was then given, also on day N, a diagnosis that would indicate the procedure was not appropriate. With the day as our smallest chronon, we simply say that the procedure and the diagnosis were simultaneous. To infer a mistake in treatment, we would have to find the diagnosis in a chronon prior to the procedure.

2.2. Temporal pattern generality
Allen [1] originally suggested that the time intervals of a problem be grouped into broader spans, called reference intervals, to make reasoning about temporal patterns more manageable. Recent work on treatment planning [9,12] structured temporal databases by using context intervals that were known a priori to be expected phases of the treatment plan. This can be a powerful organizing principle. In the search for temporal patterns, however, hierarchical abstraction is much better as an objective than as a means. For example, we could make very few assumptions about how the course of treatment was organized in our patient population. Even if Medicaid regulations allow a certain prescription to be filled only once every 7 days, it would be risky to make the proper operation of the software depend on finding only that interval in the database. Time and again we found that 'typical', or even regulated, courses of treatment were too simple to describe our data, and we had to broaden our definitions to capture reality.

In any query system the primary task is to reduce the universe of data to something much more specific. The primary logical tool for making a set more specific is a conjunction of constraints on the set members. We also found it necessary to have some forms of disjunction, to decrease the specificity of a statement. Because we were working with sets, negation was useful in making descriptions of patterns more succinct.
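To make the day-chronon model of Section 2.1 and the set logic just described concrete, here is a minimal Python sketch. It assumes chronons are represented as integer day numbers (via Python's `date.toordinal`), and every name in it (`Span`, `overlaps`, `before`, `transfer`, `unmonitored_overlaps`) is hypothetical, not part of OOIE. The last function is an invented drug-interaction scenario in the spirit of the Introduction, in which conjunction, disjunction and negation become a combined filter, a set union and a negated test.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


def chronon(d: date) -> int:
    """Map a calendar date to its day chronon (one chronon = one calendar day)."""
    return d.toordinal()


@dataclass(frozen=True)
class Span:
    """An event occupying every chronon from begin to end, inclusive."""
    begin: int
    end: int

    def chronons(self) -> set[int]:
        return set(range(self.begin, self.end + 1))


def overlaps(a: Span, b: Span) -> bool:
    """Simultaneous iff the two events share at least one chronon."""
    return a.begin <= b.end and b.begin <= a.end


def before(a: Span, b: Span) -> bool:
    """True only if every chronon of a precedes the first chronon of b."""
    return a.end < b.begin


def transfer(hosp_admit: int, nh_admit: int) -> tuple[Span, int]:
    """Hospital-to-nursing-home transfer; returns (hospital span, nursing-home start).

    The admitting date has primacy, so the transfer day belongs to the nursing
    home and the hospital stay ends the day before. If both admissions fall on
    the same day, the hospital keeps that chronon and the nursing-home stay
    starts on the next one.
    """
    if nh_admit > hosp_admit:
        return Span(hosp_admit, nh_admit - 1), nh_admit
    return Span(hosp_admit, hosp_admit), hosp_admit + 1


def unmonitored_overlaps(drug_a: set[Span], drug_b: set[Span],
                         drug_c: set[Span], tests: set[Span]) -> set[Span]:
    """Hypothetical scenario: spans of drug A that overlap a span of drug B
    or drug C (disjunction, as a set union) and also (conjunction) have no
    monitoring test during that span of drug A (negation)."""
    partners = drug_b | drug_c
    return {a for a in drug_a
            if any(overlaps(a, p) for p in partners)
            and not any(overlaps(a, t) for t in tests)}
```

Because a span is an inclusive range of whole days, two events recorded on the same date always overlap, and `before` can only succeed across different chronons, which is the behavior the chronon model calls for.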
We called our loosely constrained temporal patterns 'scenarios' because they could 'play out' in multiple ways.

2.3. Using temporal indices
When retrieving a temporal object from a database, it is usually sufficient to index the object's non-temporal attributes by exact value. Temporal attributes, however, are most often needed to satisfy relations based on inequalities, such as 'next', 'during' or 'before'. In fact we rarely know beforehand the exact time value to retrieve. Instead we typically want to find all objects whose time is greater than (or less than) some currently known value. Most databases support finding all values of a key that sort after a given value. One approach to temporal object retrieval would be to maintain such ordered indices for each type of object. We chose not to do so, because many comparisons of times might cross object types, and many new objects would be inferred as a pattern search evolved. As we show in the Methods section, we delayed computing temporal order until it was needed for inferences.
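One way to read "delay computing temporal order until it is needed" is a container that accepts dated objects of any type, in any order, and sorts only when a temporal query arrives. The Python class below is a hypothetical sketch of that idea, not OOIE's implementation; it assumes times are integer day chronons and uses the standard `bisect` module to answer the "all objects after (or before) a known time" queries described above.

```python
from __future__ import annotations

import bisect
from typing import Any


class TimeOrderedSet:
    """Dated objects of any type, sorted lazily, only when a query needs order."""

    def __init__(self) -> None:
        self._items: list[tuple[int, Any]] = []  # (chronon, object) pairs
        self._keys: list[int] = []               # chronons of _items once sorted
        self._dirty = False                      # True while _items is unsorted

    def add(self, day: int, obj: Any) -> None:
        # Newly inferred objects can be added at any time; nothing is re-indexed yet.
        self._items.append((day, obj))
        self._dirty = True

    def _ensure_order(self) -> None:
        if self._dirty:
            # list.sort is stable, so same-day objects keep their arrival order.
            self._items.sort(key=lambda pair: pair[0])
            self._keys = [day for day, _ in self._items]
            self._dirty = False

    def after(self, day: int) -> list[Any]:
        """All objects whose chronon is strictly later than `day`."""
        self._ensure_order()
        i = bisect.bisect_right(self._keys, day)
        return [obj for _, obj in self._items[i:]]

    def before(self, day: int) -> list[Any]:
        """All objects whose chronon is strictly earlier than `day`."""
        self._ensure_order()
        i = bisect.bisect_left(self._keys, day)
        return [obj for _, obj in self._items[:i]]
```

For example, after `s.add(5, "diagnosis")` and `s.add(2, "drug claim")`, the call `s.after(2)` returns `["diagnosis"]`. Keeping objects of different types in one ordered set is what lets time comparisons cross object types.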
3. Methods
3.1. Object-Oriented Inference Engine (OOIE) and rule language
OOIE is written in the Quintus Prolog language and runs on a VAX/VMS platform. OOIE is an augmented version of a simple object programming system written in Prolog by Stabler [11]. We chose Prolog because of two concerns: programming flexibility and speed.

In the OOIE system a class defines a kind of program object with certain kinds of data attributes. The class has a set of methods, which are procedures that return a value when invoked by a message sent to an object, either to access the attribute values of the object or to calculate some other property of the object from those values. Other messages are sent to the class, and are used to deduce the existence of an object or set of objects belonging to the class. Methods in OOIE can be written as Prolog predicates or as OOIE rules. Objects can also be stored in an external relational database table. The time-related aspects of the OOIE language are not implemented as specific syntactic forms, but as features of some classes.

The critical deductions in our drug use review task are done by rules which describe problem objects: scenarios of potentially incorrect treatment. In our application OOIE starts with the medically relevant facts from a Medicaid recipient's claims history, which include objects representing demographic information, drug claims, diagnoses and procedures from doctor visits or inpatient stays, and the dates and types of inpatient and long-term-care stays. The program also has access to a database of facts about drugs and providers. It deduces the existence and particulars of any drug hazard problems or inappropriately costly drug use for the case. The deductions are stored in files for use by summary and profiling programs.

3.2. Example rules
Our method for discovering temporal patterns depends in numerous ways on the use of ordered sets as constructs in our OOIE rule language. The basic temporal question is something like: ‘Of all the events of a certain type, do any have a certain temporal relation to another certain event?’ To get ‘all the events of a certain type’, we create an ordered set of objects that represent those events. The set is ordered by time, so further inferences can take advantage of the fact that, for any event in the set, we know the immediately preceding and immediately following events. The conditions tested in a rule use sets as variables, with various ways to create sets, iterate through the members of a set, and test logical relations on members of a set.
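The two properties of time-ordered sets named here, knowing each member's immediate neighbors and asking whether any member stands in a given relation to another event, can be sketched in Python as follows. The function names are hypothetical, and events are simplified to integer day numbers already sorted by time.

```python
from __future__ import annotations

from typing import Callable, Iterator, Optional, Sequence, TypeVar

T = TypeVar("T")  # type of the set members
E = TypeVar("E")  # type of the event they are compared against


def with_neighbors(ordered: Sequence[T]) -> Iterator[tuple[Optional[T], T, Optional[T]]]:
    """Yield (previous, member, following) for each member of a time-ordered set."""
    for i, member in enumerate(ordered):
        prev = ordered[i - 1] if i > 0 else None
        following = ordered[i + 1] if i + 1 < len(ordered) else None
        yield prev, member, following


def any_related(ordered: Sequence[T], relation: Callable[[T, E], bool], event: E) -> bool:
    """Of all the events of a certain type, do any have relation R to event E?"""
    return any(relation(member, event) for member in ordered)
```

As a usage example, with hypothetical fill days `[1, 31, 61, 130]`, the gaps between each fill and the one that follows it are 30, 30 and 69 days; a gap that long is the kind of break in treatment a rule might need to notice.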
The following fragment of a rule shows how checking the members of a set for a temporal relation can prove the existence of a temporal pattern:

    if  ...
        Cond1 is_a condition having
            (condit('Antepartum risk') and span(Preg_span)) and
        ...
        there_is_no condition having condit('Manic-depressive disease') and
        ...
        O_treats is_the condition_date_set of 'TCA or neuroleptic trt' and
        any O_treats <<during>> Preg_span
        ...
    then
        ...
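Read as ordinary set operations, the visible subgoals of this fragment reduce to a membership test and an existential check. The Python function below is a hypothetical sketch, not OOIE code: it assumes the lithium-specific subgoals hidden behind "..." have already succeeded, represents the pregnancy span as a pair of day numbers, and assumes the relation between O_treats and Preg_span means "the date falls within the span". It returns True when the problem rule would fire.

```python
from __future__ import annotations


def lithium_in_pregnancy_problem(preg_span: tuple[int, int],
                                 conditions: set[str],
                                 o_treat_dates: set[int]) -> bool:
    """Sketch of the visible subgoals of the lithium fragment.

    preg_span     -- (begin, end) day chronons of the 'Antepartum risk' condition
    conditions    -- names of the conditions inferred for the patient
    o_treat_dates -- dates of 'TCA or neuroleptic trt'
    The lithium-specific subgoals elided in the fragment are assumed to hold.
    """
    begin, end = preg_span
    # there_is_no condition having condit('Manic-depressive disease')
    if "Manic-depressive disease" in conditions:
        return False
    # any O_treats ... Preg_span, read here as "some treatment date lies in the span"
    return any(begin <= d <= end for d in o_treat_dates)
```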
This rule identifies lithium (carbonate) treatment as unwarranted during pregnancy unless the patient is manic-depressive, a condition which, if untreated, is a worse hazard. Because diagnoses are the least reliable codes in administrative databases such as ours, the pattern of lithium prescribing is used to infer its clinical purpose. When lithium is given with a tricyclic antidepressant (TCA), its use is intended to increase the blood level of the antidepressant. This adjunct use of lithium is not justified in pregnancy. However, when lithium is given alone, its use is probably indicative of manic-depressive disease. Thus lithium is justified during pregnancy (and our problem rule fails to ‘fire’) unless the drug is given concurrently with a TCA. This reasoning has a clear and simple implementation in the rule, even though it includes the appropriate temporal logic, which is only implicit in our English explanation.

The following rule fragment illustrates several more set-related functions. The rule creates an ‘Antepartum risk’ condition from another condition, ‘Live birth’, which is defined by diagnostic and procedure codes. During testing we found cases which seemed to have invalid live birth codes, because the dates of those codes did not fall within an inpatient hospital stay. The rule was modified so that it used only the subset of the live birth codes that fell within the span of a hospital stay:
    (1) if  Births is_a condition having
                (condit('Live birth') and span(Lspan) and dates(Ldates)) and
    (2)     Stays is_all hosp_stay and
    (3)     Hstays is_the property_subset of (Stays and pos(3)) and
    (4)     Hspans is_the property_vals_set of (Hstays and span) and
    (5)     Bdates is_the set of (Ldates <<during_any>> Hspans) and
    (6)     Bspan is_the date_span of Bdates and
            ...
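These subgoals map closely onto ordinary set operations. The sketch below is a hypothetical Python analogue, not OOIE code: stays are plain dictionaries with assumed `pos` and `span` keys, dates are integer day numbers, and returning `None` stands in for the failing `date_span` subgoal that makes the whole rule fail.

```python
from __future__ import annotations

from typing import Iterable, Optional, Tuple

Span = Tuple[int, int]  # (begin_date, end_date) as inclusive day chronons


def property_subset(objs: Iterable[dict], attr: str, value) -> list[dict]:
    """Members of objs whose attribute attr equals value (subgoal 3)."""
    return [o for o in objs if o[attr] == value]


def property_vals_set(objs: Iterable[dict], attr: str) -> set:
    """The set of values of attribute attr across objs (subgoal 4)."""
    return {o[attr] for o in objs}


def during_any(day: int, spans: Iterable[Span]) -> bool:
    """True if day falls within at least one of the spans."""
    return any(begin <= day <= end for begin, end in spans)


def date_span(dates: set[int]) -> Optional[Span]:
    """Earliest-to-latest span (subgoal 6); None, i.e. failure, if dates is empty."""
    if not dates:
        return None
    return (min(dates), max(dates))


def antepartum_birth_span(ldates: set[int], stays: list[dict]) -> Optional[Span]:
    hstays = property_subset(stays, "pos", 3)              # inpatient stays only
    hspans = property_vals_set(hstays, "span")             # their spans
    bdates = {d for d in ldates if during_any(d, hspans)}  # the double iteration
    return date_span(bdates)                               # None => rule fails
```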
Subgoal 1 binds to Ldates those days on which a diagnosis or procedure indicative of live birth occurred. Codes might occur on several consecutive days for a given birth, and more than one birth (for a single patient) could occur in the 14 months covered by our database. Subgoal 2 uses the is_all predicate to obtain all the hosp_stay objects. Subgoal 3 uses a function called property_subset, which takes the set Stays and binds to Hstays the subset whose members have a pos attribute equal to 3 (indicating an inpatient stay). Subgoal 4 uses another function, called property_vals_set, to bind to Hspans the span attributes of the inpatient stays.

Subgoal 5 has two parts. The function called set binds to Bdates the subset of the Ldates (dates of live birth) that qualify for the predicate within the parentheses. The predicate <<during_any>> is a binary relation which tests whether its first argument (here, implicitly, each one of the dates in Ldates) is ‘during any’ of the spans in its second argument, Hspans. So subgoal 5 implements, in a single line of a rule, a double iteration: each member of Ldates is tested to see if it is during any one of the members of Hspans. Bdates now holds the live birth dates that are confirmed by having occurred during an inpatient stay.

In subgoal 6 the function date_span binds to Bspan a span object whose begin_date is the earliest date in Bdates and whose end_date is the most recent date in Bdates. If Bdates has only one date, then begin_date and end_date are the same. In either case the rule proceeds to its remaining subgoals and fires, creating an ‘Antepartum risk’ object. However, if Bdates is the empty set, meaning that there were no confirmed birth dates, then the date_span subgoal is false, so the inference engine will try to backtrack. Since there are no alternative answers to any of subgoals 1 through 5, the rule itself fails, and no ‘Antepartum risk’ object is created.

Our last example finds a set by two consecutive temporal subsetting operations, but also applies set logic to several non-temporal attributes. The problem was to find patients who were treated with clozapine in a maintenance fashion (stable dose ≥ 300 mg/day for extended time periods) but who were also seen as outpatients, during a time of continuous clozapine use, by doctors who did not prescribe the clozapine.
This would be a simpler inference were it not for the possibilities of multiple prescribers and of breaks in treatment. We handle these multiple possibilities, as usual, by extensive use of sets. Here is the rule, followed by explanations keyed to the subgoal numbers.
    if  ...
    (1)     Ccond is_a condition having
                (condit('Clozapine maintenance') and span(Cspan)) and
            Ccuss is_all cus having (drugname(clzapn) and cus_dose(Md)) and
            ClzCuss is_the set of (Ccuss <<during>> Cspan) and
    (2)     Ctreat is_a treatment having (drugname(clzapn) and fills(Cfills)) and
            Clzrxers is_the property_vals_set of (Cfills and provid) and
    (3)     OPviss is_all outpatient_visit and
            OPdox is_the property_vals_set of (OPviss and provid) and
    (4)     Otherdox is_the set_difference of (OPdox and Clzrxers) and
    (5)     OtherViss is_all_of OPviss having_any Otherdox for provid and
    (6)     TheViss is_the set of (OtherViss <<during_any>> ClzCuss) and
    (7)     Date is_the first_date of TheViss
            ...
    then
            X is_a problem having criterion('PRIMARY CARE NOTIFICATION') and
            occurred_date(Date) and
            ...
(1) Three subgoals prove that clozapine maintenance occurred and find ClzCuss, the set of all clozapine cus's (continuous use spans, i.e. periods when a drug was continually consumed) having at least dose Md during the maintenance period, Cspan.
(2) Two subgoals prove that Clzrxers is the set of all the doctors (i.e. provid's) prescribing clozapine.
(3) Two subgoals prove that OPdox is the set of provid's of all outpatient visits.
(4) Otherdox is the set of non-clozapine-prescribing doctors.
(5) OtherViss is the set of the patient's visits to the Otherdox.
(6) TheViss is the set of visits to Otherdox during any clozapine maintenance cus.
(7) This subgoal is true only if TheViss is a non-empty set, i.e. there were visits to Otherdox during clozapine maintenance.
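A compact way to check the seven steps is to replay them with plain Python sets and lists. In this hypothetical sketch, cus's, fills and visits are dictionaries with invented keys, dates are integer day numbers, the inputs are assumed to be already restricted to clozapine, and "during" between two spans is assumed to mean that the first lies entirely inside the second.

```python
from __future__ import annotations

from typing import Optional


def within(inner: tuple[int, int], outer: tuple[int, int]) -> bool:
    """Assumed meaning of 'during' for two spans: inner lies inside outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def primary_care_notification(cspan, cuss, md, fills, op_visits) -> Optional[int]:
    """Return the first date of a qualifying visit, or None if the rule fails.

    cspan     -- (begin, end) of the clozapine maintenance condition
    cuss      -- clozapine continuous-use spans: {"span": (b, e), "dose": mg_per_day}
    md        -- minimum maintenance dose
    fills     -- clozapine fills: {"provid": ...}
    op_visits -- outpatient visits: {"provid": ..., "date": day}
    """
    # (1) clozapine cus's at dose >= md that lie during the maintenance span
    clz_cuss = [c["span"] for c in cuss
                if c["dose"] >= md and within(c["span"], cspan)]
    # (2) every provider who prescribed clozapine
    clz_rxers = {f["provid"] for f in fills}
    # (3) every provider seen at an outpatient visit
    op_dox = {v["provid"] for v in op_visits}
    # (4) outpatient providers who never prescribed clozapine (set difference)
    other_dox = op_dox - clz_rxers
    # (5) the visits to those providers
    other_viss = [v for v in op_visits if v["provid"] in other_dox]
    # (6) those visits falling during any maintenance cus
    the_viss = [v for v in other_viss
                if any(b <= v["date"] <= e for b, e in clz_cuss)]
    # (7) the first such date; an empty set makes the rule fail
    return min((v["date"] for v in the_viss), default=None)
```

Working with sets of prescribers and of continuous-use spans means that multiple prescribers, or breaks between spans, need no special handling: each uninterrupted period of use is just another member of `clz_cuss`.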
4. Results
At this writing our Colorado Medicaid Drug Utilization Review system, constructed using OOIE and the principles in this paper, has been in operation for over three years. Each month we review from several hundred to over ten thousand patient data sets for prescribing problems. Processing speed is adequate for operations: for an average patient of around 150 records, processing takes only a few seconds. In fact database maintenance, rather than the speed of inference, is our biggest operational problem.

While physicians on the project staff do not write the rules, they can understand and critique them. This facilitates knowledge base development and its ‘accuracy’ in terms of expert agreement. We have 146 scenario rules and 70 class definitions that provide the rules' vocabulary. The scenario rules are frequently modified, and we have added several new groups of scenarios over the operational period.

When we present cases identified by the computer as having problems, a peer review panel of pharmacists and physicians agrees with the computer an average of 69% of the time, with a range from 45% to 95% for different subsets of the scenario rules. We have also used the identified problems in a randomized trial showing that prescribing behavior could be changed using feedback from our program [2].

5. Discussion
Our approach to temporal pattern search has clearly been effective in our domain of health-care quality assurance. The tools can also be used to find patterns that serve as endpoints in research studies. For example, we have started a study that uses our database and techniques to compare various scenarios for the treatment of otitis media and the outcomes of these scenarios.
OOIE has apparent advantages and disadvantages compared with work which was published later. The set-based terminology can appear to be more procedural than declarative when it is compared to approaches that try to use more commonsense language [6,8]. However, OOIE rules may have an expressiveness advantage, because more variables and temporal relations can be put into a single rule. OOIE and other approaches which allow multiple types of temporal relations are more expressive than languages which enforce a strict temporal hierarchy by using only includes or during relations [4]. OOIE sidesteps the issue of intervals versus points [1] by using chronons, which represent both types of data. Others [3,6] have advocated use of both intervals and points.

OOIE implements all of its temporal capability by extending its vocabulary with appropriate classes and methods. This approach gives the most flexibility when building a system for real-life use, but it means that the semantics will evolve over time. For example, we use object definitions to build temporal persistence or interpolation over indeterminate intervals; others [4,5,10] deal with these in the base language. There are uses for representing qualitative time trends (e.g. ‘increasing monotonically over some period’) in quantitative variables [5,8,10,13]. We did not specifically use such features in our application, but we did use recursive estimates of drug dose and duration, showing that numeric analyses can be integrated in our framework. Finally, it is clear that medical data have various time scales, and some applications must deal with this [3,5]. We had only one time scale, but the chronon approach can inherently support queries about intervals on different scales [7].
Acknowledgements
The development of the software described herein was supported by the Colorado Department of Social Services and the United States Health Care Financing Administration. Other support came from the Colorado Advanced Software Institute.
References
[1] J.F. Allen, Maintaining knowledge about temporal intervals, Commun. ACM 26 (11) (1983) 832-843.
[2] P.J. Byrns, D.C. Lezotte and J. Bondy, Influencing the cost-effectiveness of prescribing using claims-based information: a randomized trial, in preparation.
[3] M.G. Kahn, Modeling time in medical decision-support programs, Med. Decision Making 11 (4) (1991) 249-264.
[4] M.G. Kahn, L.M. Fagan and S. Tu, Extensions to the time-oriented database model to support temporal reasoning in medical expert systems, Methods Inform. in Med. 30 (1991) 4-14.
[5] E.T. Keravnou and J. Washbrook, A temporal reasoning framework used in the diagnosis of skeletal dysplasias, Artificial Intelligence in Med. 2 (1990) 239-265.
[6] C. Larizza, A. Moglia and M. Stefanelli, M-HTP: a system for monitoring heart transplant patients, Artificial Intelligence in Med. 4 (1992) 111-126.
[7] L.E. McKenzie and R.T. Snodgrass, Evaluation of relational algebras incorporating the time dimension in databases, ACM Comput. Surveys 23 (1991) 501-543.
[8] W.A. Perkins and A. Austin, Adding temporal reasoning to expert-system-building environments, IEEE Expert 5 (1) 23-30.
[9] D.W. Rucker, D.J. Maron and E.H. Shortliffe, Temporal representation of clinical algorithms using expert-system and database tools, Comput. Biomed. Res. 23 (3) (1990) 222-239.
[10] Y. Shahar and M.A. Musen, Resume: a temporal-abstraction system for patient monitoring, Comput. Biomed. Res. 26 (1993) 255-273.
[11] E. Stabler, Object-oriented programming in Prolog, AI Expert (Oct. 1986) 46-57.
[12] S.W. Tu, M.G. Kahn, M.A. Musen, J.C. Ferguson, E.H. Shortliffe and L.M. Fagan, Episodic skeletal-plan refinement based on temporal data, Commun. ACM 32 (12) (1989) 1439-1455.
[13] B.C. Williams, Doing time: putting qualitative reasoning on firmer ground, Proc. Amer. Assoc. Artificial Intelligence, Philadelphia, PA (1986) 105-113.