Word search performance for diagnoses of equine surgical colics in free-text electronic patient records


Preventive Veterinary Medicine 34 (1998) 161–174

Leah Estberg a,*, James T. Case a, Richard F. Walters b, Robert D. Cardiff c, Larry D. Galuppo d

a California Veterinary Diagnostic Laboratory System, University of California, Davis, P.O. Box 1770, Davis, CA, 95617, USA
b Department of Computer Science, University of California, Davis, Davis, CA, 95616, USA
c Center for Medical Informatics, School of Medicine, University of California, Davis, Davis, CA, 95616, USA
d Department of Surgical and Radiological Sciences, School of Veterinary Medicine, University of California, Davis, Davis, CA, 95616, USA

Accepted 20 August 1997

Abstract

The objectives of the current project were to: (1) identify limitations of search sensitivity and positive predictive value (PPV) for free-text surgical diagnoses included in electronic patient records maintained at the University of California, Davis, Veterinary Medical Teaching Hospital (VMTH), (2) develop procedural or programmable recommendations for removing these limitations, and (3) provide guidelines for effective search strategies for users performing aggregate searches using the VMTH clinical information system. Search sensitivity corresponds to detection sensitivity (the capacity of a search term to ‘identify’ a relevant document) and search PPV indicates the proportion of retrieved documents that are relevant. All horses submitted to the VMTH for a gastrointestinal (GI) disorder requiring surgical intervention in 1995 were identified using procedure codes for billing purposes and stored in the electronic patient record. Patient records and surgical reports were reviewed for causes of GI disorders, and variation in naming of these disorders. Key word searches were performed for four GI disorders, and search performance was evaluated by estimating search sensitivity and PPV. Search sensitivity ranged from 33% to 98%, and PPV ranged from 2% to 74%. The procedural recommendation that would likely have the greatest influence on minimizing these search limitations would be more uniform naming of GI disorders. This would free searchers from having to anticipate all of the exact word combinations that could be used in the relevant documents, and also minimize retrieval of irrelevant documents. © 1998 Elsevier Science B.V.

Keywords: Information retrieval; Data management; Training and education

* Corresponding author. Tel.: +1 530 752 7088; fax: +1 530 752 5680; e-mail: [email protected]

0167-5877/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved. PII S0167-5877(97)00075-5

1. Introduction

The University of California, Davis (UCD) Veterinary Medical Teaching Hospital (VMTH) clinical database was developed to facilitate patient care and management. It consists of free-text electronic patient records (EPR) in which most of the clinical data are arranged under conventional record headings such as presenting complaint, pertinent history, physical examination, problem list, clinical diagnoses, hospitalization progress notes, and discharge summaries and instructions. Veterinary students are responsible for most of the data entry, and additional data are provided by diagnostic laboratory and procedure (radiology, ultrasound, etc.) support services, some of which are coded and used for billing purposes. The EPR are immediately accessible from multiple terminals located throughout the hospital by way of uniquely assigned patient and patient encounter identifiers. These features of the VMTH clinical database promote timely information dissemination and provide a centralized and legible data repository for quick and reliable patient record retrieval, update and review. This growing repository of clinical data could serve as a source of information for clinical researchers and epidemiologists (Safran et al., 1989; Tierney and McDonald, 1991).

Our objectives were to: (1) identify limitations of search sensitivity and positive predictive value (PPV) for surgical diagnoses in the EPR for equine exploratory celiotomies performed in 1995, (2) develop procedural or programmable recommendations for removing these limitations, and (3) provide guidelines for effective search strategies.

2. Materials and methods

2.1. Study subjects

All horses submitted to the UCD VMTH for a gastrointestinal (GI) disorder (colic) requiring surgical intervention from January 1, 1995 through December 31, 1995 were identified with a search for the following procedure codes stored in the EPR: standing laparotomy-emergency (code 4402), exploratory laparotomy-elective (4403), standing laparotomy colic (4405), ventral midline colic (4420), ventral midline colic-foal (4422), and ventral midline colic-euthanized (4423). These codes were also used for billing purposes and considered to be sensitive indicators of surgical subjects. More than one record for a horse could have been retrieved if the horse had undergone multiple surgical procedures during 1995.
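The code-based selection of study subjects can be illustrated with a minimal Python sketch. The six procedure codes are those listed above; the `Visit` record layout, the field names, and the sample visits are hypothetical and are not taken from the VMTH system.

```python
from dataclasses import dataclass
from datetime import date

# Procedure codes named in Section 2.1 (code -> description).
COLIC_SURGERY_CODES = {
    4402: "standing laparotomy-emergency",
    4403: "exploratory laparotomy-elective",
    4405: "standing laparotomy colic",
    4420: "ventral midline colic",
    4422: "ventral midline colic-foal",
    4423: "ventral midline colic-euthanized",
}


@dataclass(frozen=True)
class Visit:
    """Hypothetical visit record; the real EPR layout is not described."""
    visit_id: str
    horse_id: str
    visit_date: date
    procedure_codes: frozenset


def surgical_colic_visits(visits, year=1995):
    """Keep visits in `year` that carry at least one colic-surgery code."""
    return [
        v for v in visits
        if v.visit_date.year == year
        and any(code in COLIC_SURGERY_CODES for code in v.procedure_codes)
    ]


# Invented sample data: horse h1 appears twice, as allowed by the study design.
visits = [
    Visit("e1", "h1", date(1995, 3, 2), frozenset({4420})),
    Visit("e2", "h1", date(1995, 9, 14), frozenset({4405, 9999})),
    Visit("e3", "h2", date(1995, 5, 5), frozenset({1234})),
    Visit("e4", "h3", date(1994, 12, 30), frozenset({4420})),
]
selected = surgical_colic_visits(visits)
print([v.visit_id for v in selected])
```

Because selection is by visit rather than by horse, a horse with two surgical visits in the year contributes two records, as noted in the text.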


The Visit Summary portion of the VMTH EPR (which, in its entirety, may also include diagnostic procedure and laboratory test reports) contains the majority of the clinical data recorded for a patient, mostly in uncoded form, and is organized under the headings:

• Presenting Complaints
• Pertinent History
• Physical Examination
• Problems
• Medical/Surgical Procedures
• MEDICAL/SURGICAL PROCEDURES (coded data)
• DIAGNOSTIC PROCEDURES (coded data)
• Pathologic Diagnosis
• Clinical Diagnoses
• Comments
• Plans and Progress Notes
• Discharge Summary
• Discharge Instructions

2.2. Subject classification

Causes of GI disorder in the study subjects were classified, whenever possible, from descriptions of major or primary surgical findings provided in the surgical reports; otherwise they were classified from listed Clinical Diagnoses in the Visit Summary. For the purposes of this study, colics were classified into the following categories by the first author (who read all EPRs and all available surgical reports):

• Intraluminal impaction caused by:
  • enterolithiasis
  • feed or fecal material
  • foreign body
  • fecalith
  • sand
  • unspecified
• Displacement
• Volvulus
• Strangulation caused by:
  • strangulating lipoma
  • entrapment between, strangulation around, and adherence to other bowel
  • internal and external herniation
  • unspecified
• Miscellaneous causes:
  • peritonitis, colitis, or enteritis
  • intussusception
  • stenosis or intramural obstruction
  • rupture or tear
  • gas or fluid distension


2.3. Search performance evaluation

Variation in choice of words used to describe the GI surgical findings, as reported in the Clinical Diagnoses section of the Visit Summary, was tallied in order to investigate the degree of diversity in naming colic cases. Discrepancies between findings listed in the Clinical Diagnoses section and findings noted in the surgical record were described. Additionally, terms found in the Visit Summary that were typically used to refer to findings commonly encountered during colic surgeries, yet not related to the surgical findings from the visit in question (terms that could cause false-positive retrieval), were noted.

Retrieval sensitivity and PPV were estimated for several GI disorders in the study population. Search sensitivity is defined as the percentage of all subjects of interest (cases) successfully retrieved ([cases retrieved during a search] ÷ [all relevant cases stored in the database]) and search PPV as the percentage of all retrieved subjects that are cases ([cases retrieved during a search] ÷ [all subjects retrieved]).

Word searches of the EPR stored in the VMTH clinical database can be performed. The searcher can construct queries with multiple words combined with the boolean operators AND, OR and NOT, although searches for words found next to each other (search phrases) are not allowed. An equine surgeon at the VMTH (L. Galuppo) was provided with photographs of several GI disorders discernable from reproduced photos (enterolithiasis, displacement, volvulus, adhesion) and asked to provide search terms he would use for record retrieval. Retrieval performance was estimated separately for searches for cases of these disorders using only the Clinical Diagnoses section of the Visit Summary and for searches using the entire Visit Summary. Retrieval PPV was estimated separately for searches including all equine encounters with VMTH clinicians in 1995 (inpatient and outpatient) and for searches including only those equine encounters with celiotomies in 1995.
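As an illustration of these definitions, the following Python sketch runs a single-word boolean query (AND, OR, NOT, no phrase matching) over a handful of invented records and computes search sensitivity and PPV from the retrieved and relevant sets. The function names, the tokenization rule, and the sample records are assumptions made for the example; they do not describe the VMTH search software.

```python
import re


def tokenize(text):
    """Lower-case a record and return its set of distinct words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def search(records, all_of=(), any_of=(), none_of=()):
    """Return IDs of records matching an AND / OR / NOT word query.

    Only whole single words are matched; adjacent-word phrases are not
    supported, mirroring the limitation described in the text.
    """
    hits = set()
    for rec_id, text in records.items():
        words = tokenize(text)
        if (all(w in words for w in all_of)
                and (not any_of or any(w in words for w in any_of))
                and not any(w in words for w in none_of)):
            hits.add(rec_id)
    return hits


def sensitivity(retrieved, relevant):
    """[cases retrieved] / [all relevant cases]."""
    return len(retrieved & relevant) / len(relevant)


def ppv(retrieved, relevant):
    """[cases retrieved] / [all subjects retrieved]."""
    return len(retrieved & relevant) / len(retrieved)


# Invented records; v1, v2 and v5 are the true volvulus cases.
records = {
    "v1": "large colon torsion",
    "v2": "volvulus of the small intestine",
    "v3": "enterolithiasis",
    "v4": "no evidence of torsion; sand impaction",
    "v5": "large colon twisted 360 degrees",
}
relevant = {"v1", "v2", "v5"}

retrieved = search(records, any_of=("torsion", "volvulus"))
print(sorted(retrieved), sensitivity(retrieved, relevant), ppv(retrieved, relevant))
```

In this toy example the query misses one case worded as ‘twisted’ and retrieves one record in which torsion is negated, so both measures fall below 1.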

3. Results

3.1. Study subjects

During 1995, there were 232 hospital visits during which at least one celiotomy was performed. Five surgeries were excluded from the study because three were performed to correct reproductive-tract disorders, one was performed to correct an esophageal stenosis, and one patient died shortly following induction of anesthesia. A total of 227 hospital visits were included in the study, and two distinct surgical procedures were performed during nine of the encounters. Therefore, findings from a total of 236 surgical procedures were included in the study. Major surgical findings described within each of 155 surgical reports (81 surgical reports could not be located) were contrasted with the final Clinical Diagnoses list in each EPR. A total of 246 surgical findings were listed as clinical diagnoses (166 of which could be confirmed with surgical report descriptions), and 18 major surgical findings were extracted from surgical reports (yet not listed as a clinical diagnosis), giving a total of 264 surgical findings included in the study.


3.2. Variation in wording of clinical diagnoses

Diagnoses were typically named based on the location and sometimes the extent, degree, direction, etc. of the GI disorder. Terms and phrases that specifically referred to the primary disorder (excluding descriptors referring to location, extent, etc.) and that were likely to be used as search terms (based on clinical judgement) were extracted from the Clinical Diagnoses section to describe the variations in wordings of major surgical findings (Tables 1–5). (Terms in the summary tables separated by commas include variations of expression for a single concept found within the same diagnoses.)

3.2.1. Intraluminal impaction (Table 1)

Even though most of the cases of enterolithiasis involved complete obstruction of the GI tract, specific reference to an impaction or obstruction in the clinical diagnosis was rarely made. For the cases of obstruction caused by feed or fecal material, explicit reference was often made to an impaction or distension. This is useful information for the searcher, because searching clinical documents solely with non-specific terms such as ‘feed’ or ‘fecal’ would likely yield low PPV retrievals. Other possibly useful search terms found within the remainder of the text of the EPR to refer to impaction by feed material included ‘dry manure’ and ‘dry ingesta’. ‘Fecalith’ was commonly misspelled, and was rarely specified with terms such as impaction or obstruction. Other possibly useful search terms found within the remainder of the text of the EPR to refer to impaction, surgery to correct impaction, or prevention strategies included ‘impacted’, ‘impactions’, ‘obstructions’, ‘mass of’, ‘firmly filled’, ‘distended’, ‘re-impaction’, ‘reimpaction’, ‘re-obstruct’, ‘reobstruct’, ‘re-obstruction’, and ‘enterolithotomies’.

Table 1
Words and phrases extracted from clinical diagnoses (and stored in electronic patient records, EPR) describing intraluminal obstruction in surgical colics seen at the University of California, Davis, Veterinary Medical Teaching Hospital (UCD VMTH) in 1995

Intraluminal impaction    Term variations                    No. of instances
Enterolithiasis           enterolith                         25
                          enterolith impaction               2
                          enterolith obstruction             1
                          enteroliths                        13
                          enterolithiasis                    23
                          enterolithiasis, stones            1
                          rocks                              1
Feed or fecal material    fecal impaction                    2
                          feed distended                     1
                          feed impaction                     1
                          impaction                          3
                          colic                              1
Foreign body              foreign body                       2
                          foreign body obstruction           2
Fecalith                  fecalith                           13
                          fecalith obstruction               1
                          impaction fecalith                 1
                          impaction, obstruction fecalith    1
                          fecaliths                          1
                          fecolith                           5
                          fecolith obstruction               1
Sand                      sand                               4
                          sand impaction                     8
Unspecified               impaction                          12
                          obstruction                        1

3.2.2. Displacement (Table 2)

There were seven instances in which reference was made to an intestinal displacement or retroflexion in the surgical report, yet no description of displacement was found in the Clinical Diagnoses section. Five of these cases involved a cranial displacement or retroflexion of the pelvic flexure. Other possibly useful search terms found within the remainder of the text of the EPR to refer to displacement included the terms ‘twisted over’ and ‘malpositioned’.

Table 2
Words and phrases extracted from clinical diagnoses (and stored in EPR) describing displacement in surgical colics seen at the UCD VMTH in 1995

Displacement term variations    No. of instances
displacement                    23
displaced, displacement         1
displaced                       5
displacement, retroflexion      1
displacement, retroflexed       1
retroversion                    1
open                            1

3.2.3. Volvulus (Table 3)

There were four instances in which reference was made to a volvulus within descriptive reports of the surgery, yet no description of volvulus was found in the Clinical Diagnoses section. Other possibly useful search terms found within the remainder of the text of the EPR to refer to cases of volvulus or surgery to correct volvulus included ‘twist’, ‘twisted’, ‘detorsed’, and ‘retorsion’. There would be potential for some confusion between the term ‘twisted’ to indicate volvulus and the phrase ‘twisted over’ to indicate displacement.

Table 3
Words and phrases extracted from clinical diagnoses (and stored in EPR) describing volvulus in surgical colics seen at the UCD VMTH in 1995

Volvulus term variations    No. of instances
torsed                      1
torsion                     36
torsion, rotation           1
volvulus                    12

3.2.4. Strangulation (Table 4)

There were five cases of strangulation described in surgical reports not mentioned as clinical diagnoses. ‘Strangulation’ was used more commonly in circumstances that involved lipomas and ‘incarceration’, ‘entrapment’ and ‘hernia’ were used more commonly to describe internal and external herniation.

Table 4
Words and phrases extracted from clinical diagnoses (and stored in EPR) describing strangulation in surgical colics seen at the UCD VMTH in 1995

Lipoma
  Term variations: strangulation lipoma strangulation lipomas stangulating {sp} lipoma strangulating lipomas strangulated lipomas strangulating lipoma, incarcerated strangulating obstruction lipoma open
  No. of instances: 8 1 1 1 1 1 1 1
Adhesions
  Term variations: adhesion adhesions strangulation adhesion, band
  No. of instances: 1 1 1
Unspecified
  Term variations: strangulation entrapment
  No. of instances: 2 1
Internal and External Herniation
  Term variations: strangulation mesenteric rent mesenteric rent incarceration strangulated rent gastrosplenic ligament entrapment incarceration rent gastrosplenic ligament mesodiverticular fold mesodiverticular band gastrosplenic entrapment diaphragmatic hernia epiploic entrapment umbilical hernia scrotal hernia testicular torsion entrapment inguinal region evisceration
  No. of instances: 2 1 1 1 1 1 1 1 2 1 1 1 1 1

3.2.5. Miscellaneous (Table 5)

These cases included focal infarction compatible with parasitic thromboembolic colic, non-strangulating lipomas, abdominal hemorrhage of unknown origin, and two clinical diagnoses that remained open (= ‘unknown’).

Table 5
Words and phrases extracted from clinical diagnoses (and stored in EPR) describing miscellaneous findings from surgical colics seen at the UCD VMTH in 1995

Category                        Term variations            No. of instances
Miscellaneous                   infarction                 1
                                compromise                 2
                                necrotic                   2
                                gastric ulceration         1
                                ileal diverticula          1
                                lipomas                    1
                                ileus                      1
                                hematoma mesentery root    1
                                abdominal hemorrhage       1
                                open                       2
Peritonitis/Colitis/Enteritis   peritonitis                3
                                colitis                    4
                                enteritis                  6
Intussusception                 colic                      1
Intramural Obstruction          obstruction                1
                                mass                       2
                                scarring                   1
                                stenosis                   1
                                stricture                  1
                                open                       1
Tear/Rupture                    tear                       2
                                rupture                    2
                                ruptured                   4
                                perforation                1
Gas/Fluid distension            gas distention             3
                                gaseous distension         1
                                fluid distension           2

3.3. Other patient record sections containing potentially relevant or misleading information

In summary, of a total of 264 ‘major’ surgical findings, 239 were recorded as clinical diagnoses, seven recorded as either colic or open, and 18 not listed as clinical diagnoses.


Table 6
Number of instances in which correct reference to a gastrointestinal disorder diagnosed at surgery was made in specified sections of 227 UCD VMTH electronic patient records, in addition to instances of misleading entries

Section of electronic patient record    No. of instances
                                        Correct reference    Misleading reference
Presenting Complaints                   15                   7
Pertinent History                       31                   22
Physical Examination                    55                   31
Problem List                            46                   4
Medical/Surgical Procedures             34                   3
PATHOLOGICAL DIAGNOSES                  16                   3
Comments                                85                   31
Plans and Progress Notes                12                   8
Discharge Summary                       27                   3
Discharge Instructions                  35                   6

In addition to the 239 clinical diagnoses listed, terms that could either facilitate or hinder word searches were found within the remaining sections (other than the Clinical Diagnoses section) of the 227 Visit Summaries. A summary of instances in which surgical findings were correctly recorded or instances of potentially misleading entries made in the various sections is provided in Table 6. For example, while there were instances in which surgical findings were correctly predicted and expressed in the Presenting Complaints entry, there were several instances in which one could be led astray by an incorrect presenting complaint entry such as ‘twisted colon’ listed for a horse with enterolithiasis ultimately discovered at surgery. In some instances, the surgical findings were correctly reflected in Pertinent History entries, especially when a horse experienced a recurrence of a past problem such as ‘horse has had two colic surgeries over the past 10 years’, both caused by ‘enteroliths’ or ‘passed enterolith one year ago’. Misleading phrases found in the Pertinent History section included ‘strangulating lipoma removed . . . eight days ago’ or ‘owner has lost horses to sand and enteroliths in the past’. Currently, in the VMTH clinical database, there is no method available to effectively construct queries that exclude phrases including terms such as ‘no’, ‘none’, or ‘absent’. Misleading phrases found in the Problem List included ‘suspected . . . obstruction’ or ‘possibly sand colic’. Instances in which surgical findings were directly or inadvertently correctly listed in the Comments section often included differential, rule-out, and working diagnoses lists such as ‘rule outs . . . /feed/ . . . impaction’ or ‘may be (an) . . . enterolith’ in addition to phrases such as ‘. . . avoid re-impaction’ or ‘decreased appetite could be due to retorsion’. An instance of a misleading phrase found in the Comments section was ‘vital signs . . . not poor enough to indicate torsion or volvulus’.

3.4. Retrieval performance

Retrieval performance was estimated for record searches with the terms enterolithiasis, displacement, torsion or volvulus, and adhesions (Table 7). Search sensitivity ranged from 33% to 98% while search specificity ranged from 91% to 100%. It was possible that search sensitivity was overestimated in some instances due to the fact that there were 80 surgeries in which the diagnoses could not be confirmed with surgical reports.

Table 7
Success of electronic patient record word searches for 4 gastrointestinal disorders, quantified by search sensitivity, specificity, and positive predictive value (PPV, measured separately for searches of all 1995 equine visits and 1995 equine surgical visits only), diagnosed during surgery in horses seen at the UCD VMTH in 1995

Electronic patient record section searched      Search terms
                                                enterolithiasis    displacement      either torsion or volvulus    adhesions
Clinical Diagnoses
  search sensitivity                            0.38 (24/63)       0.61 (22/36)      0.90 (46/51)                  0.33 (1/3)
  search specificity                            1.00 (164/164)     1.00 (191/191)    0.99 (175/176)                1.00 (223/224)
  search PPV 1995 equine surgical visits only   1.00 (24/24)       1.00 (22/22)      0.98 (46/47)                  0.50 (1/2)
  search PPV all 1995 equine visits             0.67 (24/36)       0.56 (22/39)      0.74 (46/62)                  0.07 (1/14)
All sections
  search sensitivity                            0.41 (26/63)       0.72 (26/36)      0.98 (50/51)                  0.33 (1/3)
  search specificity                            1.00 (164/164)     0.94 (180/191)    0.91 (160/176)                0.95 (213/224)
  search PPV 1995 surgical visits only          1.00 (26/26)       0.70 (26/37)      0.76 (50/66)                  0.08 (1/12)
  search PPV all 1995 equine visits             0.58 (26/45)       0.26 (26/98)      0.54 (50/92)                  0.02 (1/56)

Despite some high search sensitivities and consistently high search specificities, search PPV suffered dramatically when searching large population frames (all 1995 equine encounters) in which the prevalence of the clinical diagnoses of interest was quite low. There were a total of 4685 equine encounters in 1995, of which 227 included abdominal surgery; prevalence of the above four abdominal disorders diagnosed at surgery ranged from 0.06% to 28% in these two groups. There was no consistent evidence of improvement in retrieval performance by including either other record sections in addition to the Clinical Diagnoses section, or the surgical report in the search. When search sensitivity improved, search specificity generally worsened (mainly due to inaccurate presenting complaints, unrelated historical problems, and differential diagnoses and ruled-out condition lists).
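The false-positive sources named above (ruled-out and suspected conditions) suggest one kind of programmable filter. The sketch below is a crude heuristic written for illustration only: it treats a mention as affirmed unless a cue word appears within a fixed number of preceding words. The cue list is drawn from the phrases quoted in Section 3.3, and the window size and function name are assumptions; nothing here describes a feature of the VMTH system, and historical mentions such as ‘eight days ago’ are not handled.

```python
import re

# Cue words taken from the misleading phrases quoted in Section 3.3
# ('no', 'none', 'absent', 'not ... enough to indicate', 'suspected',
# 'possibly', 'rule outs', 'may be'). The list is illustrative, not complete.
CUES = {"no", "none", "absent", "not", "suspected", "possibly", "rule", "may"}
WINDOW = 8  # assumed number of preceding words to inspect


def affirmed_mention(text, term, window=WINDOW):
    """Return True if `term` occurs with no cue word in the preceding window."""
    words = re.findall(r"[a-z]+", text.lower())
    for i, word in enumerate(words):
        if word == term:
            preceding = words[max(0, i - window):i]
            if not CUES.intersection(preceding):
                return True
    return False


print(affirmed_mention("large colon torsion", "torsion"))
print(affirmed_mention(
    "vital signs not poor enough to indicate torsion or volvulus", "torsion"))
print(affirmed_mention("possibly sand colic", "sand"))
```

A filter of this kind trades sensitivity for PPV: any true case whose record happens to contain a cue word shortly before the term would be dropped.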

4. Discussion

Search sensitivity (recall) and PPV (precision) are values commonly provided for retrieval performance evaluation (Hersh, 1996, pp. 35–57). Together, search sensitivity and specificity define the capacity of the search term to distinguish relevant documents from non-relevant documents, and are incorporated in the notion of the ‘selective power’ of a word described by Blois (1986) (pp. 194–222). Search PPV is a function both of search sensitivity and specificity and of the proportion of relevant documents in the search frame (prevalence). As a consequence, while low PPV can be a problem when searching large free-text databases (Blair and Maron, 1985), this search performance characteristic can be improved simply by limiting the search set to the documents most likely to be relevant. An example of this was demonstrated in the current study by limiting the search set to only those 1995 equine hospital visits that included abdominal surgery. Poor search PPV can be a serious problem because the searcher can quickly become overwhelmed when presented with a large number of irrelevant documents. In contrast, good search PPV is meaningless unless search performance can meet some minimum sensitivity. Attempts at adding search terms in order to limit retrieval to those documents most likely to be relevant will usually diminish sensitivity.

The results from our study may not reflect the average performance of other systems and system users, and Blair and Maron (1985) reported average search sensitivity on a large free-text database near the low end of the range we found. Our results suggest that searches of free-text EPR are susceptible to inaccuracies for a variety of reasons including deficiencies in data entry and user experience and training. At worst, we were only able to identify one-third of the relevant cases and found that for every 100 records retrieved, only two were relevant cases.
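The dependence of PPV on prevalence can be made concrete with the counts reported in Table 7 for the ‘either torsion or volvulus’ search of the Clinical Diagnoses section. The Python sketch below applies the standard relation PPV = sens × prev / (sens × prev + (1 − spec) × (1 − prev)). It assumes, as the PPV denominators in Table 7 imply, that the 51 surgically diagnosed cases are the only relevant records among all 4685 equine encounters; the function and variable names are illustrative.

```python
def ppv_from_rates(sensitivity, specificity, prevalence):
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


cases = 51            # relevant records (torsion or volvulus at surgery)
retrieved_cases = 46  # relevant records retrieved by the search
sens = retrieved_cases / cases

# Search frame 1: the 227 surgical visits (47 retrieved, 1 false positive).
n_surgical, fp_surgical = 227, 1
spec_surgical = (n_surgical - cases - fp_surgical) / (n_surgical - cases)
ppv_surgical = ppv_from_rates(sens, spec_surgical, cases / n_surgical)

# Search frame 2: all 4685 equine encounters (62 retrieved, 16 false positives).
n_all, fp_all = 4685, 16
spec_all = (n_all - cases - fp_all) / (n_all - cases)
ppv_all = ppv_from_rates(sens, spec_all, cases / n_all)

print(round(spec_surgical, 4), round(ppv_surgical, 2))
print(round(spec_all, 4), round(ppv_all, 2))
```

Under these assumptions specificity is slightly higher in the larger frame, yet PPV still drops from about 0.98 to about 0.74, because prevalence falls from roughly 22% to roughly 1%.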
Reasons for unpredictable retrieval performance included:

• synonymous expression of GI disorders
• derivational variation in expression such as enterolith → enterolithiasis
• inflectional variation in expression including singular or plural, or past, present, or future tense (for conditions to be prevented) word forms
• surgical findings not recorded in the record
• unrelated historical problems noted in the record
• differential diagnoses and ruled-out (negative) condition lists
• misspellings.

While rich and flexible expression is possible with documentation in ‘natural language’, it can also result in data entry at varying levels of depth, detail, and precision. Aronson (1996) reported that derivational variation contributed more to the degradation of concept-based search PPV than abbreviations, acronyms, and synonyms, after misspellings and inflectional variation had been accounted for. Presumably, documentation appropriate to support patient care, and not word and phrase choices that would improve document retrieval, was the primary concern of the people who entered the data in the patient records at the VMTH. Free-text retrieval can be difficult because it is based on the assumption that it is a simple matter for searchers to imagine and anticipate all of the exact words and phrases that could be used in the relevant documents, and only in those documents. This can be especially difficult when explicit terms such as ‘impaction’ or ‘obstruction’ are used infrequently in cases of impaction compared to expressions such as ‘enterolithiasis’, ‘sand colic’, or ‘fecalith’.

Ideally, an effective strategy for those performing a key-word search of free-text information systems would include the formulation of a complete list of all possible term variations, abbreviations, acronyms and likely misspellings of terms used to express the condition of interest. This list formulation may be supported by an iterative process of searching and evaluating the search results to discover terms not previously imagined. Improving user facility with search features such as the ‘OR’ operator and wild-card characters can enhance sensitivity by including word form and spelling variations (although this approach may not cover abbreviations and acronyms and may degrade search PPV). Optimally, one would like to identify terms with both high sensitivity and specificity, although there may be cases in which all possible search terms have either poor sensitivity or specificity. For example, a searcher interested in finding cases of ‘feed impaction’ would probably recognize a priori that each separate term ‘feed’ and ‘impaction’ has poor specificity, but would not know that (according to our results) ‘feed impaction’ as a single phrase has poor sensitivity. Therefore, most of the burden is placed on searchers of free-text databases, who may not receive feedback (such as retrieved document ranking from highest to lowest relevance as appraised by statistical techniques; Hersh, 1996, pp. 133–163) regarding the effectiveness of their search strategies.

In general, search sensitivity can be improved by searching the entire EPR Visit Summary, although PPV will always suffer compared to searching only the Clinical Diagnoses section. User training and experience will also influence search success. The procedural recommendation that would likely have the greatest influence on minimizing the search limitations we found would be to standardize the terminology used to name GI disorders (Amatayakul et al., 1994; Cimino, 1996a). Even broad categorization of GI disorders should vastly improve both search sensitivity and PPV.
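The effect of combining the ‘OR’ operator with wild-card characters can be sketched in a few lines of Python. The example uses shell-style patterns (`*` for any run of characters, `?` for one character) from the standard `fnmatch` module; the wild-card syntax actually offered by the VMTH system is not described in this paper, so the pattern syntax, function name, and sample diagnoses are assumptions. The term variants themselves are those reported in Table 1.

```python
import re
from fnmatch import fnmatchcase


def or_search(records, patterns):
    """Return IDs of records containing a word that matches ANY pattern."""
    hits = set()
    for rec_id, text in records.items():
        words = set(re.findall(r"[a-z]+", text.lower()))
        if any(fnmatchcase(word, pat) for word in words for pat in patterns):
            hits.add(rec_id)
    return hits


# Invented clinical-diagnosis entries using term variants from Table 1.
records = {
    "v1": "enterolith",
    "v2": "enteroliths, large colon",
    "v3": "enterolithiasis",
    "v4": "fecolith obstruction",
    "v5": "fecaliths",
    "v6": "sand impaction",
}

exact = or_search(records, ["enterolithiasis"])
wild = or_search(records, ["enterolith*", "fec?lith*"])
print(sorted(exact))
print(sorted(wild))
```

Here `enterolith*` covers the derivational and plural forms, and `fec?lith*` covers both the plural and the ‘fecolith’ misspelling, while a synonym such as ‘rocks’ would still require its own OR term.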
Search accuracy can be improved with the implementation of controlled medical vocabularies and enforcement of structured data entry (Board of Directors of the American Medical Informatics Association, 1994; Cimino et al., 1994). Search PPV was dramatically improved in the current study by simply taking advantage of the standard procedure codes stored in the VMTH EPR (i.e. limiting the search frame to only surgical colic cases). Standardized data entry in combination with free-text entry offers the greatest advantages to the clinical practitioner and researcher (Safran, 1991; Zelingher et al., 1995). User training and education that emphasize the benefits of effective data retrieval would be important to overcome practitioner resistance to using a structured, standardized data-entry approach. Users also should be educated about the potential importance and advantages of this process to improving clinical care.

Facilitating access to the patient record database can provide clinicians and students the ability to perform case studies easily, to model the natural progression of disease, and to answer questions immediately relevant to patient care (Safran et al., 1989; Paty et al., 1994); to provide clients with relevant estimates of prognoses (Pryor and Lee, 1991); to perform outcomes studies and other clinical epidemiologic studies (Payne et al., 1990; Hlatky, 1991; Einbinder et al., 1995); to perform case surveillance in order to monitor disease trends, hospital-acquired infection rates, and adverse drug events (Evans et al., 1993; Rocha et al., 1994); to study variations in complication rates by patient age and type of procedure, and re-operation and re-hospitalization rates (Deyo et al., 1994); and to investigate the value and performance of various diagnostic tests (Newman et al., 1994). Additionally, standardized terminology can facilitate the exchange of data between clinical databases and decision-support systems or systems that automate bibliographic retrievals relevant to patient problems (Musen, 1992; Cimino, 1996b).

Improving access to patient records will also undoubtedly lead to development of new opportunities and novel approaches to the use of clinical data (Pryor and Lee, 1991; Petri and Urquhart, 1991; Kahn, 1993). For instance, teaching hospitals have special responsibilities related to educating students and practitioners continuing their training, and an experience described by a veterinary student to one of the authors was especially notable to us. This student was scheduled to assist with a difficult surgical procedure and was frustrated because this particular procedure was poorly described in the surgical textbooks. Therefore, the student attempted to locate related surgical reports with a search for the relevant medical condition in the VMTH clinical database in order to learn from descriptions provided by past students, but was unsuccessful. It is impossible to know whether her search failed because there were no relevant cases stored in the database or because her search strategy was inadequate, but this student may not be likely to try a similar database search in the future.

5. Conclusion

Successful patient aggregation by clinical diagnoses with boolean queries of word-based free-text clinical information systems is largely influenced by the degree of variation in diagnosis description, level of detail provided, and familiarity with the local clinical vocabulary. Our results demonstrated that successful identification of relevant cases and exclusion of irrelevant subjects can vary greatly. The ability to answer clinically relevant questions and perform clinical research with observational patient data stored in a free-text database depends on the data quality and consistency and user facility with the search system.

Acknowledgements

This study was made possible with financial support in the form of a Medical Informatics Fellowship training grant provided by the Center for Medical Informatics, University of California, Davis, Medical Center. We are grateful to Paul Brentson and James Self for providing insight and access to the University of California, Davis, Veterinary Medical Teaching Hospital clinical information system.

References

Amatayakul, M., Heller, E.E., Johnson, G., 1994. A business case for health informatics standards. In: Proceedings of the 17th Annual Symposium on Computing Applications in Medical Care, pp. 491–495.


Aronson, A.R., 1996. The effect of textual variation on concept based information retrieval. In: Proceedings of the 1996 American Medical Informatics Association Annual Fall Symposium, pp. 373–377.
Blair, D.C., Maron, M.E., 1985. An evaluation of retrieval effectiveness for a full-text document retrieval system. Commun. ACM 28, 289–299.
Blois, M.S., 1986. Information and Medicine: The Nature of Medical Descriptions. Univ. of California Press, Berkeley, pp. 35–57, 194–222.
Board of Directors of the American Medical Informatics Association, 1994. Standards for medical identifiers, codes, and messages needed to create an efficient computer-stored medical record. J. Am. Med. Informatics Assoc. 1, 1–7.
Cimino, J.J., 1996a. Review paper: coding systems in health care. Meth. Inform. Med. 35, 273–284.
Cimino, J.J., 1996b. Linking patient information systems to bibliographic resources. Meth. Inform. Med. 35, 122–126.
Cimino, J.J., Clayton, P.D., Hripcsak, G., Johnson, S.B., 1994. Knowledge-based approaches to maintenance of a large controlled medical terminology. J. Am. Med. Informatics Assoc. 1, 35–50.
Deyo, R.A., Taylor, V.M., Diehr, P., Conrad, D., Cherkin, D.C., Ciol, M., Kreuter, W., 1994. Analysis of automated administrative and survey databases to study patterns and outcomes of care. Spine 19 (Suppl.), 2083S–2091S.
Einbinder, J.S., Rury, C., Safran, C., 1995. Outcomes research using the electronic patient record: Beth Israel Hospital’s experience with autocoagulation. In: Proceedings 19th Annual Symposium on Computer Applications in Medical Care, pp. 819–823.
Evans, R.S., Classen, D.C., Stevens, L.E., Pestotnik, S.L., Gardner, R.M., Lloyd, J.F., Burke, J.P., 1993. Using a hospital information system to assess the effects of adverse drug events. In: Proceedings 17th Annual Symposium on Computer Applications in Medical Care, pp. 161–165.
Hersh, W.R., 1996. Information Retrieval: A Health Care Perspective. Springer-Verlag, New York.
Hlatky, M.A., 1991. Using databases to evaluate therapy. Stat. Med. 10, 647–652.
Kahn, M.G., 1993. The desktop database dilemma. Acad. Med. 68, 34–37.
Musen, M.A., 1992. Dimensions of knowledge sharing and reuse. Comp. Biomed. Res. 25, 435–467.
Newman, T.B., Brown, A., Easterling, M.J., 1994. Obstacles and approaches to clinical database research: experience at the University of California, San Francisco. In: Proceedings of the 18th Annual Symposium on Computer Applications in Medical Care, pp. 568–572.
Paty, D., Studney, D., Redekop, K., Lublin, F., 1994. MS COSTAR: A computerized patient record adapted for clinical research purposes. Ann. Neurol. 36 (Suppl.), 134S–135S.
Payne, T.H., Goroll, A.H., Morgan, M., Barnett, G.O., 1990. Conducting a matched-pairs historical cohort study with a computer-based ambulatory medical record system. Comp. Biomed. Res. 23, 455–472.
Petri, H., Urquhart, J., 1991. Channeling bias in the interpretation of drug effects. Stat. Med. 10, 577–581.
Pryor, D.B., Lee, K.L., 1991. Methods for the analysis and assessment of clinical databases: the clinician’s perspective. Stat. Med. 10, 617–628.
Rocha, B.H.S.C., Christenson, J.C., Pavia, A., Evan, R.S., Gardner, R.M., 1994. Computerized detection of nosocomial infections in newborns. In: Proceedings of the 18th Annual Symposium on Computer Applications in Medical Care, pp. 684–688.
Safran, C., Porter, D., Lightfoot, J., Rury, C.D., Underhill, L.H., Bleich, H.L., Slack, W.V., 1989. ClinQuery: a system for online searching of data in a teaching hospital. Ann. Internal Med. 111, 751–756.
Safran, C., 1991. Using routinely collected data for clinical research. Stat. Med. 10, 559–564.
Tierney, W.M., McDonald, C.J., 1991. Practice databases and their uses in clinical research. Stat. Med. 10, 541–557.
Zelingher, J., Rind, D.M., Caraballo, E., Tuttle, M.S., Olson, N.E., Safran, C., 1995. Categorization of free-text problem lists: an effective method of capturing clinical data. In: Proceedings 19th Annual Symposium on Computer Applications in Medical Care, pp. 416–420.