G Model
ARTICLE IN PRESS
IJB-3238; No. of Pages 8
International Journal of Medical Informatics xxx (2015) xxx–xxx
Contents lists available at ScienceDirect
International Journal of Medical Informatics journal homepage: www.ijmijournal.com
Using natural language processing to identify problem usage of prescription opioids
David S. Carrell a,∗, David Cronkite a, Roy E. Palmer b, Kathleen Saunders a, David E. Gross b, Elizabeth T. Masters c, Timothy R. Hylan b, Michael Von Korff a
a Group Health Research Institute, 1730 Minor Ave., Suite 1600, Seattle, WA 98101, United States
b North America Medical Affairs, Global Innovative Pharma, Pfizer, Inc., New York, NY, United States
c Outcomes & Evidence, Global Health & Value, Pfizer Inc., New York, NY, United States
Article info
Article history: Received 17 March 2015; Received in revised form 7 August 2015; Accepted 11 September 2015; Available online xxx
Keywords: Natural language processing; Computer-assisted records review; Opioid-related disorders; Surveillance
Abstract
Background: Accurate and scalable surveillance methods are critical to understanding widespread problems associated with misuse and abuse of prescription opioids and to implementing effective prevention and control measures. Traditional diagnostic coding incompletely documents problem use. Relevant information for each patient is often obscured in vast amounts of clinical text.
Objectives: We developed and evaluated a method that combines natural language processing (NLP) and computer-assisted manual review of clinical notes to identify evidence of problem opioid use in electronic health records (EHRs).
Methods: We used the EHR data and text of 22,142 patients receiving chronic opioid therapy (≥70 days’ supply of opioids per calendar quarter) during 2006–2012 to develop and evaluate an NLP-based surveillance method and compare it to traditional methods based on International Classification of Diseases, Ninth Revision (ICD-9) codes. We developed a 1288-term dictionary for clinician mentions of opioid addiction, abuse, misuse or overuse, and an NLP system to identify these mentions in unstructured text. The system distinguished affirmative mentions from those that were negated or otherwise qualified. We applied this system to 7,336,445 electronic chart notes of the 22,142 patients. Trained abstractors using a custom computer-assisted software interface manually reviewed 7751 chart notes (from 3156 patients) selected by the NLP system and classified each note as to whether or not it contained textual evidence of problem opioid use.
Results: Traditional diagnostic codes for problem opioid use were found for 2240 (10.1%) patients. NLP-assisted manual review identified an additional 728 (3.3%) patients with evidence of clinically diagnosed problem opioid use in clinical notes. Inter-rater reliability among pairs of abstractors reviewing notes was high, with kappa = 0.86 and 97% agreement for one pair, and kappa = 0.71 and 88% agreement for another pair.
Conclusions: Scalable, semi-automated NLP methods can efficiently and accurately identify evidence of problem opioid use in vast amounts of EHR text. Incorporating such methods into surveillance efforts may increase prevalence estimates by as much as one-third relative to traditional methods. © 2015 Published by Elsevier Ireland Ltd.
∗ Corresponding author at: 1730 Minor Ave, Suite 1600, Seattle, WA 98101, United States. E-mail address: [email protected] (D.S. Carrell).
1. Introduction
Prescriptions for opioid analgesics in the United States tripled from 1999 to 2011 [1]. This increase was accompanied by a tripling of opioid-related deaths over the same period [2]. In 2011, the White House Office of National Drug Control Policy declared an epidemic of prescription drug misuse and overdose [3–6]. Accurate and scalable surveillance methods are critical to the implementation of effective prevention and control measures [7] as well as the evaluation of interventions [8].
Estimates of rates of problem prescription opioid use that rely on different types of data from diverse settings have yielded varying results. Assessments based on urine drug screening results, patient surveys, diagnosis codes from electronic health records (EHRs), and electronic pharmacy records [9–13] have produced estimates ranging from 3 to 40% [13]. Automated surveillance methods based on secondary use of EHR data may offer advantages in terms of cost,
http://dx.doi.org/10.1016/j.ijmedinf.2015.09.002 1386-5056/© 2015 Published by Elsevier Ireland Ltd.
Please cite this article in press as: D.S. Carrell, et al., Using natural language processing to identify problem usage of prescription opioids, Int. J. Med. Inform. (2015), http://dx.doi.org/10.1016/j.ijmedinf.2015.09.002
timeliness and scalability [14], but exclusive reliance on structured data (such as diagnosis codes) may not accurately estimate the prevalence of problem opioid use. Patient symptoms, conditions, behaviors and compliance with health care team advice are sometimes documented in free-text progress notes and/or problem lists but not in diagnosis codes [15–17]. Further, some caregivers may be reluctant to assign diagnosis codes for problem opioid use because of their associated stigma and the relative ease with which they are detectable through automated surveillance.
Clinical natural language processing (NLP) uses computational methods to analyze machine-readable unstructured clinical text such as progress notes recorded in an EHR [14,18,19]. Clinical NLP has been used in a number of health-related surveillance applications [20–25]. In the present study we used NLP methods to identify clinician-entered descriptions of problem opioid use in the unstructured clinical notes of patients receiving chronic opioid therapy, validated these clinical descriptions with computer-assisted human review of the relevant clinical documents, and compared the rates of problem opioid use measured by these methods to the rates found using diagnostic codes alone.
2. Methods
2.1. Setting and sample
This study was conducted at Group Health (GH), a large, mixed-model health plan established in 1945 that now serves approximately 600,000 patients in urban, suburban, and rural areas of Washington State. We defined chronic opioid therapy (COT) as a ≥70 days’ supply of prescription opioid medications dispensed in a calendar quarter, a cut point that corresponds to more than three-fourths of the days in a quarter. We employed this definition because it corresponded to the operational definition of COT employed by GH for a COT risk reduction initiative implemented in 2010. Opioid medications included transdermal or oral opioids excluding buprenorphine.
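The COT definition above can be illustrated with a minimal sketch. It assumes a hypothetical input of (dispense date, days' supply) pairs and, as a simplification not stated in the text, credits each fill's full supply to the quarter in which it was dispensed.

```python
from collections import defaultdict
from datetime import date

COT_THRESHOLD_DAYS = 70  # from the text: >=70 days' supply per calendar quarter


def calendar_quarter(d: date) -> tuple[int, int]:
    """Return (year, quarter) for a dispensing date."""
    return d.year, (d.month - 1) // 3 + 1


def cot_quarters(dispensings):
    """Return the set of (year, quarter) pairs that meet the COT threshold.

    `dispensings` is an iterable of (dispense_date, days_supply) pairs
    (a hypothetical record layout). Each fill is credited to the quarter
    in which it was dispensed, which is an assumption of this sketch.
    """
    totals = defaultdict(int)
    for dispense_date, days_supply in dispensings:
        totals[calendar_quarter(dispense_date)] += days_supply
    return {q for q, days in totals.items() if days >= COT_THRESHOLD_DAYS}
```

Under this sketch a patient would count as a study subject if `cot_quarters` returns at least one quarter in 2006–2012.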
Study subjects were all GH patients age 18 or older who received COT for non-cancer pain during at least one calendar quarter during the period 2006–2012. Once a person received COT in at least one quarter from 2006 to 2012, that person was considered a subject and all of the subject’s data during 2006–2012 were used for analysis, including data from periods when no COT was received. We randomly selected a four percent sample of eligible subjects for a sub-analysis which involved the manual review of these patients’ charts to gain further insight into how problem opioid use is documented in the medical record. The study was approved by GH’s Institutional Review Board.
2.2. Electronic health record data and text
Since 2004, Group Health has documented care delivery through an Epic [26] electronic health record (EHR) system. Structured EHR data documenting internally delivered care and administrative claims data documenting externally received care are combined and stored in a research data warehouse [27], which includes enrollment, demographic, diagnosis, procedure and pharmacy use data (which we refer to as structured data). Electronic clinical text from the EHR including progress notes, consulting nurse notes, and other documentation is extracted nightly, full-text indexed using Microsoft SQL Server 2008 [28], and made available for research purposes in a secure database outside the EHR system.
2.3. Natural language processing (NLP)
2.3.1. NLP dictionary development
To analyze this electronic clinical text using common rule-based NLP methods successfully applied in other clinical domains [22,29],
we developed a dictionary of terms describing problem opioid use in two stages. In the first stage, we identified illustrative examples of expressions or terms clinicians used to describe problem opioid use through exploratory querying of study subjects’ clinical notes. In the second stage we used combinatorial expansion of synonyms to create grammatically correct, semantically comparable expressions.
Stage one was an exploratory process of querying clinical notes and cataloguing descriptions of problem opioid use found by manual review. We used SQL Server’s built-in full-text query functionality to retrieve random samples of notes based on their textual content. For example, one query identified documents containing the word “abuse” and words beginning with the letters “opioi” or “narco” (which can match “opioid,” “opioids,” “narcotic,” “narcotics” or the abbreviation “narco”) that did not also contain the term “medication agreement.” Required words did not have to be adjacent to one another. This facilitated retrieval of notes with descriptions of problem use we could not have anticipated in advance (as well as retrieval of many notes that did not describe problem use). Notes containing the term “medication agreement” were excluded in this example to reduce the number of notes that only contained hypothetical mentions of problem opioid use, such as “If you use more medication than prescribed you may become addicted to opioids,” a common phrase in medication agreements often inserted into patient notes.
The terms we used in our initial queries were those used to describe problem opioid use as characterized in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5).
[30] These included “opioid abuse,” “opioid misuse,” “opioid overuse,” and “opioid addiction.” To these we added a fifth term, “opioid dependence.” Though opioid dependence is a physiological condition that does not necessarily indicate problem use, Group Health physicians with expertise in chronic pain management indicated that clinicians sometimes document problem use as a form of dependence. Exploratory querying in stage one was conducted in a “snowball” [31] fashion whereby terms discovered in earlier queries were used in various combinations in subsequent queries to discover novel ways clinicians described problem opioid use in actual chart notes. In this manner scores of queries were executed and hundreds of documents reviewed. If the number of records returned by a query was greater than 20, we randomly selected 20 for manual review. A sample size of 20 was large enough to facilitate discovery of novel mentions of problem use, yet small enough to be reviewed quickly, allowing us to explore many different query formulations. Two authors (DSC and MVK) reviewed sampled notes to (1) confirm these terms were being used to describe problem opioid usage, and (2) identify novel expressions clinicians used to document problem opioid use. As we discovered synonyms (e.g., “Vico” as an abbreviation of “Vicodin”) and alternative grammatical constructions (e.g., “patient is abusing his pain medications”) we catalogued them. We then incorporated newly discovered terms into additional queries, each yielding a new sample of 20 documents for review. We iteratively continued this process of query and review, a method akin to snowball sampling [31], until we found no new synonyms, abbreviations, variant spellings or grammatical constructions. All descriptions of problem opioid usage we observed in stage one consisted of an opioid term preceded or followed by a term or phrase conveying problem usage. 
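The exploratory query-and-sample loop described above can be sketched as follows. The study used SQL Server full-text queries; this Python analogue is only an illustration, and the regular expressions, function names, and the in-memory list of note texts are assumptions of the sketch.

```python
import random
import re

# Hypothetical patterns mirroring the example query in the text:
# the word "abuse", a word beginning with "opioi" or "narco",
# and no occurrence of "medication agreement".
ABUSE_WORD = re.compile(r"\babuse\b", re.IGNORECASE)
OPIOID_PREFIX = re.compile(r"\b(?:opioi|narco)\w*", re.IGNORECASE)
EXCLUDED_PHRASE = re.compile(r"medication agreement", re.IGNORECASE)


def matches_example_query(note_text: str) -> bool:
    """True if the note satisfies the example query; words need not be adjacent."""
    return (
        ABUSE_WORD.search(note_text) is not None
        and OPIOID_PREFIX.search(note_text) is not None
        and EXCLUDED_PHRASE.search(note_text) is None
    )


def sample_for_review(notes, sample_size=20, seed=None):
    """Return all matching notes, or a random sample of `sample_size` if there are more."""
    hits = [note for note in notes if matches_example_query(note)]
    if len(hits) <= sample_size:
        return hits
    return random.Random(seed).sample(hits, sample_size)
```

Each round of review would then add newly discovered terms to the patterns and draw a fresh sample.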
In stage two of our dictionary development, we expanded the manually identified descriptions of problem opioid usage in three ways. First, we added to our catalogue of terms referring to opioids other semantically equivalent terms we did not encounter by exploratory review. For example, we added “Fentanyl” and “Percocet” because the proper name “Vicodin” was used in references to problem usage, and we added the singular term “analgesic” because
Table 1. Opioid terms, problem use terms, and rules for combining terms to generate 1288 dictionary entries for problem opioid usage.

| Term type | Term group | Terms | Number of terms |
|---|---|---|---|
| Opioid terms | A | Analgesic, narcotic, narcotic medication, narcotic pain medication, opiate, opiate medication, opiate pain medication, opioid, opioid medication, opioid pain medication, pain medication, polypharmacy | 12 |
| Opioid terms | B | Codeine, codene, codine, fentanyl, hydrocodone, methadone, morphine, oxy, oxy-ir, oxycod, oxycodone, oxycontin, percocet, polypharmacy, vico, vicodin | 16 |
| Opioid terms | C | Analgesics, narcotics, narcotic medications, narcotic pain medications, opiates, opiate medications, opiate pain medications, opioids, opioid medications, opioid pain medications, pain medications | 11 |
| Problem use terms | D | Abuse, abuser, misuse, misuser, overuse, overuser, over use, over user | 8 |
| Problem use terms | E | Addict | 1 |
| Problem use terms | F | Addiction | 1 |
| Problem use terms | G | Addicted | 1 |
| Problem use terms | H | Dependent, dependant, dependence, dependance | 4 |
| Problem use terms | I | Dependency, dependancy | 2 |
| Problem use terms | J | Abuses, misuses, overuses, over uses | 4 |
| Problem use terms | K | Abusing, misusing, overusing, over using | 4 |

Rules for combining opioid terms and problem use terms

| Rule group | Form of expression | Groups of grammatically correct expressions created by combining opioid terms and problem use terms | Number of dictionary entries |
|---|---|---|---|
| L | [Opioid term] + [Problem use term] (example: “pain medication abuse”) | A + D, B + D, C + D, A + E, B + E, C + E, A + F, B + F, C + F, A + G, B + G, A + H, B + H, A + I, B + I | 586 |
| M | [Problem use term] + [Opioid term] (ex: “abuses pain medications”) | J + B, J + C, K + B, K + C | 216 |
| N | [Problem use term] + “of” + [Opioid term] (ex: “abuse of pain medications”) | D + of + B, D + of + C | 216 |
| O | [Problem use term] + “on” + [Opioid term] (ex: “dependent on pain medications”) | H + on + B, H + on + C, I + on + B, I + on + C | 162 |
| P | [Problem use term] + “to” + [Opioid term] (ex: “addiction to pain medications”) | F + to + B, F + to + C, G + to + B, G + to + C, I + to + B, I + to + C | 108 |
| Total dictionary entries | | | 1,288 |
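The combinatorial expansion summarized in Table 1 can be reproduced with a short sketch. The term lists and rules are transcribed from the table; the data layout and helper names (`rule`, `expand`) are illustrative assumptions, not the authors' code.

```python
from itertools import product

# Term groups A-K, transcribed (lower-cased) from Table 1.
TERMS = {
    "A": ["analgesic", "narcotic", "narcotic medication", "narcotic pain medication",
          "opiate", "opiate medication", "opiate pain medication", "opioid",
          "opioid medication", "opioid pain medication", "pain medication", "polypharmacy"],
    "B": ["codeine", "codene", "codine", "fentanyl", "hydrocodone", "methadone",
          "morphine", "oxy", "oxy-ir", "oxycod", "oxycodone", "oxycontin", "percocet",
          "polypharmacy", "vico", "vicodin"],
    "C": ["analgesics", "narcotics", "narcotic medications", "narcotic pain medications",
          "opiates", "opiate medications", "opiate pain medications", "opioids",
          "opioid medications", "opioid pain medications", "pain medications"],
    "D": ["abuse", "abuser", "misuse", "misuser", "overuse", "overuser", "over use", "over user"],
    "E": ["addict"],
    "F": ["addiction"],
    "G": ["addicted"],
    "H": ["dependent", "dependant", "dependence", "dependance"],
    "I": ["dependency", "dependancy"],
    "J": ["abuses", "misuses", "overuses", "over uses"],
    "K": ["abusing", "misusing", "overusing", "over using"],
}


def rule(pairs, connector=""):
    """Parse 'A+D B+D' into (first_group, connector, second_group) triples."""
    return [(first, connector, second)
            for first, second in (p.split("+") for p in pairs.split())]


# Rule groups L-P from Table 1.
RULE_GROUPS = {
    "L": rule("A+D B+D C+D A+E B+E C+E A+F B+F C+F A+G B+G A+H B+H A+I B+I"),
    "M": rule("J+B J+C K+B K+C"),
    "N": rule("D+B D+C", "of"),
    "O": rule("H+B H+C I+B I+C", "on"),
    "P": rule("F+B F+C G+B G+C I+B I+C", "to"),
}


def expand(rules):
    """Generate every expression produced by a list of rules."""
    entries = []
    for first, connector, second in rules:
        for a, b in product(TERMS[first], TERMS[second]):
            entries.append(" ".join(part for part in (a, connector, b) if part))
    return entries


counts = {name: len(expand(rules)) for name, rules in RULE_GROUPS.items()}
total_rules = sum(len(rules) for rules in RULE_GROUPS.values())
total_entries = sum(counts.values())
```

Run as written, the per-group counts and the totals of 31 rules and 1288 generated entries agree with Table 1. Because “polypharmacy” is listed in both term groups A and B, these are counts of generated combinations; de-duplicating the strings would give a slightly smaller number.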
the plural form “analgesics” was similarly used. This yielded a total of 39 opioid terms, which we arranged in three groups, each representing a grammatically correct set of expressions (Table 1, term groups A–C).
Second, we added to the list of terms expressing problem usage other semantically equivalent terms not encountered by exploratory review. For example, because we observed four grammatical constructions for expressing opioid abuse (i.e., “opioid abuse,” “abuse of opioids,” “abuses opioids,” and “abusing opioids”), we added expressions so that all problem use terms could be represented by the same four constructions (including, e.g., “opioid misuse,” “misuse of opioids,” “misuses opioids,” and “misusing opioids”). This yielded a total of 25 problem use terms, which we arranged in eight groups, each representing a grammatically correct set of expressions (Table 1, term groups D–K).
Third, we expanded the dictionary to include all grammatically correct combinations of an opioid term and a problem usage term or expression. There were five grammatically correct combinations (Table 1, rule groups L–P). For example, rule group M in Table 1 represents the ways in which either of two groups of problem use terms (term groups J and K) can be followed by either of two groups of opioid terms (term groups B and C) to form four rules (J + B, J + C, K + B, and K + C), resulting in 216 grammatically correct expressions of the form [Problem use term] + [Opioid term]. Applying all 31 rules from the five rule groups resulted in a total of 1288 unique dictionary entries (Table 1).
2.3.2. NLP system implementation
Our NLP system performed a dictionary look up task using the custom dictionary, determined which instances of each term found were negated (or qualified by uncertainty, historical reference, or
reference to a person other than the patient), and then stored structured data for each instance in a database for subsequent analysis. The dictionary look up task involved identifying sequences of words in the clinical notes corresponding to dictionary entries ignoring case (e.g., “fentanyl addiction” and “Fentanyl ADDICTION” were not differentiated). Our dictionary look up was modeled after that of the widely used Apache Clinical Text Analysis and Knowledge Extraction System (cTAKES) [32]. Like cTAKES, our system implemented case-insensitive matching and allowed up to two extraneous words to appear between dictionary words. For example, the dictionary entry “abuses vicodin” was allowed to match to “abuses the vicodin” and “abuses his vicodin.” Unlike cTAKES, our system incorporated a regular-expression based edit distance algorithm using Python’s regex module [33]. When matching a dictionary word (e.g., “opioid”) with a candidate string in the text, the algorithm permitted a limited number of errors, including transpositions, insertions, or deletions of letters, based on word length to accommodate minor spelling variations (e.g., the misspelled terms “oipoids” and “opiodis” would both match the dictionary term “opioids”). Our NLP system determined when a mention of problem opioid use was negated or otherwise qualified using an approach similar to that used in the popular NegEx algorithm, which uses cue terms (e.g., “no,” “possible,” “daughter”), punctuation marks, and other grammatical features to determine when a mention is qualified [34]. For each mention identified in a note our system generated four variables indicating whether the mention was (a) historical (e.g., “patient has a history of narcotics abuse”), (b) hypothetical (e.g., “long term usage elevates risk of opioid addiction”), (c) a reference to someone other than the patient (e.g., “her son is addicted to prescription pain medications”), or (d) negated or qualified by uncertainty. 
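The dictionary look up described above (case-insensitive, up to two extraneous words between dictionary words, limited spelling errors scaled to word length) can be sketched as follows. The authors used Python's regex module for the edit-distance step; to stay self-contained, this sketch substitutes a hand-written restricted Damerau-Levenshtein distance. The error thresholds in `max_errors` and the tokenizer are assumptions, since the text does not give them.

```python
import re


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein distance: insertions, deletions,
    substitutions, and adjacent transpositions each cost 1."""
    prev2 = None
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
            if i > 1 and j > 1 and ca == b[j - 2] and a[i - 2] == cb:
                cur[j] = min(cur[j], prev2[j - 2] + 1)
        prev2, prev = prev, cur
    return prev[len(b)]


def max_errors(word: str) -> int:
    """Hypothetical length-based error budget (not specified in the text)."""
    if len(word) <= 4:
        return 0
    return 1 if len(word) <= 8 else 2


def word_matches(dict_word: str, token: str) -> bool:
    return osa_distance(dict_word, token.lower()) <= max_errors(dict_word)


def find_mentions(entry: str, text: str, max_gap: int = 2):
    """Return text spans matching a dictionary entry, allowing up to
    `max_gap` extraneous words between consecutive dictionary words."""
    words = entry.lower().split()
    tokens = re.findall(r"[A-Za-z][A-Za-z\-]*", text)
    hits = []
    for start, token in enumerate(tokens):
        if not word_matches(words[0], token):
            continue
        pos = start
        for word in words[1:]:
            window = range(pos + 1, min(pos + 2 + max_gap, len(tokens)))
            pos = next((k for k in window if word_matches(word, tokens[k])), None)
            if pos is None:
                break
        else:
            hits.append(" ".join(tokens[start:pos + 1]))
    return hits
```

With these assumed thresholds, “addition to opioid therapy” matches the entry “addiction to opioids”, which illustrates why a high-sensitivity matcher of this kind needs a downstream filter for common false positives.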
The first three variables were coded 1 if the qualifier was present and 0 otherwise. The negation/uncertainty variable was coded on a five-point scale where 0 indicated the instance was unqualified by uncertainty or negation, 1 indicated the instance was probably not qualified, 2 indicated indeterminate status, 3 indicated probable negation, and 4 indicated negation. We considered an instance affirmative if the negation/uncertainty code was 0 or 1 and all three of the other variables were equal to zero.
Following typical NLP development principles [35,36], we iteratively modified, evaluated, and revised the dictionary look up algorithm (without changing the dictionary) and the status annotation algorithm until their performance in a random sample of notes achieved high sensitivity and reasonable specificity. We chose a bias in favor of high sensitivity to minimize the number of mentions overlooked by the system, knowing that the NLP system’s false positive mentions would be identified and excluded from the analysis through subsequent computer-assisted manual validation. We processed the entire corpus with the final NLP system.
To reduce the quantity of NLP system false positives (i.e., apparent mentions identified by the NLP system that were not affirmative descriptions of a patient’s problem opioid usage) we manually reviewed unique instances of dictionary hits that occurred five or more times in a random sample of half of the corpus. Based on this review we devised pattern-matching rules to remove common false positives. For example, one such rule removed apparent instances of the dictionary entry “addiction to opioids” found in statements like “patient is using physical therapy in addition to opioid therapy,” where “addition” was incorrectly interpreted as a spelling variation of “addiction” by the system’s spelling distance algorithm.
2.3.3. NLP-assisted manual validation
For each patient whose chart contained any clinical documents with NLP-identified mentions of problem opioid use we manually reviewed up to four documents with mentions using a computer-assisted chart review tool we developed. If the patient had more than four documents with mentions, the four chronologically earliest were selected for review. Our review tool facilitated rapid navigation across documents and patients, highlighted relevant terms, and managed validation data entry. Reviewers were trained chart abstractors with experience abstracting medical charts for research purposes. Reviewers were instructed to classify documents as documenting problem opioid use if they contained descriptions of any of the conditions listed in Table 2. Inter-rater reliability (IRR) for presence of problem opioid use was assessed by pairs of reviewers who each reviewed the same documents. IRR was high, with kappa = 0.86 and 97% agreement for one pair reviewing a sample of 50 documents, and kappa = 0.71 and 88% agreement for another pair reviewing a sample of 137 documents.

Table 2. Clinically-described conditions reviewers were instructed to consider as evidence of problem opioid use during the manual validation of patient chart documents and whether they were indicative of abuse/misuse or overuse.

| Condition indicating problem opioid use |
|---|
| Drug treatment including referral or recommendation |
| Methadone or suboxone treatment for addiction to illicit/street opioids while receiving prescription opioids |
| Obtained opioids from non-medical sources |
| Loss of control of opioids, craving |
| Family member reported opioid addiction to clinician |
| Significant treatment contract violation |
| Concurrent ETOH abuse/dependence (not remitted) |
| Concurrent use of illicit drugs (excluding marijuana) |
| Current or recent opioid overdose |
| Pattern of early refills (excluding isolated instances) |
| Manipulation of physician to obtain opioids |
| Obtained opioids from multiple MDs surreptitiously |
| Opioid taper/wean due to problems (not due to expected pain improvement) |
| Unsuccessful taper attempt |
| Physician or patient wants immediate taper |
| Rebound headache related to chronic opioid use |
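The hand-off from the NLP system to manual review (affirmative mentions only, then up to four chronologically earliest documents per patient) can be sketched as below. The `Mention` record layout and function names are hypothetical. The sketch assumes, following the five-point scale defined in Section 2.3.2, that codes 0 and 1 denote mentions not qualified by negation or uncertainty.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Mention:
    """One NLP-identified mention with its four status variables (hypothetical layout)."""
    patient_id: str
    note_id: str
    note_date: date
    historical: int    # 1 if the qualifier is present, else 0
    hypothetical: int  # 1 if the qualifier is present, else 0
    other_person: int  # 1 if the mention refers to someone other than the patient
    negation: int      # 0 = unqualified ... 4 = negated


def is_affirmative(m: Mention) -> bool:
    """Affirmative: negation code 0 or 1 and no other qualifier present."""
    return m.negation in (0, 1) and not (m.historical or m.hypothetical or m.other_person)


def review_queue(mentions, max_docs: int = 4):
    """Map each patient to the note ids of their earliest notes with affirmative mentions."""
    notes_by_patient = {}
    for m in mentions:
        if is_affirmative(m):
            notes_by_patient.setdefault(m.patient_id, {})[m.note_id] = m.note_date
    return {
        pid: sorted(notes, key=lambda n: (notes[n], n))[:max_docs]
        for pid, notes in notes_by_patient.items()
    }
```

Patients whose mentions are all qualified never enter the queue, which is what keeps the manual review workload small relative to the full corpus.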
2.4. Two indicators of problem opioid use
We defined problem opioid use as a clinical assessment of prescription opioid addiction, abuse, misuse, or overuse. Though others may use “problem use” to refer to a broader range of physiological, personal, and social sequelae, our definition is limited to these clinically relevant outcomes, and includes any clinical documentation of prescription drug abuse as defined by the National Institute on Drug Abuse: “the use of a medication without a prescription, in a way other than as prescribed, or for the experience or feelings elicited” [37].
We calculated two independent dichotomous indicators. The first, our ICD-code based method, indicated whether, at any outpatient encounter or inpatient visit during 2006–2012, the patient received any of nine ICD-9 codes used to document problem opioid usage. These ICD-9 codes, recorded as structured EHR data (not clinician notes), included those for opioid abuse (305.50, 305.51, or 305.52) and opioid dependence (304.00, 304.01, 304.02, 304.70, 304.71, or 304.72). We include opioid dependence codes for the same reason we include mentions of “dependence” in our chart review: clinicians sometimes enter ICD-9 codes for opioid dependence to indicate problem usage. Our second indicator was NLP-based. It indicated whether a patient’s electronic clinical text during 2006–2012 contained any mention of problem opioid usage as identified by the NLP system described above and validated by manual review.
2.5. Analyses
We examined the correspondence of these ICD-9 and NLP-based indicators of problem opioid use. We also estimated the prevalence of problem opioid use three ways: (1) percent of patients with ICD-9-based evidence, (2) percent of patients with NLP-based evidence, and (3) percent of patients with either ICD-9- or NLP-based evidence.
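The ICD-code based indicator can be sketched as a set-membership test over a patient's recorded codes. The nine codes are written in the two-decimal form used in Table 4; the function names and the trailing-zero normalization are assumptions of this sketch.

```python
# The nine ICD-9 codes named in the text, in the two-decimal form of Table 4.
PROBLEM_OPIOID_ICD9 = frozenset({
    "305.50", "305.51", "305.52",  # opioid abuse
    "304.00", "304.01", "304.02",  # opioid dependence
    "304.70", "304.71", "304.72",  # opioid/other dependence
})


def normalize_icd9(code: str) -> str:
    """Pad a code's decimal part to two digits, so '305.5' becomes '305.50'.
    Codes without a decimal point are returned unchanged."""
    whole, dot, frac = code.strip().partition(".")
    return f"{whole}.{frac.ljust(2, '0')}" if dot else whole


def has_icd9_evidence(codes) -> bool:
    """True if any code recorded for the patient is one of the nine study codes."""
    return any(normalize_icd9(c) in PROBLEM_OPIOID_ICD9 for c in codes)
```

The "either" prevalence estimate then corresponds to the logical OR of this indicator and the validated NLP-based indicator for each patient.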
To investigate charts where the two indicators were discordant we conducted a manual chart review of patients within the four percent sub-sample who had ICD-9-based evidence of problem usage but lacked NLP-based evidence.
3. Results
We identified 22,142 patients receiving at least one quarter of COT for non-cancer pain between 2006 and 2012. Characteristics of study patients are shown in Table 3. Nearly half (46.7%) were in the 45–64 age group. The median number of quarters with COT (70+ days’ supply) during 2006–2012 was 4. The most frequently prescribed opioid medications among study patients were hydrocodone plus aspirin or acetaminophen, short-acting
Table 3. Characteristics of study patients.

| Characteristic | All | 4% sub-sample |
|---|---|---|
| N patients | 22,142 | 888 |
| Age: 18–44 | 35.6% | 34.8% |
| Age: 45–64 | 46.7% | 47.1% |
| Age: 65+ | 17.8% | 18.1% |
| Sex (female) | 63.4% | 61.8% |
| Race is non-Hispanic white | 86.3% (a) | 84.5% (a) |
| Quarters during 2006–2012 with 30+ days’ supply, median (IQR) | 8 (4–15) | 8 (4–16) |
| Quarters during 2006–2012 with 70+ days’ supply, median (IQR) | 4 (2–11) | 5 (2–11) |
| Mental health specialty visits during 2006–2012 | 33.5% | 33.6% |
| Back pain diagnosis during 2006–2012 | 79.5% | 82.2% |
| Arthritis or osteoarthritis diagnosis during 2006–2012 | 46.8% | 45.5% |

(a) Percentage based on non-missing data.
oxycodone, and sustained release morphine sulfate, representing 40.0%, 28.2% and 8.8%, respectively, of all opioids dispensed to study patients during the study period. Back pain and osteoarthritis were the two most common pain diagnoses (Table 3).
Overall, 2240 (10.1%) of study patients had at least one structured ICD-9 code indicating opioid abuse or dependence. The distributions of these codes among all 22,142 study patients and the four percent sub-sample are shown in Table 4.
The NLP system processed 7,336,445 notes belonging to the 21,795 (98%) of the 22,142 study patients who had at least one clinical note, and found dictionary mentions in 35,117 (0.5%) notes for 11,191 (51%) patients. Applying the rule-based filter to exclude non-affirmative mentions reduced the number of notes requiring manual review to 7751 (0.1%), and the number of patients with notes for review to 3156 (14.3%). The mean time our abstractors spent reviewing and coding notes using the computer-assisted interface was 90 s per note (∼40 notes per hour). This review yielded a final set of 1875 study patients (8.5% of all patients) with validated mentions of problem opioid use. The false positive rate for notes identified by the NLP system was 2685/7751 or 34.6%; the false positive rate for patients identified by the NLP system was 1281/3156 or 41% (some patients had fewer notes for review than others).
A summary of reasons given by the human reviewers when validating these notes is shown in Table 5. Clinical descriptions of prescription opioid abuse and misuse were the most common validated mentions. The most common reasons for invalidating NLP-based mentions were that they appeared only in the EHR problem list (and not also in the clinical narrative), that the reference was to problem opioid usage in the distant (but undated) past, and that the mentions were hypothetical but the NLP system failed to annotate them as such.
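The false positive rates and review workload reported above follow from simple arithmetic on the stated counts, sketched here. The total review time is not reported in the text; it is an implied figure derived from the stated 90 s per note.

```python
# Counts reported in the text.
notes_reviewed = 7751
false_positive_notes = 2685
patients_reviewed = 3156
patients_validated = 1875
seconds_per_note = 90

# Share of reviewed notes that reviewers did not validate.
note_fp_rate = false_positive_notes / notes_reviewed

# Share of reviewed patients with no validated mention (3156 - 1875 = 1281).
patient_fp_rate = (patients_reviewed - patients_validated) / patients_reviewed

# Implied total abstractor time, assuming 90 s applies to every reviewed note.
review_hours = notes_reviewed * seconds_per_note / 3600

print(f"{note_fp_rate:.1%} of notes, {patient_fp_rate:.1%} of patients, ~{review_hours:.0f} h")
```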
The agreement of ICD-9- and NLP-based indicators of problem opioid use among all study patients is shown in Table 6. Most patients (86.6%) receiving COT had neither ICD-9- nor NLP-based evidence of problem use. There were 1147 (5.2%) patients with
both ICD-9-based and NLP-based evidence of problem opioid usage, 1093 (4.9%) patients with only ICD-based evidence, and 728 (3.3%) patients with only NLP-based evidence. The prevalence of problem opioid usage in this study cohort based on ICD-9 codes was 10.1%, compared to 8.5% based on NLP evidence. Cohen’s kappa [38] for agreement between ICD-9 codes and NLP was 0.51 (95% CI: 0.491–0.533). Considering either ICD-9- or NLP-based evidence of problem opioid use yields a prevalence of 13.4%.
To explore apparent discordances between the ICD-9 and NLP evidence we conducted an in-depth manual chart review of 43 patients (4.8%) from the four percent sub-sample who had ICD-9 codes for problem opioid use but no NLP-derived evidence. These reviews suggested three categories of explanations for discordances. For approximately one-third of these patients the clinical narrative described symptoms and behavioral patterns consistent with problem opioid use currently (28%) or in the distant past (6%). However, since our NLP system only selected notes where clinicians explicitly labeled the patient as having experienced problem opioid use (see Table 1), notes from these charts were not selected for review. The resulting discordance between NLP-assisted and ICD-9 findings was thus an artifact of the way problem opioid use was described in these charts rather than a conflict as to whether or not the patient was experiencing the condition. This is consistent with work we reported elsewhere indicating that 23.9% of patient charts in this same cohort that had only ICD-9-based evidence of problem opioid use also contained notes describing symptoms and behaviors consistent with problem use around the time of the encounter in which the ICD-9 diagnosis was recorded.
[39] The second explanation, found in another one-third of discordant charts reviewed in the present study, was that the ICD-9 code was being used to document physiological dependence on opioids, which does not necessarily indicate problem opioid use. For these charts there is a true discordance between the ICD-9 codes and our NLP-derived mentions. However, this discordance is not easily resolved. Clinicians commonly (if inaccurately) use dependence-related
Table 4. Frequency distribution of ICD-9 diagnosis codes for problem opioid usage among all study patients and patients in the 888-patient sub-sample.

| ICD-9 diagnosis code | All 22,142 study patients, N | All 22,142 study patients, (%) | 4% sub-sample, N | 4% sub-sample, (%) |
|---|---|---|---|---|
| Any of the following nine codes | 2,240 | 10.1% | 82 | 9.2% |
| 305.50 Opioid abuse, unspecified | 333 | 10.4% | 10 | 9.1% |
| 305.51 Opioid abuse, continuous | 100 | 3.1% | 5 | 4.6% |
| 305.52 Opioid abuse, episodic | 19 | 0.6% | 0 | 0% |
| 304.00 Opioid dependence, unspecified | 1746 | 54.5% | 58 | 52.7% |
| 304.01 Opioid dependence, continuous | 819 | 25.6% | 32 | 29.1% |
| 304.02 Opioid dependence, episodic | 48 | 1.5% | 1 | 0.9% |
| 304.70 Opioid/other dependence, unspecified | 51 | 1.6% | 0 | 0% |
| 304.71 Opioid/other dependence, continuous | 82 | 2.6% | 4 | 3.6% |
| 304.72 Opioid/other dependence, episodic | 8 | 0.3% | 0 | 0% |

Note: A patient may be counted up to once for each diagnosis code category.
Please cite this article in press as: D.S. Carrell, et al., Using natural language processing to identify problem usage of prescription opioids, Int. J. Med. Inform. (2015), http://dx.doi.org/10.1016/j.ijmedinf.2015.09.002
Table 5
Summary of reasons provided by the manual reviewers for validating or invalidating documents containing NLP-derived mentions of problem opioid use among all 22,142 study patients and patients in the 4% sub-sample.

| Validation result and reason | All patients, N | All patients, (%) | 4% sub-sample, N | 4% sub-sample, (%) |
|---|---|---|---|---|
| Validated | | | | |
| Opioid addiction, abuse, or misuse | 1477 | 35% | 68 | 39.5% |
| Opioid overuse only | 544 | 13% | 25 | 14.5% |
| Recent history of problem opioid usage | 87 | 2% | 4 | 2.3% |
| Headache induced by opioid overuse | 51 | 1% | 1 | 0.6% |
| Invalidated | | | | |
| Description in the EHR problem list only | 1135 | 27.5% | 32 | 18.6% |
| Distant history (non-recent) of problem usage | 334 | 8.1% | 15 | 8.7% |
| Discussion of opioid abuse risks (only) | 297 | 7.2% | 15 | 8.7% |
| Other hypothetical mentions | 75 | 1.8% | 5 | 2.9% |
| Problem use by a family member or other | 44 | 1.1% | 2 | 1.2% |
| Other | 90 | 2.2% | 5 | 2.9% |

Note: A patient may contribute to the count of more than one validation category when >1 document was manually reviewed.
Table 6
Cross-tabulation of dichotomous ICD-based and validated NLP-based measures of problem opioid usage for all 22,142 study patients during 2006–2012.

| N (%) | Validated NLP-identified mention present | Validated NLP-identified mention absent | All |
|---|---|---|---|
| ICD-9 code present | 1147 (5.2%) | 1093 (4.9%) | 2240 (10.1%) |
| ICD-9 code absent | 728 (3.3%) | 19,174 (86.6%) | 19,902 (89.9%) |
| All | 1875 (8.5%) | 20,267 (91.5%) | 22,142 (100%) |

Cohen’s un-weighted kappa for concordance between ICD-9 and NLP: 0.50 (95% CI: 0.48–0.52).
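The agreement and prevalence figures can be recomputed from the four cell counts in Table 6. The sketch below is illustrative only: it uses the standard unweighted Cohen's kappa formula, and the variable and function names are our own, not taken from the study's analysis code.

```python
# Cell counts as reported in Table 6 (patients, 2006-2012).
both = 1147       # ICD-9 code present, validated NLP mention present
icd_only = 1093   # ICD-9 code present, NLP mention absent
nlp_only = 728    # ICD-9 code absent, NLP mention present
neither = 19174   # neither source indicates problem opioid use


def cohen_kappa(a, b, c, d):
    """Unweighted Cohen's kappa for a 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement from the row and column marginals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (observed - expected) / (1 - expected)


n = both + icd_only + nlp_only + neither
kappa = cohen_kappa(both, icd_only, nlp_only, neither)   # about 0.51
prev_icd = (both + icd_only) / n                         # about 0.101
prev_nlp = (both + nlp_only) / n                         # about 0.085
prev_either = (both + icd_only + nlp_only) / n           # about 0.134
# Patients found only by NLP, relative to those found by ICD-9 codes.
relative_increase = nlp_only / (both + icd_only)         # about 0.325
```

The last quantity shows where an increase of roughly one-third in the prevalence estimate comes from: 728 NLP-only patients added to 2240 ICD-9-identified patients.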
ICD-9 codes to document actual problem opioid use, as noted under “NLP dictionary development” in the Methods section. In the remaining one-third of discordant charts we reviewed, the ICD-9 code was acquired during care received outside the Group Health system, usually in an emergency room or hospital setting, for which electronic encounter notes were not available in the Group Health EHR. Without access to these external clinical notes we could not resolve whether these discordances were real or artifacts of missing data.

4. Discussion

These findings suggest that semi-automated NLP methods applied to electronic clinical text readily available in EHR systems have the potential to improve surveillance of problem opioid use compared to methods that rely on ICD-9 codes alone. Charts containing explicit clinician descriptions of problem usage often lack corresponding ICD-9 codes. Exclusive reliance on ICD-9 coding to identify problem opioid use underestimates the actual prevalence. Supplementing traditional ICD-9-based surveillance methods with NLP-derived evidence may increase estimates of the prevalence of problem opioid use among recipients of COT by as much as one-third.

The definition of problem opioid use we used in this study was based largely on the kinds of indications of prescription opioid addiction and misuse that we found to be documented in clinical notes of patients receiving chronic opioid therapy. Problem opioid use is less specific and broader than prescription opioid use disorder, as defined by Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) criteria. We initially tried to identify indicators of DSM-5 criteria for prescription opioid use disorder, but found that relevant information was infrequently
documented in clinical notes for many of the criteria (e.g., problems at home or work due to substance use), or information was lacking regarding whether the problem was attributable to use of prescription opioids. In defining indicators of problem opioid use, we sought to identify behaviors that prescribing clinicians regarded as problematic or unsafe. For this reason, use of illicit opioids was considered an indicator of problem opioid use, as concurrent use of prescription and illicit opioids is inherently problematic. By extension, receiving methadone or buprenorphine treatment for opioid addiction was also considered an indicator of problem opioid use, as the patient was receiving active treatment for opioid addiction. There were very few patients who had medication-assisted treatment for opioid addiction as the only indication of problem opioid use.

A general lesson learned from the present study is that there are certain information retrieval tasks for which neither a fully automated NLP system nor traditional manual review is initially feasible, but which can be accomplished using a hybrid strategy of NLP-assisted manual review. Our experience with NLP-based methods in this study suggests that such methods may be scalable and suitable for research applications in large patient populations with high efficiency. Our approach to using NLP-assisted manual chart review to identify clinical conditions documented in clinical text, but where fully automated identification of cases is not initially feasible, is a method that could be applied to other clinical conditions. Applying this method to other conditions would require developing a custom dictionary of relevant terms and concepts specific to that condition; other components of the system could be re-used after modifying the categories trained abstractors use to code their decisions in the NLP-assisted records review interface.
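One way to picture the final step of such a hybrid workflow is rolling abstractors' note-level decisions up to the patient level. The sketch below is a hypothetical illustration, not the study's software: the category labels are loosely modeled on Table 5, and the rule that a patient counts as positive when at least one reviewed note falls in a validated category is an assumption made here for illustration.

```python
from collections import defaultdict

# Hypothetical labels for note-level decisions that count as validated
# (loosely modeled on the "Validated" rows of Table 5).
VALIDATED_CATEGORIES = {
    "addiction_abuse_misuse",
    "overuse_only",
    "recent_history",
    "overuse_headache",
}


def patients_with_validated_mention(reviews):
    """Return the set of patient IDs with at least one validated note.

    reviews: iterable of (patient_id, note_id, category) tuples, one per
    manually reviewed note (hypothetical record layout).
    """
    categories_by_patient = defaultdict(set)
    for patient_id, _note_id, category in reviews:
        categories_by_patient[patient_id].add(category)
    return {
        patient_id
        for patient_id, categories in categories_by_patient.items()
        if categories & VALIDATED_CATEGORIES
    }
```

Adapting this step to another clinical condition would mainly mean replacing the category labels, which mirrors the point above that only the dictionary and the abstractors' coding categories are condition-specific.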
We have applied similar methods (in unpublished work) to identify patients with polyneuropathy, often a slowly emerging and difficult-to-diagnose condition that is not adequately identified using ICD-9 diagnostic
codes. We believe the current study illustrates the utility of this method in a particularly challenging use case. Clinical documentation of problem opioid use is hampered by the inherent difficulties in identifying problem opioid use (e.g., physiological dependence in the absence of opioid misuse or addiction), attempts by some patients to conceal opioid abuse, and reluctance of some clinicians to clearly and unequivocally document problem opioid use in a patient’s chart. In future work, we will explore the application of statistical machine learning methods to improve the performance of the NLP techniques we used to select clinical notes for manual review. We did not use machine learning in the present study due to the lack of an initial “gold standard” classification, and the initial utility of the simpler rule-based approach we describe here. However, we anticipate machine learning may be advantageous based on its successful application in many other clinical areas [40] and in initial results of our related ongoing work. Looking to the future, innovative applications of NLP techniques that extend those used in the present study are relevant to pharmacovigilance and drug monitoring methods [41–44]. Computer-assisted review of electronic health records could enhance the feasibility and efficiency of routine surveillance of drug-related adverse events, even for complex or ill-defined sequelae such as problem opioid use. The preliminary work reported in this paper is not ready for clinical application, but our results suggest that developing EHR capabilities in support of pharmacovigilance and drug monitoring could enhance detection of drug-related adverse events. 
Interpretation of our findings regarding the prevalence of problem prescription opioid use is limited by the source of information (i.e., EHR documented problems), the extent to which clinicians recognized and documented problem opioid use, and our specific study population (patients receiving long-term prescription opioid therapy for chronic non-cancer pain). We estimate the percent of patients with clinically recognized and documented problem opioid use, not the true prevalence of prescription opioid use disorder or opioid misuse. The true prevalence of prescription opioid use disorder and clinically significant opioid misuse could be substantially higher than the estimates of the prevalence of clinically recognized and documented problem opioid use from this study.

A limitation of our NLP-based method is that it was developed and evaluated in a single health care setting. Its generalizability to, and performance in, other settings is currently unknown. Another limitation is that our NLP system was designed to identify instances of explicitly labeled problem opioid usage expressed in single phrases in individual clinical notes (e.g., a statement that a patient is “abusing oxycodone”), as is necessary with NLP systems employing dictionary look-up methods. Though our dictionary contains over 1200 entries clinicians may use to describe problem opioid usage, it is possible that charts of patients experiencing problem opioid use may not contain explicit descriptions of problem opioid use. Our exploratory review of charts in the four percent sub-sample where ICD-9 and NLP-based evidence was discordant suggests that information contained in records from multiple clinical encounters may indicate problem opioid use when no single clinical note explicitly identifies problem use.
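To make the idea of dictionary look-up with negation handling concrete, the following is a toy sketch and not the system used in this study. The four-term dictionary, the list of negation cues, and the five-word look-back window are all assumptions chosen for illustration; the approach is a much-simplified variant of the negation-cue idea in [34].

```python
import re

# Hypothetical mini-dictionary; the study's dictionary had over 1200 entries.
PROBLEM_TERMS = ["abusing oxycodone", "opioid abuse", "opioid addiction",
                 "overusing opioids"]
# Hypothetical negation cues, checked in a short window before each hit.
NEGATION_CUES = ["no evidence of", "denies", "without", "not", "no"]

TERM_RE = re.compile(
    "|".join(re.escape(t) for t in sorted(PROBLEM_TERMS, key=len, reverse=True)),
    re.IGNORECASE,
)
NEGATION_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(c) for c in NEGATION_CUES) + r")\b",
    re.IGNORECASE,
)


def find_mentions(note, window=5):
    """Return (term, status) pairs for each dictionary hit in a note.

    A hit is marked "negated" if a negation cue occurs within `window`
    words before it in the same sentence, otherwise "affirmed".
    """
    mentions = []
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        for match in TERM_RE.finditer(sentence):
            preceding_words = sentence[:match.start()].split()[-window:]
            negated = bool(NEGATION_RE.search(" ".join(preceding_words)))
            status = "negated" if negated else "affirmed"
            mentions.append((match.group(0).lower(), status))
    return mentions
```

Because matching is phrase-based, a note that describes problem behavior without using any dictionary phrase yields no hits at all, which is the limitation discussed above.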
In our ongoing research we are using statistical machine learning methods in conjunction with NLP in an effort to develop more robust automated algorithms capable of detecting not only explicitly labeled problem use, but also more subtle descriptions of behavior and symptoms consistent with problem opioid use. Such approaches may enhance both the sensitivity and specificity of automated methods for identifying problem prescription opioid use in EHR data.
Summary Points
• Important clinical conditions are often documented by hard-to-find text in a patient’s medical record.
• Natural language processing can quickly identify relevant content in vast quantities of clinical text.
• Computer-assisted review of NLP-retrieved text is an efficient, accurate way to validate text-documented conditions.
• NLP and computer-assisted review identify many cases of problem opioid use not documented by diagnosis codes.
5. Conclusions

Clinical text from EHR systems can be used to identify patients who manifest problem opioid use while receiving chronic opioid therapy. Both the automated NLP method and the computer-assisted manual validation method we describe can be efficiently applied to large patient populations where the volume of clinical text far exceeds what could be reviewed using traditional manual abstraction methods. To further enhance the sensitivity and specificity of methods for surveillance of problem opioid use, future work should explore use of machine learning methods for identifying indirect evidence of problem opioid use in charts that lack both ICD-9 coding and explicit clinician labeling of problem opioid use.

Author contributions

Scientific contributions of individual authors to this work are as follows: All authors contributed to the design of the study, the interpretation of study data, and critical review and revision of the manuscript. Von Korff and Carrell oversaw implementation of study methods and related quality control activities. Saunders, Cronkite, and Carrell implemented various programming and technical tasks related to the study. Carrell and Von Korff conducted exploratory reviews of electronic chart notes. Saunders and Von Korff participated in manual chart validation tasks. Saunders and Cronkite performed the programming, data analysis, and data summation tasks. Cronkite developed custom software used for the computer-assisted review of chart notes. Carrell was primarily responsible for drafting the manuscript; Cronkite and Saunders drafted sections of the methods, and Von Korff contributed to the overall organization and revision of the manuscript. All authors participated sufficiently in the work to believe in its overall validity and take public responsibility for appropriate portions of its content.

Acknowledgements

This study was part of a research collaboration between Group Health Research Institute and Pfizer Inc.
(“Electronic Research of Opioid Abuse Detection and Surveillance (e-ROADS-1)”), carried out with financial support from Pfizer Inc. The research concepts and publication plans were agreed upon prior to commencing the study by the Joint Research Governance Committee of the Group Health-Pfizer Research Collaboration, comprised of Group Health and Pfizer Inc. employees. Funding for work by Group Health Research Institute from Pfizer, Inc. included support for the development of this manuscript. David S. Carrell, David Cronkite, Kathleen Saunders, and Michael Von Korff participated in this work as employees of Group Health Research Institute. Roy E. Palmer, David Gross, Elizabeth Masters, and Timothy Hylan are employees and stockholders of Pfizer Inc. Kathleen Saunders has stock in Merck. David Carrell and Michael Von Korff are principal investigators of funded grants to Group Health Research Institute from Pfizer Inc. for research on use of
natural language processing to identify problem opioid use. They are also principal investigators for grants to Group Health Research Institute from a consortium of drug companies to conduct FDA-mandated research concerning risks of problem opioid use among patients receiving extended release opioids.

References

[1] J. Mayer, S. Shen, B.R. South, S. Meystre, F.J. Friedlin, W.R. Ray, M. Samore, Inductive creation of an annotation schema and a reference standard for de-identification of VA electronic clinical notes, AMIA Annu. Symp. Proc. (2009) 416–420.
[2] National Center for Health Statistics, Health, United States, 2013: with special feature on prescription drugs, Hyattsville, MD, 2014.
[3] Office of National Drug Control Policy, Epidemic: responding to America’s prescription drug abuse crisis. Available from: http://www.whitehousedrugpolicy.gov/publications/pdf/rx abuse plan.pdf, 2011.
[4] Institute of Medicine, Relieving pain in America: a blueprint for transforming prevention, care, education and research, National Academy of Sciences, Washington DC, 2011.
[5] H.G. Birnbaum, A.G. White, M. Schiller, T. Waldman, J.M. Cleveland, C.L. Roland, Societal costs of prescription opioid abuse, dependence, and misuse in the United States, Pain Med. 12 (2011) 657–667.
[6] A.G. White, H.G. Birnbaum, M.N. Mareva, M. Daher, S. Vallow, J. Schein, N. Katz, Direct costs of opioid abuse in an insured population in the United States, J. Managed Care Pharm.: JMCP 11 (2005) 469–479.
[7] S.B. Thacker, R.L. Berkelman, Public health surveillance in the United States, Epidemiol. Rev. 10 (1988) 164–190.
[8] P. Nsubuga, M.E. White, S.B. Thacker, M.A. Anderson, S.B. Blount, C.V. Broome, T.M. Chiller, V. Espitia, R. Imtiaz, D. Sosin, D.F. Stroup, R.V. Tauxe, M. Vijayaraghavan, M. Trostle, et al., Public Health surveillance: a tool for targeting and monitoring interventions, in: D.T. Jamison, J.G. Breman, A.R. Measham (Eds.), Disease Control Priorities in Developing Countries, World Bank, Washington (DC), 2006.
[9] J.A. Boscarino, M. Rukstalis, S.N. Hoffman, J.J. Han, P.M. Erlich, G.S. Gerhard, W.F. Stewart, Risk factors for drug dependence among out-patients on opioid therapy in a large US health-care system, Addiction 105 (2010) 1776–1782.
[10] R. Chou, G.J. Fanciullo, P.G. Fine, C. Miaskowski, S.D. Passik, R.K. Portenoy, Opioids for chronic noncancer pain: prediction and identification of aberrant drug-related behaviors: a review of the evidence for an American Pain Society and American Academy of Pain Medicine clinical practice guideline, J. Pain 10 (2009) 131–146.e5.
[11] M.F. Fleming, S.L. Balousek, C.L. Klessig, M.P. Mundt, D.D. Brown, Substance use disorders in a primary care sample receiving daily opioid therapy, J. Pain 8 (2007) 573–582.
[12] M.D. Sullivan, M.J. Edlund, M.Y. Fan, A. Devries, J. Brennan Braden, B.C. Martin, Risks for possible and probable opioid misuse among recipients of chronic opioid therapy in commercial and medicaid insurance plans: the TROUP Study, Pain 150 (2010) 332–339.
[13] D.C. Turk, K.S. Swanson, R.J. Gatchel, Predicting opioid misuse by chronic pain patients: a systematic review and literature synthesis, Clin. J. Pain 24 (2008) 497–508.
[14] C.G. Chute, Invited commentary: observational research in the age of the electronic health record, Am. J. Epidemiol. 179 (2014) 759–761.
[15] D.S. Carrell, S. Halgrim, D. Tran, D.S.M. Buist, J. Chubak, W.W. Chapman, G. Savova, Using natural language processing to improve efficiency of manual chart abstraction in research: the case of breast cancer recurrence, Am. J. Epidemiol. (2013), In Press.
[16] P.L. Peissig, L.V. Rasmussen, R.L. Berg, J.G. Linneman, C.A. McCarty, C. Waudby, L. Chen, J.C. Denny, R.A. Wilke, J. Pathak, D. Carrell, A.N. Kho, J.B. Starren, Importance of multi-modal approaches to effectively identify cataract cases from electronic health records, J. Am. Med. Inform. Assoc. 19 (2012) 225–234.
[17] J.S. Floyd, S.R. Heckbert, N.S. Weiss, D.S. Carrell, B.M. Psaty, Use of administrative data to estimate the incidence of statin-related rhabdomyolysis, JAMA 307 (2012) 1580–1582.
[18] P.M. Nadkarni, L. Ohno-Machado, W.W. Chapman, Natural language processing: an introduction, J. Am. Med. Inform. Assoc. 18 (2011) 544–551.
[19] G.K. Savova, L. Deleger, I. Solti, J. Pestian, J.W. Dexheimer, Natural language processing: applications in pediatric research, in: J.J. Hutton (Ed.), Pediatric Biomedical Informatics: Computer Applications in Pediatric Research, Springer, New York, 2012, pp. 173–192.
[20] D. Carrell, S. Halgrim, D. Tran, A Natural Language Processing Algorithm for Improving Efficiency of Breast Cancer Surveillance Abstraction, AMIA Annual Symposium, Washington, DC, 2011.
[21] P.L. Elkin, D.A. Froehling, D.L. Wahner-Roedler, S.H. Brown, K.R. Bailey, Comparison of natural language processing biosurveillance methods for identifying influenza from encounter notes, Ann. Intern. Med. 156 (2012) 11–18.
[22] M.E. Matheny, F. Fitzhenry, T. Speroff, J.K. Green, M.L. Griffith, E.E. Vasilevskis, E.M. Fielstein, P.L. Elkin, S.H. Brown, Detection of infectious symptoms from VA emergency department and primary care clinical documentation, Int. J. Med. Inform. 81 (2012) 143–156.
[23] S.M. Meystre, G.K. Savova, K.C. Kipper-Schuler, J.F. Hurdle, Extracting information from textual documents in the electronic health record: a review of recent research, Yearb. Med. Inform. (2008) 128–144.
[24] H.J. Murff, F. FitzHenry, M.E. Matheny, N. Gentry, K.L. Kotter, K. Crimin, R.S. Dittus, A.K. Rosen, P.L. Elkin, S.H. Brown, T. Speroff, Automated identification of postoperative complications within an electronic medical record using natural language processing, JAMA 306 (2011) 848–855.
[25] B.R. South, W.W. Chapman, S. Delisle, S. Shen, E. Kalp, T. Perl, M.H. Samore, A.V. Gundlapalli, Optimizing a syndromic surveillance text classifier for influenza-like illness: does document source matter? AMIA Annu. Symp. Proc. (2008) 691–696.
[26] Epic Systems Corporation. Epic, 2014. http://www.epic.com/.
[27] M.C. Hornbrook, G. Hart, J.L. Ellis, D.J. Bachman, G. Ansell, S.M. Greene, E.H. Wagner, R. Pardee, M.M. Schmidt, A. Geiger, A.L. Butani, T. Field, H. Fouayzi, I. Miroshnik, L. Liu, R. Diseker, K. Wells, R. Krajenta, L. Lamerato, C. Neslund Dudas, Building a virtual cancer research organization, J. Natl. Cancer Inst. Monogr. 1 (2005) 12–25.
[28] Microsoft. Full-Text Search (SQL Server), 2012. http://msdn.microsoft.com/en-us/library/ms142571.aspx.
[29] R.J. Byrd, S.R. Steinhubl, J. Sun, S. Ebadollahi, W.F. Stewart, Automatic identification of heart failure diagnostic criteria, using text analysis of clinical notes from electronic health records, Int. J. Med. Inform. 83 (2014) 983–992.
[30] V.N. Vahia, Diagnostic and statistical manual of mental disorders 5: a quick glance, Ind. J. Psychiatry 55 (2013) 220–223.
[31] D.D. Heckathorn, Respondent-driven sampling: a new approach to the study of hidden populations, Soc. Probl. 44 (1997) 174–199.
[32] G.K. Savova, J.J. Masanz, P.V. Ogren, J. Zheng, S. Sohn, K.C. Kipper-Schuler, C.G. Chute, Mayo clinical text analysis and knowledge extraction system (cTAKES): architecture, component evaluation and applications, J. Am. Med. Inform. Assoc. 17 (2010) 507–513.
[33] Python regex module version 2013-11-29. https://pypi.python.org/pypi/regex.
[34] W.W. Chapman, W. Bridewell, P. Hanbury, G.F. Cooper, B.G. Buchanan, A simple algorithm for identifying negated findings and diseases in discharge summaries, J. Biomed. Inform. 34 (2001) 301–310.
[35] J.P. Pestian, L. Deleger, G.K. Savova, J.W. Dexheimer, I. Solti, Natural language processing—the basics, in: J.J. Hutton (Ed.), Pediatric Biomedical Informatics: Computer Applications in Pediatric Research, Springer, New York, 2012, pp. 149–172.
[36] G.K. Savova, L. Deleger, I. Solti, J. Pestian, J.W. Dexheimer, Natural language processing: applications in pediatric research, in: J.J. Hutton (Ed.), Pediatric Biomedical Informatics: Computer Applications in Pediatric Research, Springer, New York, 2012, pp. 173–192.
[37] NIDA. Prescription Drug Abuse, National Institute of Drug Abuse, 2015. http://www.drugabuse.gov/publications/research-reports/prescription-drugs/what-prescription-drug-abuse.
[38] J. Cohen, A coefficient of agreement for nominal scales, Educ. Psychol. Meas. 20 (1960) 213–220.
[39] R.E. Palmer, D.S. Carrell, D. Cronkite, K. Saunders, D.E. Gross, E. Masters, S. Donevan, T.R. Hylan, M. Von Korff, The prevalence of problem opioid use in patients receiving chronic opioid therapy: computer-assisted review of electronic health record clinical notes, Pain 156 (2015) 1208–1214.
[40] I. Spasic, J. Livsey, J.A. Keane, G. Nenadic, Text mining of cancer-related information: review of current status and future directions, Int. J. Med. Inform. 83 (2014) 605–623.
[41] P. LePendu, S.V. Iyer, A. Bauer-Mehren, R. Harpaz, J.M. Mortensen, T. Podchiyska, T.A. Ferris, N.H. Shah, Pharmacovigilance using clinical notes, Clin. Pharmacol. Ther. 93 (2013) 547–555.
[42] N.J. Leeper, A. Bauer-Mehren, S.V. Iyer, P. Lependu, C. Olson, N.H. Shah, Practice-based evidence: profiling the safety of cilostazol by text-mining of clinical notes, PLoS One 8 (2013) e63499.
[43] S.V. Iyer, R. Harpaz, P. LePendu, A. Bauer-Mehren, N.H. Shah, Mining clinical text for signals of adverse drug–drug interactions, J. Am. Med. Inform. Assoc. 21 (2014) 353–362.
[44] P. Lependu, S.V. Iyer, A. Bauer-Mehren, R. Harpaz, Y.T. Ghebremariam, J.P. Cooke, N.H. Shah, Pharmacovigilance using clinical text, AMIA Joint Summits on Translational Science Proceedings AMIA Summit on Translational Science 2013 (2013) 109.