Integrating natural language processing expertise with patient safety event review committees to improve the analysis of medication events

Integrating natural language processing expertise with patient safety event review committees to improve the analysis of medication events

International Journal of Medical Informatics xxx (xxxx) xxx–xxx Contents lists available at ScienceDirect International Journal of Medical Informati...

715KB Sizes 0 Downloads 3 Views

International Journal of Medical Informatics xxx (xxxx) xxx–xxx

Contents lists available at ScienceDirect

International Journal of Medical Informatics journal homepage: www.elsevier.com/locate/ijmedinf

Technical Notes

Integrating natural language processing expertise with patient safety event review committees to improve the analysis of medication events ⁎

Allan Fonga, , Nicole Harriottb, Donna M. Waltersb, Hanan Foleyb, Richard Morrisseyb, Raj R. Ratwania,c a b c

National Center for Human Factors in Healthcare, MedStar Health, 3007 Tilden St. NW, Suite 7M, Washington, D.C. 20008, USA Georgetown University Medical Center, 3800 Reservoir Rd NW, Washington, D.C. 20007, USA Department of Emergency Medicine, Georgetown University School of Medicine, 3900 Reservoir Rd. NW Washington D.C. 20007, USA

A R T I C L E I N F O

A B S T R A C T

Keywords: Patient safety events Natural language processing Medication Machine learning Visualization

Objectives: Many healthcare providers have implemented patient safety event reporting systems to better understand and improve patient safety. Reviewing and analyzing these reports is often time consuming and resource intensive because of both the quantity of reports and length of free-text descriptions in the reports. Methods: Natural language processing (NLP) experts collaborated with clinical experts on a patient safety committee to assist in the identification and analysis of medication related patient safety events. Different NLP algorithmic approaches were developed to identify four types of medication related patient safety events and the models were compared. Results: Well performing NLP models were generated to categorize medication related events into pharmacy delivery delays, dispensing errors, Pyxis discrepancies, and prescriber errors with receiver operating characteristic areas under the curve of 0.96, 0.87, 0.96, and 0.81 respectively. We also found that modeling the brief without the resolution text generally improved model performance. These models were integrated into a dashboard visualization to support the patient safety committee review process. Conclusions: We demonstrate the capabilities of various NLP models and the use of two text inclusion strategies at categorizing medication related patient safety events. The NLP models and visualization could be used to improve the efficiency of patient safety event data review and analysis.

1. Introduction Adverse drug events are a leading cause of preventable patient harm [1–3]. In an effort to reduce patient harm events associated with medications many healthcare systems have implemented patient safety event reporting systems to better identify safety hazards associated with pharmacy and medication administration, as well as other types of events [4,5]. The reporting systems generally provide a method for provider staff to submit a description of a safety hazard ranging from a near miss, where no patient harm occurred, to a serious safety event that resulted in patient harm. Many patient safety event reporting systems contain hundreds to thousands of medication related events and have the potential to dramatically improve care and reduce adverse drug events [6,7]. However, there are several challenges associated with the data from these reporting systems [8]. Often, the data are difficult to interpret and act on because of the large number of reports, amount of free-text, and



variability in category assignment by reporters. In order to utilize the patient safety event data more rigorously many hospitals have created review committees, composed of clinicians focused on safety and quality, to review each event, categorize them appropriately to better understand trends, and develop solutions once trends are recognized. The committee review of the events is an incredibly labor intensive process given the large volume of reports generated each week. This difficulty is compounded in large healthcare systems where data from multiple hospitals need to be efficiently analyzed to understand overall patterns and trends across the system. Each report can take several minutes to initially review and then additional time during the committee meeting to further discuss. Our goal is to develop a more efficient and streamlined method for categorizing patient safety event reports based on modeling the freetext of event reports to reduce the review time of the committee. We describe a collaborative effort in which informatics and safety science experts joined a clinical safety committee to develop an algorithmic

Corresponding author. E-mail addresses: [email protected], [email protected] (A. Fong).

http://dx.doi.org/10.1016/j.ijmedinf.2017.05.005 Received 27 February 2016; Received in revised form 24 March 2017; Accepted 8 May 2017 1386-5056/ © 2017 Elsevier B.V. All rights reserved.

Please cite this article as: Fong, A., International Journal of Medical Informatics (2017), http://dx.doi.org/10.1016/j.ijmedinf.2017.05.005

International Journal of Medical Informatics xxx (xxxx) xxx–xxx

A. Fong et al.

approach to more automatically review and categorize medication events. The intent is to eventually develop a computational system that can categorize events in near real-time, hence reducing the time for committee review and expediting the process of identifying meaningful trends that can then be acted on to reduce adverse drug events. There are three main contributions of this case report. First, we develop and evaluate the performance of different modeling techniques to categorize four medication safety issues. Second, we evaluate model performance of two text inclusion conditions. The first condition includes only the brief factual description from medication related event reports as provided by the frontline staff member entering the report. The second combines both the brief factual description and resolution text, which is a short description typically provided by a manager that has reviewed the event report. Lastly we deploy the best models in an interactive visualization which categorizes reports in near real-time and allows users to provide feedback to the algorithm allowing for continued model training.

research has primarily focused on assigning events to general categories such as computer related events or harm events [13,15]. Our focus is on developing algorithms to classify events into specific categories that are more actionable by the patient safety committee, such as medication workflow. For this application we evaluated support vector machines (SVM), decision trees (DT), and cosine similarity (COS) models to classify specific medication related patient safety events. In addition to the difference in specificity, previous work has generally considered reports as a single document either only considering the brief text or concatenating the brief and resolution text. It is unclear from previous work which strategy is more accurate for categorizing events in specific categories. We present an evaluation of these two different text inclusion strategies.

2. Background

To train and validate our models, we started with 774 medication safety events that have been manually annotated and reviewed by the safety and quality committee (2 MDs, 1 PharmD, 3 RNs). Every report has a free-text brief factual description ranging from 9 to 424 words (77.9 mean, 59 median, 63.3 std). Six hundred ninety-five reports (90%) have resolution free-text averaging 50.6 words (29 median 29, 61.7 std) and were used for the model development efforts, Fig. 1. This study was approved by the MedStar Health Research Institute Institutional Review Board (protocol #2014-101).

3. Method 3.1. Data sources

2.1. Data elements in patient safety event reports Patient safety event reporting systems are generally composed of structured and unstructured data [9,10]. When entering a report, the frontline staff selects a general category from a predefined list of categories (e.g. medication, fall, surgery) and a specific event type category. The reporter then enters a free-text description (brief factual description) of the safety hazards which can vary in length. Lastly, reports can sometimes be accompanied with additional free-text about how the event was resolved or addressed (resolution). A major challenge with patient safety event reports is that the categories selected by reporters are often inaccurate and to fully understand the safety event, one has to read the free-text description. These category types are often ambiguous to the reporter and the reporter generally does not have the time to determine which category is the best fit for often complex events [11].

3.2. Medication categories We selected four medication safety event categories to model, described in Table 1. These categories tend to focus on workflow and decision making processes around medication safety events and were identified by the committee as promising categories for eventually introducing interventions to reduce the identified safety hazard. Of the 695 reports with brief and resolution text, 56 reports were categorized as pharmacy delivery delays, 68 were categorized as pharmacy dispensing errors, 108 reports were categorized as prescriber errors, and 64 were categorized as Pyxis discrepancy errors. The remaining 399 reports were categorized into other categories and included as negative cases in the model development.

2.2. Clinical committee review At MedStar Georgetown University Hospital a committee composed of physicians, nurses, pharmacists, and patient safety experts review each patient safety event report. The committee discusses each event, recategorizes the event if necessary, examines whether there are trends in the reports, and develops and implements potential solutions. Each meeting lasts an average of one to two hours, but committee members spend an average of two to four hours prior to the meeting manually reviewing events, categorizing, and identifying trends. Our goal is to develop a more efficient method for categorizing patient safety event reports to reduce the time investment of the committee. To do this, two data analytics experts (AF and RR), who have worked extensively with patient safety event data, joined the committee to learn about their classification process, develop natural language processing (NLP) algorithms, and work with the committee to validate and implement the algorithms [10–12]. Our focus was on medication events because these events are frequently reported, pose tremendous risk to patients, and require extensive time to review by the pharmacist and committee relative to other event types.

3.3. Approach We developed classification models for each of the four categories (pharmacy delivery delays, pharmacy dispensing errors, prescriber errors, and Pyxis discrepancy errors) in Table 1. This was done by using the identified events for the error type being modeled as positive cases and using all the remaining reports as negative cases, including the other three error types and the 399 “other” categories. As an example, for the prescriber error model the 108 prescriber error reports served as positive instances and the remaining reports (587) served as negative prescriber error reports for training and testing of the prescriber error model. For each category, we first set aside 20% of the annotated reports for testing, Fig. 1. For each category, 20% of the test set was randomly selected from the corresponding positive instances of the respective category and the remaining 80% of the test set was randomly selected from the respective negative instances. This semi-random approach was to ensure that the proportion of positive reports in the training sets were the same as in the test sets. For prescriber error, 87 positive instances and 470 negative instances were used for all training models and 21 positive instances and 117 negative instances were used for testing all models. For pharmacy dispensing error, 55 positive and 502 negative instances were used for training and 13 positive and 125 negative instances were used for testing. For Pyxis discrepancy, 52 positive and 505 negative instances were used for

2.3. Natural language processing Natural language processing (NLP) techniques have been previously used to explore and mine patient safety event reports. Examples include identifying latent themes and topics in reports, serious safety events, and health information technology related events [11,13,14]. Various statistical methods, each with different advantages and limitations, have been used to train and classify text [13]. However, previous 2

International Journal of Medical Informatics xxx (xxxx) xxx–xxx

A. Fong et al.

Fig. 1. Overview of the algorithm development and testing approach. The number of positive and negative prescriber error instances used for training and testing are provided as an example. Each modeling condition uses the same training data for each safety category.

Precision is defined as the number of true positives divided by the sum of true and false positives. Accuracy is defined as the sum of true positives and true negatives divided by the total population size of the tested data. Our approach is summarized in Fig. 1.

training and 12 positive and 126 negative instances were used for testing. For pharmacy delivery delay, 45 positive and 512 negative instances were used for training and 11 positive and 127 negative instances were used for testing. We trained and evaluated three different techniques, support vector machine (SVM), decision tree (DT), and cosine similarity (COS), for each category [16,17]. There are two text components that could be used for model development. The brief factual description which is an input from the event reporter and the resolution which is input by a manager to provide more context around the event. We evaluated two different text inclusion conditions to determine how these text components contribute to model performance: use only the brief free-text [B] or concatenate the brief and resolution free-text [B + R]. In both approaches, the same preprocessing was applied to generate feature vectors for training. The text was first tokenized into unigrams and then stemmed. Numbers, punctuations, and common English stopwords as well as domain specific stopwords (such as “nurse” and “patient”) were removed. Lastly, to reduce dimensionality, sparse terms which were defined as terms occurring in less than 1% of the documents, were removed. For each category, we trained six models using the remaining unigrams as the feature vectors a in bag-of-words approach. Each of these models’ precision, accuracy, and receiver operating characteristic area under the curve (ROC AUC) were evaluated using the reserved testing data.

3.4. Model development We developed SVM with Radial Basis Function (RBF) kernel models using 10 cross-fold validation on the training data. We generated DTs with the training data, iterating and pruning until a precision and accuracy of 85% was reached. Lastly, we evaluated a COS approach by utilizing two feature vectors, one from the target reports and the other from the non-target reports. We then calculated the cosine distance between the two feature vectors and classified the report based on their distances from the vectors. 3.5. Medication PSE visualization We deployed the best performing models in a dashboard visualization. The classification algorithm is applied to new reports which can then be regularly updated. Because there is imprecision in the algorithms, we use a semi-supervised strategy to incorporate user feedback to provide corrections. The corrections provided will be

Table 1 Brief description of annotation categories and their frequency of occurrence in the dataset. Annotated categories (count)

Brief description

Pharmacy Delivery Delay (56) Pharmacy Dispensing Error (68) Prescriber Error (108)

Errors that involve the delivery of medications late or after the time needed to administer Errors related to pharmacy dispensing incorrect medications, wrong dose, or wrong formulation Errors stemming from a prescriber error; mainly includes medication dosing errors but can also include prescribing wrong medication, wrong route, wrong indication, etc. Errors that involve the incorrect returning of medication to Pyxis

Pyxis Discrepancy (64)

3

International Journal of Medical Informatics xxx (xxxx) xxx–xxx

A. Fong et al.

Table 2 Accuracy, precision, F1-score, sensitivity, specificity, and ROC AUC of each medication category modeled under both the brief text and the brief + resolution text conditions (gray highlight models with highest AUC). *Low precision and sensitivity results of Pharmacy Dispensing Errors SVM using brief text caused by the model categorizing all reports as negative.

Dispensing Error were the least similar, Fig. 2.

updated in near real-time and will also be used to retrain the algorithm. 4. Results

4.3. Application to medication PSE dashboard 4.1. Performance Lastly, the models with the highest AUC were integrated into a dashboard using RShiny [20]. The dashboard, Fig. 3, provides users an overview of the events by categories over time with an option to select different date ranges. Importantly users can update model results by identifying incorrectly categorized reports.

Accuracy, precision, F1-score, sensitivity, specificity, and ROC AUC metrics for each approach are summarized in Table 2. ROC AUC was used to rank the relative performance of the models as this metrics combines both sensitivity and specificity. Pyxis Discrepancy and Pharmacy Delivery Delay SVM models had high and similar AUC results between the [B] and [B + R] conditions. While there are no previous studies to compare these prediction of medication workflow related patient safety events, the Pyxis Discrepancy and Pharmacy Delivery Delay AUC results are comparable to previous patient safety event health information technology modeling work [18,19]. COS models had higher AUC results for Prescriber Error and Pharmacy Dispensing Error reports. Furthermore, models under the [B] condition had the highest AUC results across all categories.

5. Discussion Integrating data analytic and safety science expertise with the clinical safety committee to streamline the analysis and categorization of patient safety events has led to promising results. The clinicians provided the data analytics experts with the necessary domain specific knowledge to develop NLP techniques to recategorize medication patient safety events into specific workflow related categories. The models can serve to dramatically reduce the time investment currently required by the review committee. By integrating these models into an interactive visualization the clinical staff is able to gain insights in a more timely fashion and the clinicians can provide feedback and corrections to update the model results, especially in situations of poor precision (e.g. pharmacy dispensing errors).

4.2. Brief and resolution free-text To further investigate these results, we compared the brief text to the resolution text for each category. While on average briefs are longer than resolutions, Prescriber Error resolutions tended to be longer than briefs. In addition, calculating the cosine distance between valid brief and resolution text vectors shows that Prescriber Error and Pharmacy

Fig. 2. Cosine similarities between brief and resolution text by categories.

4

International Journal of Medical Informatics xxx (xxxx) xxx–xxx

A. Fong et al.

Fig. 3. Medication safety event dashboard prototype.

model results.

5.1. Brief and resolution free-text The AUC results suggest that the modeling the brief free-text alone could generate sufficient models. Excluding the resolution free-text from reports can reduce extraneous information as well as reduce processing time.

Contributorship statement According to the definition given by the International Committee of Medical Journal Editors (ICMJE), Allan Fong qualifies for authorship including making substantial contributions to the intellectual content of conception and design, acquisition of data, and analysis and interpretation of data. Furthermore, Allan Fong has participated in drafting of the manuscript and critical revision of the manuscript for important intellectual content. Allan Fong is the corresponding author. According to the definition given by the International Committee of Medical Journal Editors (ICMJE), Nicole Harriott qualifies for authorship including making substantial contributions to the intellectual content of analysis, acquisition of data, and interpretation of data. Furthermore, Nicole Harriott has participated in drafting of the manuscript and critical revision of the manuscript for important intellectual content. According to the definition given by the International Committee of Medical Journal Editors (ICMJE), Donna M. Walters qualifies for authorship including making substantial contributions to the intellectual content of conception and design, acquisition of data, and interpretation of data. Furthermore, Donna M. Walters has participated in drafting of the manuscript and critical revision of the manuscript for important intellectual content. According to the definition given by the International Committee of Medical Journal Editors (ICMJE), Hanan Foley qualify for authorship including making substantial contributions to the intellectual content of conception and design and interpretation of data. Furthermore, Hanan Foley has participated in drafting of the manuscript and critical revision of the manuscript for important intellectual content. According to the definition given by the International Committee of Medical Journal Editors (ICMJE), Richard Morrissey qualify for authorship including making substantial contributions to the intellectual content of conception and design and interpretation of data. Furthermore, Richard Morrissey has participated in drafting of the manuscript and critical revision of the manuscript for important intellectual content.

5.2. Limitations and future work Medical ontologies, such as SNOMED, were not used in this analysis. These ontologies tend to be centered more around medical concepts and less applicable to patient safety concepts (e.g. communication, handoffs, and teamwork). However, it would be interesting to investigate if adapting such ontologies can increase predictive power of these models. Understanding these limitations, the dashboard was built with the option for users to provide feedback to the model when there are incorrectly categorized reports. Determining when new domain knowledge or context should be incorporated into a model is difficult. However, we believe that this work provides a step towards incorporating an active learning feedback mechanism to monitor a shift in context when discussing safety events. This highlights the importance of integrating computational expertise and clinical expertise to analyze safety event data. Through this iterative process of feedback and model refinement, the underlying model performance should improve. Furthermore, these models were developed using reports from one hospital. Workflow, culture, and policies can vary between hospitals which can limit the generalizability of our models. Testing and validating these models on reports from other hospitals and hospital systems will be an important next step. 6. Conclusion We evaluated different NLP modeling techniques and text inclusion strategies to categorize four specific medication workflow safety events. We demonstrated the predictive capabilities of these models while highlighting the cautionary benefits with using additional resolution text in the model. Lastly, the models were incorporated into an interactive visualization that provide users a way to directly update 5

International Journal of Medical Informatics xxx (xxxx) xxx–xxx

A. Fong et al.

events in hospitalized adults, J. Gen. Intern. Med. 8 (1993) 289–294. [2] A. Malpass, S.C. Helps, E.J. Sexton, et al., A classification for adverse drug events, J. Qual. Clin. Pract. 19 (1999) 23–26. [3] L.L. Leape, T.A. Brennan, N.M. Laird, et al., The nature of adverse events in hospitalized patients. Results of the Harvard Medical Practice Study II, N. Engl. J. Med. 324 (1991) 377–384. [4] P. Aspden, J.W. Corrigan, S.M. Erickson, Patient safety reporting systems and applications, Patient Safety: Achieving a New Standard of Care, National Academy Press, Washington, D.C, 2004, pp. 250–278. [5] J. Rosenthal, M. Booth, Maximizing the Use of State Adverse Event Data to Improve Patient Safety, National Academy for State Health Policy, Portlan, ME, 2005 Retrieved Jan 15, 2016, from http://www.nashp.org/Files/Patient%5FSafety %5FGNL61%5Ffor%5Fweb.pdf. [6] P.J. Pronovost, D.A. Thompson, C.G. Holzmueller, et al., Toward learning from patient safety reporting systems, J. Clin. Nurs. 21 (2006) 305–315 http://www. ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt= Citation&list_uids=17175416. [7] A.W. Wu, P.J. Pronovost, L. Morlock, ICU incident reporting systems, J. Crit. Care 17 (2002) 86–94 http://www.ncbi.nlm.nih.gov/pubmed/12096371. [8] D.R. Longo, J.E. Hewett, B. Ge, et al., The long road to patient safety: a status report on patient safety systems, JAMA 294 (2005) 2858–2865 http://www.ncbi.nlm.nih. gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids= 16352793. [9] J. White, Adverse Event Reporting and Learning Systems: A Review of the Relevant Literature, The Canadian Patient Safety Institute, 2007. [10] A. Fong, A.Z. Hettinger, R.M. Ratwani, Exploring methods for identifying related patient safety events using structured and unstructured data, J. Biomed. Inf. 58 (2015) 89–95, http://dx.doi.org/10.1016/j.jbi.2015.09.011. [11] A. Fong, R.M. Ratwani, An evaluation of patient safety event report categories using unsupervised topic modeling, Methods Inf. Med. 54.4 (2015) 338–345. [12] R. Ratwani, A. Fong, ‘Connecting the dots’: leveraging visual analytics to make sense of patient safety event reports, J. Am. Med. Inf. Assoc. 22.2 (2014) 312–317. [13] M.-S. Ong, F. Magrabi, E. Coiera, Automated identification of extreme-risk events in clinical incident reports, J. Am. Med. Inf. Assoc. 19 (2012) e110–8, http://dx.doi. org/10.1136/amiajnl-2011-000562. [14] K.E.K. Chai, S. Anthony, E. Coiera, et al., Using statistical text classification to identify health information technology incidents, J. Am. Med. Inf. Assoc. 6 (2013) 1–6, http://dx.doi.org/10.1136/amiajnl-2012-001409. [15] M.-S. Ong, F. Magrabi, E. Coiera, Automated categorisation of clinical incident reports using statistical text classification, Qual. Saf. Health Care 19 (2010) e55, http://dx.doi.org/10.1136/qshc.2009.036657. [16] A. Liaw, M. Wiener, Classification and Regression by randomForest 2 R News, 2002, pp. 18–22 http://cran.r-project.org/doc/Rnews/. [17] E. Dimitriadou, K. Hornik, F. Leisch, et al. e1071: Miscellaneous Functions of the Department of Statistics. R Packag version 16 2011. [18] M.-S. Ong, F. Magrabi, E. Coiera, Automated categorisation of clinical incident reports using statistical text classification, Qual. Saf. Health Care 19 (2010) e55, http://dx.doi.org/10.1136/qshc.2009.036657. [19] K.E.K. Chai, S. Anthony, E. Coiera, et al., Using statistical text classification to identify health information technology incidents, J. Am. Med. Inf. Assoc. 20 (2013) 980–985, http://dx.doi.org/10.1136/amiajnl-2012-001409. [20] Web Application Framework for R, 2016. http://shiny.rstudio.com/.

According to the definition given by the International Committee of Medical Journal Editors (ICMJE), Raj Ratwani qualify for authorship including making substantial contributions to the intellectual content of conception and design and interpretation of data. Furthermore, Raj Ratwani has participated in drafting of the manuscript and critical revision of the manuscript for important intellectual content. Statement on conflict of interest The authors have no competing interests or conflicts of interest. Funding statement N/A. Summary table What was known

• Extracting information from patient safety reports is challenging in large part due to the variability in reporting • NLP techniques can assist in understanding free-text Added knowledge

• Considering •

only the brief descriptions of patient safety reports was generally sufficient for developing reliable classification models Integration of models into visualization requires mechanism for users to provide feedback to the models

Acknowledgements We are very thankful to the entire review committee and the dedication of the frontline reporters working to make our hospital and systems safer. References [1] D.W. Bates, L.L. Leape, S. Petrycki, Incidence and preventability of adverse drug

6