Journal Pre-proof

A systematic review of Natural Language Processing for classification tasks in the field of incident reporting and adverse event analysis

Ian James Bruce Young, Saturnino Luz, Nazir Lone
PII: S1386-5056(19)30237-0
DOI: https://doi.org/10.1016/j.ijmedinf.2019.103971
Reference: IJB 103971
To appear in: International Journal of Medical Informatics
Received Date: 2 March 2019
Revised Date: 6 August 2019
Accepted Date: 14 September 2019
Please cite this article as: Young IJB, Luz S, Lone N, A systematic review of Natural Language Processing for classification tasks in the field of incident reporting and adverse event analysis, International Journal of Medical Informatics (2019), doi: https://doi.org/10.1016/j.ijmedinf.2019.103971
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier.
Title: A systematic review of Natural Language Processing for classification tasks in the field of incident reporting and adverse event analysis.

Authors: Ian James Bruce Young (a), Saturnino Luz (b), Nazir Lone (c)

(a) Department of Anaesthesia, Critical Care and Pain Medicine, Edinburgh Royal Infirmary, 51 Little France Crescent, Edinburgh, Scotland, EH16 4SA. [email protected]
(b) Usher Institute of Population Health Sciences & Informatics, The University of Edinburgh, 9 Little France Rd, Edinburgh, Scotland EH16 4UX. [email protected]
(c) Usher Institute of Population Health Sciences and Informatics, The University of Edinburgh, Teviot Place, Edinburgh, EH8 9AG. [email protected]

Corresponding Author: Ian James Bruce Young

Highlights
NLP can generate meaningful information from healthcare incident reports.
In classification tasks, NLP can perform well compared to manual annotation.
No single NLP technique shows superiority in this domain.
NLP has the potential to improve learning from adverse events in healthcare.
ABSTRACT

Context: Adverse events in healthcare are often collated in incident reports which contain unstructured free text. Learning from these events may improve patient safety. Natural language processing (NLP) uses computational techniques to interrogate free text, reducing the human workload associated with its analysis. There is growing interest in applying NLP to patient safety, but the evidence in the field has not been summarised and evaluated to date.
Objective: To perform a systematic literature review and narrative synthesis to describe and evaluate NLP methods for classification of incident reports and adverse events in healthcare.
Methods: Data sources included Medline, Embase, The Cochrane Library, CINAHL, MIDIRS, ISI Web of Science, SciELO, Google Scholar, PROSPERO, hand searching of key articles, and OpenGrey. Data items were manually abstracted to a standardised extraction form.
Results: From 428 articles screened for eligibility, 35 met the inclusion criteria of using NLP to perform a classification task on incident reports, or with the aim of detecting adverse events. The majority of studies used free text from incident reporting systems or electronic health records. Models were typically designed to classify by type of incident, type of medication error, or harm severity. A broad range of NLP techniques are demonstrated to perform these classification tasks with favourable performance outcomes. There are methodological challenges in how these results can be interpreted in a broader context.

Conclusion: NLP can generate meaningful information from unstructured data in the specific domain of the classification of incident reports and adverse events. Understanding what incidents are occurring, or why they are occurring, is important in adverse event analysis. If NLP enables these insights to be drawn from larger datasets, it may improve learning from adverse events in healthcare.

Abbreviations
Adverse Drug Event (ADE)
Electronic Health Record (EHR)
Area under the receiver operating characteristic curve (AUC)
Support Vector Machine (SVM)

Keywords

Natural language processing; Machine learning; Text classification; Incident reporting; Adverse event analysis; Patient safety
1.0 INTRODUCTION
1.1 RATIONALE

Incident reports are tools to collect data about adverse events and errors in healthcare[1]. Their use in healthcare has been brought over from other high-reliability industries, which have recognised the importance of reporting potential and actual harm for improving safety[2]. A culture which promotes the reporting and analysis of incidents, errors, and adverse events is now considered a central tenet of patient safety[3,4].

Ultimately, the utility of this system is predicated on the reporting of incidents reducing the risk to future patients. This could be achieved either by understanding what incidents are occurring, or why they are occurring, and then taking action based on this understanding. In the investigation and analysis of incident reports, one component is classification[2,5]. This may be classification of incident type, of the type and severity of harm that resulted, or of the factors that contributed to the incident occurring.
There are problems with the current workflow for processing incident reports that make it difficult to translate reports into better outcomes[4]. Firstly, the system is neither reliable nor robust. It does not give the same consideration to all reports, and it is often unclear what factors determine the review course of a particular report.
Secondly, issues of data validity within incident reports make analysis harder. Proprietary incident reporting systems typically record a combination of structured data entry fields and free text responses[6]. Free text responses provide the initiator of the report with freedom to describe the incident as they saw it, but the completeness and accuracy of reporting can limit data validity. Structured responses do not necessarily increase data validity. Classification of incident type is often entered by the initiator of the report and chosen from a structured list of options. This may worsen data validity if the structured choices do not allow the initiator to summarise the incident adequately, or if the initiator is missing important contextual information to inform the classification[7].
Lastly, incident reports are produced with a volume and velocity that make thorough and timely human review impossible[8–10]. Through the efficiency of automation, data science solutions may be beneficial where data volume and velocity are problems to overcome.

Within data science, natural language processing (NLP) is a field which seeks to understand, process, and interpret human language[11]. Keeping abreast of all the clinical free text generated every day would be a monumental task. NLP aims to create structure from this unstructured data. These structured data then provide a substrate on which machine learning models can be trained to analyse the text. As such, set in the context of other methods for analysing incident reports, NLP may confer benefit by allowing all incident reports to be processed in a reliable way, and by dramatically reducing the time associated with analysis.
In the last decade, NLP has demonstrated potential utility in healthcare data beyond the field of clinical incident reporting and adverse event analysis. Using free text radiology reports, NLP has been used to automate the detection of venous thromboembolism diagnoses[12–14] and malignancy diagnoses[15,16], and to detect critical follow-up recommendations[17].

In the domain of preventing adverse drug events (ADEs), NLP has been used to provide patient-individualised ADE information[18], to detect drug errors and provide real-time clinician feedback in a neonatal intensive care unit[19,20], and to look for incidences of known drug side effects within EHR data[21].
NLP may have utility as a method for detecting clinically important outcomes, in contrast to traditional methods such as manual chart review or diagnostic coding. NLP has been used on EHR data to identify hypoglycaemic episodes[22], inpatient falls[23], healthcare-associated urinary tract infections[24], and cancer recurrence[25].

The potential of NLP as a clinical predictive tool has also been explored. NLP has been used to predict clinical complications for cancer patients using EHR data[26].
1.2 OBJECTIVES
There is growing interest in applying NLP to patient safety, but the evidence in the field has not been summarised and evaluated to date. For this reason, we conducted a systematic review and narrative synthesis to understand the published work on using natural language processing for classification tasks in the field of incident reports and adverse event analysis. Our specific objectives were to understand what NLP has been shown to achieve and the techniques employed, and to highlight areas of future research in this field. PRISMA guidelines were followed in creating this review.
2.0 METHODS
2.1 ELIGIBILITY CRITERIA

We followed international Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines in conducting this review. Study characteristics for eligibility were: published research, of experimental or methodological studies, reviews and conference abstracts, published after 2004, written in the English language, using NLP techniques on free text for the purposes of classification. Studies which trained machine learning models on structured text fields were excluded. The application of NLP techniques had to be in the field of human healthcare. The source of free text had to be incident reporting systems or, if not, the classification had to relate to the detection of adverse events or medical error.
2.2 INFORMATION SOURCES
Articles were identified for this review through a search of the online databases Medline via Ovid, Embase via Ovid, The Cochrane Library, CINAHL, MIDIRS, ISI Web of Science, SciELO, Google Scholar, and PROSPERO. A pearling strategy was applied to the bibliographies of key articles. A grey literature search was conducted via OpenGrey. The search was last conducted on May 8th, 2018.
2.3 SEARCH
The full electronic search strategy, including limits used, is presented in appendix A.
2.4 STUDY SELECTION
Studies identified through electronic search were initially de-duplicated in their online databases before being stored in the Mendeley reference manager (Mendeley Ltd, version 1.19.2, 2018)[27]. Mendeley's desktop application was used to de-duplicate the combined search output. Initial screening was at the level of title and abstract. Full text review was then conducted for the remaining articles. University of Edinburgh library requests were submitted for those studies whose full text could not be accessed initially. Included studies were those that: (a) used NLP (b) to perform a classification task (c) either on incident reports, or on another source of clinical free text where the aim of classification was to detect adverse events or medical errors.
2.5 DATA COLLECTION PROCESS
Data extraction from included studies was performed within Mendeley’s desktop application and stored in an Excel spreadsheet (Microsoft, version 16.21.1, 2018). Data extraction was performed independently, using presented data. Rejected studies were classified under a standard set of explanations and are fully detailed in the PRISMA flow diagram in figure 1.
2.6 DATA ITEMS

The variables for which data was sought were mapped to the PICOS statement, stored in a data extraction form, and are described below:

Participants – Study title as study ID. From what dataset was the free text extracted?
Interventions – What classification task was being performed? What type or types of NLP were being used?
Comparisons – What alternative non-automated classification technique was used as a comparator to the classifier?
Outcomes – What statistical analysis of classifier performance was performed?
Study design – What was the type of study?

2.7 SUMMARY MEASURES

The summary is a narrative synthesis of the eligible studies, based principally on the extracted data items.

3.0 RESULTS
3.1 STUDY SELECTION
The number of studies screened, assessed for eligibility, and included in the review are presented in a PRISMA flow diagram in figure 1. This also details the number of studies rejected at each stage, with accompanying reasons for rejection.

Figure 1: PRISMA 2009 Flow Diagram
3.2 STUDY CHARACTERISTICS

Data were sought for the variables mapped to the PICOS statement as described above. A summary of the data extraction form is presented in figure 2. The full citation list for studies included in the review is presented in the bibliography under references[7,11,28–60].
3.3 NARRATIVE SYNTHESIS OF RESULTS

Thirty-five studies were included in this review. Twenty-one studies used free text extracted from incident reporting systems[7,11,28–30,33,36,37,40,42,46,47,49–55,58,59], nine from EHRs[34,35,38,43–45,48,56,61], three from ADE reporting systems[39,57,60], one from morbidity and mortality records[41], and one from discharge summaries[31].
3.3.1 Type of classification

A range of classification tasks were performed. The total number described exceeds 35, as some studies performed more than one type of classification task. Twenty studies developed NLP classifiers for "type of incident"[28–31,36,38,40–42,44,49,50,52,53,55,56,58,59,61], seven for "type of medication error"[35,39,45,47,54,57,60], six for "severity or presence of harm"[7,11,33,46,51,59], three for "type of postoperative complication"[34,43,48], and one for "type of contributory factor"[7].
Studies using NLP for the classification of incident types have taken various approaches. Some have defined a single incident type and modelled various NLP techniques to optimise this binary classification[38,42,52,61]. Others have imposed a predefined ontology of incident types and developed either multi-label text classification[7] or multiple binary classifiers[30,31]. Network analysis has also been used to identify incident types from free text, rather than imposing a known ontology[49].
Of the studies which focused on ADEs, four developed classifiers to identify the presence of generalised medication events[39,45,47,60], while three looked specifically at identifying bleeding events[35], anaphylaxis[57], and "look-alike sound-alike" errors[54].
3.3.2 Classification performance
With respect to reporting performance outcomes for NLP models, studies consistently utilised measures that can be calculated from an error matrix[62]. The most commonly reported performance metrics were sensitivity (recall), positive predictive value (precision), accuracy, and F score (the harmonic mean of precision and recall). Multiple studies also presented error matrix outcomes as the area under the receiver operating characteristic curve (AUC). There was, however, no overall consistency as to which specific measures were reported.
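These headline metrics all derive from the same four error-matrix counts. As a minimal illustration, the counts below are hypothetical and not taken from any included study:

```python
# Illustrative error-matrix counts for a binary incident classifier
# (hypothetical values, for demonstration only).
tp, fp, fn, tn = 80, 10, 20, 890

sensitivity = tp / (tp + fn)                      # recall: events correctly flagged
precision = tp / (tp + fp)                        # positive predictive value
accuracy = (tp + tn) / (tp + fp + fn + tn)
f_score = 2 * precision * sensitivity / (precision + sensitivity)

print(f"recall={sensitivity:.2f} precision={precision:.2f} "
      f"accuracy={accuracy:.2f} F1={f_score:.2f}")
```

Because all four measures share these counts, a model can score highly on one (here, accuracy benefits from the large number of true negatives) while scoring much lower on another.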
3.3.3 Machine Learning Models

Most studies reported outcomes for more than one NLP technique. The most frequently developed models used variants of machine learning, including Support Vector Machines (SVM) (20 studies), Naïve Bayes (11 studies), Logistic Regression (7 studies), and K-Nearest Neighbours (5 studies). Decision Trees and Random Forests were both used in 4 studies. A number of machine learning techniques appeared in 2 or fewer studies: Topic Modelling, Decision Rules, Neural Networks, Boosted Trees, and Active Learning. While many studies developed their own NLP models, several used proprietary NLP software such as MedLEE or SAS Text Miner[63,64]. Five- or ten-fold cross-validation was used, either to split data into training, validation and testing sets or to optimise model parameters, in 10 of the 35 studies[7,11,33,39,40,44,47,48,54,59].
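Several of these model families can be implemented compactly. As a concrete illustration of one of the most frequently used, here is a minimal multinomial Naïve Bayes classifier over a bag-of-words representation; the reports, labels, and test sentences are invented for illustration and not drawn from any included study:

```python
# Minimal multinomial Naive Bayes text classifier, of the kind several
# reviewed studies used to label incident reports by type.
# Training reports and labels are invented for illustration.
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (text, label). Returns class counts, per-class word
    counts, the vocabulary, and the number of documents."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)          # label -> word -> count
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab, len(docs)

def classify(model, text):
    """Pick the label maximising log-prior + add-one-smoothed
    log-likelihoods; words outside the vocabulary are ignored."""
    class_counts, word_counts, vocab, n_docs = model
    best_label, best_score = None, float("-inf")
    for label, c in class_counts.items():
        score = math.log(c / n_docs)
        total = sum(word_counts[label].values())
        for w in text.lower().split():
            if w in vocab:
                score += math.log((word_counts[label][w] + 1)
                                  / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = [
    ("wrong dose of morphine given", "medication"),
    ("incorrect medication administered", "medication"),
    ("double dose of insulin in error", "medication"),
    ("patient fell while walking", "fall"),
    ("unwitnessed fall from bed", "fall"),
    ("slipped on wet floor", "fall"),
]
model = train(docs)
print(classify(model, "insulin dose error"))        # medication
print(classify(model, "patient slipped and fell"))  # fall
```

The studies reviewed used far larger annotated corpora and richer feature engineering, but the underlying estimation is of this shape.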
The vast majority of studies used a manually annotated corpus of free text documents as the comparator to their NLP model. Broadly, studies managed to develop an NLP classifier whose performance approached that of the comparator. Fong et al. demonstrated AUC performance of 0.96 using an SVM classifier to identify ADEs in incident reports[47]. Ong et al. demonstrated AUC performance of 0.97 using a Naïve Bayes classifier to identify patient identification and handover events in incident reports[29].
Figure 2: Summary of included studies
Figure 2 Legend:
Incident Reporting System (IRS), Electronic Health Record (EHR), Morbidity & Mortality Record (M&M), Adverse Drug Event Reporting System (ADE), Discharge Summary (DIS)
Incident Type (IT), Medication Error Type (MET), Harm Severity (HS), Contributory Factors (CF), Postoperative Complication (POC)
Topic Modelling (TM), Proprietary Software (PS), Neural Networks (NN), Decision Trees (DT), Logistic Regression (LR), K-Nearest Neighbours (K-NN), Naïve Bayes (NB), Support Vector Machines (SVM), Random Forests (RF)
Manually Annotated Corpus (MAC), Initial Reporter Classification (IRC)
Area Under Receiver Operating Characteristic Curve (AUROC), Confusion Matrix (e.g. Sensitivity, Specificity, Positive Predictive Value, Negative Predictive Value, Accuracy, Precision, Recall, F-Score)
3.3.4 Quality Assessment

A critical evaluation of study quality was conducted using the TRIPOD reporting guidelines as a framework[65]. Broadly, studies clearly identified the data source and the nature of the classification task. Datasets consisted of a mixture of free text and structured data entry fields. These were extracted, in all cases, from internal databases affiliated either to a hospital or a public institution. The number of records extracted for use ranged from five to over 20 million, with a median of 2974[44,45]. Studies were clear that NLP was being used to develop, rather than validate, predictive models. Studies were clear about the method for internal validation, which was typically a manually annotated corpus. Studies often described multiple NLP models and performance metrics. It was typically not clear which of these had been decided a priori, nor whether actions were taken to blind the assessment of individual models. The specifics of NLP model development were not made clear in all cases. Classification performance was reported heterogeneously amongst the included studies. Studies discussed limitations and provided an overall interpretation of their results and the potential clinical applications of their models. Studies provided conflict of interest and funding statements.
4.0 DISCUSSION
4.1 SUMMARY OF EVIDENCE
There are now a number of studies demonstrating that NLP models can be developed to classify the unstructured free text contained in incident reports and EHRs according to incident type and the severity of associated harm. Published work has explored binary classification techniques more widely than multi-labelled classification problems. The type of NLP that has been found to perform best has varied between datasets and classification tasks. A wide variety of model performance metrics are reported, reflecting different priorities in the use of the model. In general, studies have developed NLP models which can perform classification tasks in this domain with performance outcomes approaching those of manual human classifiers.
4.2 LIMITATIONS

In conducting this systematic review, resource limitations did not allow for the search to be performed in duplicate by two independent reviewers. We limited the search to English language articles due to a lack of funding for translation facilities. We also limited the search to articles published since 2005 to ensure relevance to current practice, as the fields of adverse event analysis and NLP have evolved significantly over the past decade. As figure 3 shows, the frequency of publications in this field appears to be increasing, with the majority of studies published in the last decade. As such, this review will have captured most relevant publications. Syntactic and ontological differences between languages may limit the applicability of the NLP models used in this review to other languages, particularly in the text pre-processing techniques described[37,49].

Figure 3.
Some aspects of the internal validity of these studies have been explored, such as the difficulty in assessing the quality of the comparative classification technique, and the effects of multiple testing due to the publication of outcomes for multiple NLP models and classification tasks. Both of these factors could bias in favour of the NLP model's performance.
Across the range of studies, 16 of the 35 were published in two journals: Studies in Health Technology and Informatics, and the Journal of the American Medical Informatics Association. Our search strategy included a grey literature search to minimise the possibility of publication bias.
At the outcome level, a challenge in this review was how best to summarise the performance of NLP classifiers. A narrative synthesis, rather than a quantitative approach, was chosen because studies reported outcomes for multiple binary classifications and multiple NLP models, because data homogeneity could not be assured, and because there was no uniform outcome performance metric.
4.3 THE LACK OF DATA HOMOGENEITY

Although the majority of studies used free text from proprietary incident reporting systems, this does not mean we can assume data homogeneity between these studies. It is recognised that the performance of NLP classifiers is very data dependent[61]. This is one explanation for why a range of NLP models were found to perform best across the studies in this review. Further, it makes it difficult to infer which NLP model would demonstrate the best classifier performance on a future data set.
4.4 THE USE OF MULTIPLE PERFORMANCE METRICS

When reporting model performance, a metric should be chosen that best represents the association between the model's classification and the "true" classification[66]. There is, however, acceptance that this relationship is complex and multifactorial. As such, there is an argument for reporting all performance metrics, so that the most information possible is available to those who might wish to develop the model further or for a different use case. Fong et al. and Ong et al. are good examples, presenting 6 and 5 performance metrics respectively[66,29].
It is known that efforts to improve one NLP model performance measure can detrimentally affect another[66]. For example, increasing model sensitivity can decrease model precision. If the model is adjusted to make it more likely to predict a positive occurrence, there will be an increase in the number of positive occurrences that are recorded as positive, but also an increase in the number of negative occurrences incorrectly recorded as positive.

Because of this, in model development one performance metric may have to be prioritised above another. Appropriate prioritisation depends on the intended use of the model[66].
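This trade-off can be made concrete with a toy example: as a classifier's decision threshold is lowered, recall (sensitivity) rises while precision falls. The scores and labels below are invented, and `recall_precision` is a hypothetical helper:

```python
# Toy scores from a hypothetical incident classifier: each pair is
# (model score, true label). Lowering the decision threshold raises
# recall (sensitivity) while lowering precision.
examples = [(0.95, 1), (0.90, 1), (0.80, 0), (0.70, 1), (0.60, 0),
            (0.55, 1), (0.40, 0), (0.30, 1), (0.20, 0), (0.10, 0)]

def recall_precision(threshold):
    tp = sum(1 for s, y in examples if s >= threshold and y == 1)
    fp = sum(1 for s, y in examples if s >= threshold and y == 0)
    fn = sum(1 for s, y in examples if s < threshold and y == 1)
    return tp / (tp + fn), tp / (tp + fp)

for t in (0.85, 0.50, 0.25):
    r, p = recall_precision(t)
    print(f"threshold={t}: recall={r:.2f} precision={p:.2f}")
```

On these invented scores, recall climbs from 0.40 to 1.00 as the threshold drops, while precision falls from 1.00 to 0.625, illustrating why the intended use of the model should drive which metric is prioritised.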
In the domain of using NLP for the classification of incidents and adverse events, it is important that the model does not miss an important event; thus high sensitivity is important[66]. This could result in decreased specificity due to an increased false positive rate. In this case, it is likely an important event would require some supplemental human confirmation, which could manage the additional false positives.
4.4.1 The impact of multiple testing

As most studies report performance outcomes for multiple NLP models, they are framed as methodological exploratory studies as much as experimental studies. The problem of interpreting the use of multiple models might be considered similarly to interpreting results from multiple testing[67].
4.5 COMPARING PERFORMANCE AGAINST MANUAL ANNOTATION
Model performance is typically reported as compared to a manually annotated corpus. This presumes manual annotation to be a gold standard. Thus, the validity of the accuracy measurements depends on the validity of the manual annotation. The use of multiple annotators and calculation of inter-rater agreement can improve the validity of manual annotation, but it has limitations. For example, inter-rater agreement would be unaffected if both raters missed a classification. Similarly, in most cases there is no way to be certain what proportion of outcomes are unclassified by both automated and manual systems[66].
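Inter-rater agreement between annotators is commonly quantified with Cohen's kappa, which discounts the raw agreement by the agreement expected from chance. A self-contained sketch with invented annotations:

```python
# Cohen's kappa for two annotators labelling the same reports.
# Annotations are invented; note that kappa is unaffected by a
# classification that both raters missed in the same way.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    # Chance agreement: product of each rater's marginal label rates.
    expected = sum((counts_a[l] / n) * (counts_b[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

a = ["fall", "med", "med", "fall", "med", "fall", "med", "fall"]
b = ["fall", "med", "fall", "fall", "med", "fall", "med", "med"]
print(round(cohens_kappa(a, b), 2))  # 0.5
```

Here the raters agree on 6 of 8 reports (0.75 raw agreement), but with balanced labels half that agreement is expected by chance, so kappa is 0.5, a weaker figure than the raw agreement suggests.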
5.0 FUTURE RESEARCH

The majority of studies focus on binary classification, e.g. "drug error or no drug error", "fall or no fall"[54,38]. It is recognised that incidents which lead to harm are often the result of multiple interacting factors[2]. Moving forwards, NLP interrogation of incident reports should look to achieve high-performing multi-class models[7].
Understanding why incidents occur may be more important for effecting change than understanding what incidents have occurred. Further studies exploring the ability of NLP to classify incident reports by contributory factors could offer more learning opportunities from adverse events.
Clinical free text represents a massive data set which has been largely underutilised because of its size, its unstructured nature, and, until recently, the inability to search it electronically[68,69]. A wealth of new knowledge may be generated if computational techniques such as NLP can make these data suitable for analysis.
6.0 CONCLUSIONS
This systematic review presents evidence that NLP can generate meaningful information from unstructured data in the specific domain of the classification of incident reports and adverse events. Understanding what incidents are occurring, or why they are occurring, is important in adverse event analysis. NLP has the potential to allow such classification tasks to be performed at scale, for example between hospitals within a geographic region, between regions, or across an entire healthcare system. This has the potential to improve learning from adverse events in healthcare, which may ultimately reduce the risk to future patients.
One of the roles of data science in healthcare is to reduce the human burden of data acquisition and analysis. The hope, in doing so, is to give healthcare professionals the time to think creatively and effect change[70]. In a broader context, understanding how to interrogate this unstructured data offers opportunities in a range of healthcare settings.

CONTRIBUTORSHIP STATEMENT

Ian Young, Saturnino Luz, and Nazir Lone all qualify for authorship according to the International Committee of Medical Journal Editors (ICMJE) criteria. Each shares responsibility for the conception and design of this study, the interpretation of this review, and the drafting and critical revision of the manuscript. Ian Young is the corresponding author and is further responsible for the acquisition of the data used in this review.
STATEMENT ON CONFLICT OF INTEREST

The authors have no conflicts of interest to declare.

FUNDING

This work was supported by the Department of Anaesthesia, Critical Care, and Pain Medicine at the Royal Infirmary of Edinburgh, via monies from the Trustees of the Edinburgh Anaesthesia Festival.

SUMMARY TABLE
What was already known
Analysis of incident reports and adverse events in healthcare is considered an important part of quality improvement and patient safety. NLP can provide structured information from unstructured free text, including performing classification tasks.
What this study adds
Within the domain of incident reporting and adverse event analysis, NLP has been shown to perform favourably compared to manual annotation in a wide range of classification tasks. Studies in this domain have focussed on binary classification of incident types. Exploring multi-class problems and contributory factor analysis of incident reports could have clinical utility. No single NLP technique shows superiority in this domain and training multiple models may be required to optimise classifier performance.
Word Count: 3492
REFERENCES [1]
R. Lawton, R.R.C. McEachan, S.J. Giles et al. Development of an evidence-based framework of factors contributing to patient safety incidents in hospital settings: a systematic review, BMJ Qual. Saf. 21 (2012) 369–380.
[2]
S. Taylor-Adams, C. Vincent, D. Hewett et al. Systems Analysis of Clinical Incidents the London Protocol, (n.d.).
[3]
To Err Is Human, National Academies Press, Washington, D.C., 2000.
[4]
K.E. Wood, D.B. Nash, Mandatory State-Based Error-Reporting Systems: Current and Future Prospects, Am. J. Med. Qual. 20 (2005) 297–303. B. Peter, J. Pronovost, D.A. Thompson et al. Toward learning from patient safety reporting systems, (n.d.).
ro of
[5]
[6]
DATIX, No Title, (2018). https://www.datix.co.uk/en/.
[7]
C. Liang, Y. Gong, Automated Classification of Multi-Labeled Patient Safety Reports: A Shift from Quantity to Quality Measure., Stud. Health Technol. Inform. 245 (2017)
[8]
-p
1070–1074.
M. Govindan, Automated detection of harm in healthcare with information technology:
[9]
re
a systematic review, Qual. Saf. Health Care. 19 (2010) e11.
D.S. Carrell, R.E. Schoen, D.A. Leffler et al. Challenges in adapting existing clinical
lP
natural language processing systems to multiple, diverse health care settings, J. Am. Med. Informatics Assoc. 24 (2017) 986–991. [10] R. Pivovarov, N. Mie Elhadad, Automated methods for the summarization of
na
electronic health records, (n.d.).
[11] R. Jacobsson, Extraction of adverse event severity information from clinical narratives using natural language processing, Pharmacoepidemiol. Drug Saf. 26 (2017) 37.
ur
[12] C.M. Rochefort, A.D. Verma, T. Eguale et al. A novel method of adverse event detection can accurately identify venous thromboembolisms (VTEs) from narrative
Jo
electronic health record data, (2014).
[13] Z. Tian, S. Sun, T. Equale et al. Automated extraction of vte events from narrative radiology reports in electronic health records: A validation study, Med. Care. 55 (2017) e73–e80. [14] J.W. Galvez, J.M. Pappas, L. Ahumada et al. The use of natural language processing on pediatric diagnostic radiology reports in the electronic health record to identify deep venous thrombosis in children, J. Thromb. Thrombolysis. 44 (2017) 281–290. [15] W. Yim, S.W. Kwan, M. Yetisgen, Classifying tumor event attributes in radiology
reports, J. Assoc. Inf. Sci. Technol. 68 (2017) 2662–2674. [16] C.R. Moore, A. Farrag, E. Ashkin, Using Natural Language Processing to Extract Abnormal Results From Cancer Screening Reports., J. Patient Saf. 13 (2017) 138–143. [17] M. Yetisgen-Yildiz, M.L. Gunn, F. Xia et al. Automatic identification of critical follow-up recommendation sentences in radiology reports, AMIA Annu. Symp. Proc. 2011 (2011) 1593–1602. [18] J.D. Duke, ADESSA: A Real-Time Decision Support Service for Delivery of Semantically Coded Adverse Drug Event Data, AMIA Annu. Symp. Proc. 2010 (2010) 177–181.
[19] Q. Li, E.S. Kirkendall, E.S. Hall et al. Automated detection of medication administration errors in neonatal intensive care, J. Biomed. Inform. 57 (2015) 124–133.
[20] Y. Ni, T. Lingren, E.S. Hall et al. Designing and evaluating an automated system for real-time medication administration error detection in a neonatal intensive care unit, J. Am. Med. Inform. Assoc. 25 (2018) 555–563.
[21] T. Cai, Natural language processing to rapidly identify potential signals for adverse events using electronic medical record data: Example of arthralgias and vedolizumab, Arthritis Rheumatol. 68 (2016) 2802–2804.
[22] A.P. Nunes, J. Yang, L. Radican et al. Assessing occurrence of hypoglycemia and its severity from electronic health records of patients with type 2 diabetes mellitus, Diabetes Res. Clin. Pract. 121 (2016) 192–203.
[23] S. Toyabe, Characteristics of inpatient falls not reported in an incident reporting system, Glob. J. Health Sci. 8 (2015) 17–25.
[24] H. Tanushi, Detection of healthcare-associated urinary tract infection in Swedish electronic health records, Stud. Health Technol. Inform. 207 (2014) 330–339.
[25] D.S. Carrell, S. Halgrim, D.-T. Tran et al. Using natural language processing to improve efficiency of manual chart abstraction in research: The case of breast cancer recurrence, (n.d.).
[26] K. Jensen, C. Soguero-Ruiz, K. Oyvind Mikalsen et al. Analysis of free text in electronic health records for identification of cancer patient trajectories, Sci. Rep. 7 (2017) 46226.
[27] Mendeley, https://www.mendeley.com (accessed June 15, 2018).
[28] A. Fong, An evaluation of patient safety event report categories using unsupervised topic modeling, Methods Inf. Med. 54 (2015) 338–345.
[29] M.-S. Ong, F. Magrabi, E. Coiera, Automated categorisation of clinical incident reports using statistical text classification, Qual. Saf. Health Care 19 (2010) e55.
[30] J. Gupta, I. Koprinska, J. Patrick, Automated classification of clinical incident types, Stud. Health Technol. Inform. 214 (2015) 87–93.
[31] G.B. Melton, G. Hripcsak, Automated detection of adverse events using natural language processing of discharge summaries, J. Am. Med. Inform. Assoc. 12 (2005) 448–457.
[32] J.F.E. Penz, A.B. Wilcox, J.F. Hurdle, Automated identification of adverse events related to central venous catheters, J. Biomed. Inform. 40 (2007) 174–182.
[33] M.-S. Ong, F. Magrabi, E. Coiera, Automated identification of extreme-risk events in clinical incident reports, J. Am. Med. Inform. Assoc. 19 (2012) e110–e118.
[34] H.J. Murff, F. FitzHenry, M.E. Matheny et al. Automated identification of postoperative complications within an electronic medical record using natural language processing, JAMA 306 (2011) 848–855.
[35] R.D. Boyce, J. Jao, T. Miller et al. Automated screening of emergency department notes for drug-associated bleeding adverse events occurring in older adults, Appl. Clin. Inform. 8 (2017) 1022–1030.
[36] J. Gupta, Automated validation of patient safety clinical incident classification: Macro analysis, Stud. Health Technol. Inform. 188 (2013) 52–57.
[37] K. Fujita, M. Akiyama, N. Toyama et al. Detecting effective classes of medical incident reports based on linguistic analysis for common reporting system in Japan, Stud. Health Technol. Inform. 192 (2013) 137–141.
[38] S. Toyabe, Detecting inpatient falls by using natural language processing of electronic medical records, BMC Health Serv. Res. 12 (2012) 448.
[39] L. Han, R. Ball, C.A. Pamer et al. Development of an automated assessment tool for MedWatch reports in the FDA adverse event reporting system, J. Am. Med. Inform. Assoc. 24 (2017) 913–920.
[40] A.L. Benin, S.J. Fodeh, K. Lee et al. Electronic approaches to making sense of the text in the adverse event reporting system, J. Healthc. Risk Manag. 36 (2016) 10–20.
[41] C. Liang, Y. Gong, Enhancing patient safety event reporting by K-nearest neighbor classifier, Stud. Health Technol. Inform. 218 (2015) 40603.
[42] A. Fong, A.Z. Hettinger, R.M. Ratwani, Exploring methods for identifying related patient safety events using structured and unstructured data, J. Biomed. Inform. 58 (2015) 89–95.
[43] T. Speroff, Exploring the frontier of electronic health record surveillance: the case of postoperative complications, Med. Care 51 (2013) 509–516.
[44] J. Gaebel, T. Kolter, F. Arlt et al. Extraction of adverse events from clinical documents to support decision making using semantic preprocessing, Stud. Health Technol. Inform. 216 (2015) 1030.
[45] E. Iqbal, R. Mallah, R.G. Jackson et al. Identification of adverse drug events from free text electronic patient records and information in a large mental health case register, PLoS One 10 (2015) e0134208.
[46] A. Cohan, A. Fong, R.M. Ratwani et al. Identifying harm events in clinical care through medical narratives, in: Proc. 8th ACM Int. Conf. Bioinformatics, Comput. Biol. Health Informatics (ACM-BCB '17), ACM Press, New York, 2017, pp. 52–59.
[47] A. Fong, N. Harriott, D.M. Walters et al. Integrating natural language processing expertise with patient safety event review committees to improve the analysis of medication events, Int. J. Med. Inform. 104 (2017) 120–125.
[48] G.B. Weller, J. Lovely, D.W. Larson et al. Leveraging electronic health records for predictive modeling of post-surgical complications, Stat. Methods Med. Res. 0(0) (2017) 1–15.
[49] K. Fujita, M. Akiyama, K. Park et al. Linguistic analysis of large-scale medical incident reports for patient safety, Stud. Health Technol. Inform. 180 (2012) 250–254.
[50] P.A. Ravindranath, S. Bruschi, K. Ernstrom et al. Machine learning in automated classification of adverse events in clinical studies of Alzheimer's disease, Alzheimer's Dement. 13 (2017) P1256.
[51] C. Liang, Y. Gong, Predicting harm scores from patient safety event reports, Stud. Health Technol. Inform. 245 (2017) 1075–1079.
[52] W.M. Marella, Screening electronic health record-related patient safety reports using machine learning, J. Patient Saf. 13 (2017) 31–36.
[53] S.D. McKnight, Semi-supervised classification of patient safety event reports, J. Patient Saf. 8 (2012) 60–64.
[54] Z.S.Y. Wong, Statistical classification of drug incidents due to look-alike sound-alike mix-ups, Health Informatics J. 22 (2016) 276–292.
[55] Z.S.Y. Wong, M. Akiyama, Statistical text classifier to detect specific type of medical incidents, Stud. Health Technol. Inform. 192 (2013) 1053.
[56] L.U. Gerdes, C. Hardahl, Text mining electronic health records to identify hospital adverse events, Stud. Health Technol. Inform. 192 (2013) 1145.
[57] T. Botsis, M.D. Nguyen, E.J. Woo et al. Text mining for the vaccine adverse event reporting system: Medical text classification using informative feature selection, J. Am. Med. Inform. Assoc. 18 (2011) 631–638.
[58] A. Fong, J. Howe, K.T. Adams et al. Using active learning to identify health information technology related patient safety events, Appl. Clin. Inform. 8 (2017) 35–46.
[59] Y. Wang, E. Coiera, W. Runciman et al. Using multiclass classification to automate the identification of patient safety incident reports by type and severity, BMC Med. Inform. Decis. Mak. 17 (2017) 84.
[60] T. Botsis, T. Buttolph, M. Nguyen et al. Vaccine adverse event text mining system for extracting features from vaccine safety reports, J. Am. Med. Inform. Assoc. 19 (2012) 1011–1018.
[61] J.F.E. Penz, A.B. Wilcox, J.F. Hurdle, Automated identification of adverse events related to central venous catheters, J. Biomed. Inform. 40 (2007) 174–182.
[62] S.V. Stehman, Selecting and interpreting measures of thematic classification accuracy, Remote Sens. Environ. 62 (1997) 77–89.
[63] MedLee, (n.d.).
[64] SAS Text Miner, (n.d.). https://www.sas.com/en_gb/software/text-miner.html.
[65] EQUATOR Network, TRIPOD checklist: Prediction model development and validation, (2016).
[66] J. Chubak, G. Pocobelli, N.S. Weiss, Tradeoffs between accuracy measures for electronic health care data algorithms, J. Clin. Epidemiol. 65 (2012) 343–349.
[67] Y. Benjamini, Y. Hochberg, Controlling the false discovery rate: A practical and powerful approach to multiple testing, J. R. Stat. Soc. Ser. B 57 (1995) 289–300.
[68] T. Murdoch, A. Detsky, The inevitable application of big data to health care, JAMA 309 (2013) 1351–1352.
[69] N.R. Adam, R. Wieder, D. Ghosh, Data science, learning, and applications to biomedical and health sciences, Ann. N. Y. Acad. Sci. 1387 (2017) 5–11.
[70] B. Young, Getting the measure of diabetes: the evolution of the National Diabetes Audit, Practical Diabetes 35 (2018) 1–7.