Exploring methods for identifying related patient safety events using structured and unstructured data
Allan Fong *, A. Zachary Hettinger, Raj M. Ratwani

MedStar Institute for Innovation – National Center for Human Factors in Healthcare, 3007 Tilden St. NW, Suite 7M, Washington, DC 20008, USA

* Corresponding author. E-mail address: [email protected] (A. Fong).
Article info
Article history: Received 31 March 2015; Revised 2 July 2015; Accepted 10 September 2015; Available online xxxx.
Keywords: Patient safety event reports; Topic modeling; Unstructured data; Structured data; Search; Relevance

Abstract
Most healthcare systems have implemented patient safety event reporting systems to identify safety hazards. Searching the safety event data to find related patient safety reports and identify trends is challenging given the complexity and quantity of these reports. Structured data elements selected by the event reporter may be inaccurate, and the free-text narrative descriptions are difficult to analyze. In this paper we present and explore methods for utilizing both the unstructured free-text and structured data elements in safety event reports to identify and rank similar events. We evaluate the results of three different free-text search methods, including a unique topic modeling adaptation, and of structured element weights, using a patient fall use case. The various search techniques and weight combinations tended to prioritize different aspects of the event reports, leading to different search and ranking results. These search and prioritization methods have the potential to greatly improve patient safety officers' and other healthcare workers' understanding of which safety event reports are related.
© 2015 Published by Elsevier Inc.
1. Introduction
In an effort to improve safety, most healthcare systems have a patient safety reporting system (PSRS) in place [1,2]. These systems provide a method for staff, including physicians, nurses, and technicians, to report on safety events in their environment, ranging from near misses, where harm almost reaches a patient, to serious safety events, where a patient is harmed [3]. The Institute of Medicine has strongly recommended the use of these systems to identify why patients are harmed by medical errors, and several states require the use of a PSRS [3]. Patient safety event (PSE) reports generally contain structured information such as the time and site of occurrence, the role of the participants (physician, nurse, technician, etc.), patient demographic and clinical attributes, and a classification of the severity and type of event. In addition to the structured data elements, the safety event reports also include an unstructured free-text field in which the reporter can provide a narrative describing the patient safety event in greater detail [4]. Patient safety reporting systems can grow to contain tens of thousands to hundreds of thousands of events. Patient safety organizations, which act as safe harbors that allow providers to share PSE data without the liability risk,
serve to combine PSE reports from multiple providers, resulting in even larger safety event databases. If the data from patient safety reporting systems can be analyzed effectively, these databases of reports hold tremendous promise for improving patient safety [5,6]. The data can be used to identify important patterns or trends of events that can then be remedied by intervening to remove or mitigate potential safety threats. However, to realize the promise of patient safety event reporting systems, efficient and effective analysis methods need to be developed to allow for a deeper understanding of the data that can then lead to action to improve safety. The challenge is that patient safety data are incredibly complex, with both structured and unstructured data elements. While analyzing the structured data may be relatively straightforward, these data only provide a partial understanding of the safety event, and many events actually span multiple pre-defined categories, such as with general event type (e.g. medication, falls, miscellaneous, diagnosis/imaging) and specific event type assignments. Given that many event reporting systems only allow the selection of one general event type and one specific event type category, the structured data may not accurately reflect the context of the safety event. Furthermore, most safety reporting systems do not provide a formal definition of the event type categories. As a result, the reporter must select a category based on their own knowledge and intuition, hence the ambiguity that can sometimes arise from these categories. For a more complete understanding of the event,
the unstructured free-text narratives, which often contain a rich description of the patient safety event from the perspective of the reporter, must be analyzed in combination with the structured data more efficiently and effectively. A very important and practical use of the PSE data is to understand whether a recently reported event is part of a larger trend that should be of immediate concern and warrant the allocation of limited resources to address the safety hazard, or whether the recent event is an anomaly and can be prioritized accordingly [5]. In this paper we focus on developing PSE report analysis techniques to address this very pressing problem. To understand how a new event aligns with previous events, one must be able to identify other PSE reports in the event report database that are similar. Although most patient safety reporting systems provide the ability to query by structured fields and match keywords in the unstructured narratives, this process remains labor intensive and challenging because of the large number of irrelevant reports that are returned from these searches. In this paper, we present and explore methods for searching large databases of PSE data to identify and rank similar event reports using both unstructured and structured data. This approach consists of using natural language processing (NLP) techniques to search free-text and a structured element weighting scheme to prioritize the search results. NLP leverages the power of computers to process and make sense of large amounts of text. Several NLP methods and strategies have been developed for search and retrieval tasks. For example, identifying important words in reports and using document distance metrics can help find, match, and rank documents by their similarities [7–9]. In addition, methods such as topic modeling have been widely used to identify latent themes or topics in documents [10]. Reports that discuss similar topics would have similar topic probability profiles. Furthermore, previous work has used NLP approaches to categorize and identify health information safety events and extreme-risk events in free-text reports [11,12]. NLP techniques have also been used to identify safety events from clinical documentation [13,14]. In addition to analyzing free-text, techniques such as structural topic modeling and labeled Latent Dirichlet Allocation (LDA) have been developed to incorporate structured data, or metadata, into topic models [15–17]. However, these studies did not focus on or evaluate the utility of incorporating various structured data elements into search methods. The work presented in this paper builds on these previous research efforts by utilizing the additional structured data elements in the PSE reports to search for and rank related events. While an analysis of the unstructured data alone has been a useful approach, leveraging the structured data elements in combination with an NLP approach may allow for the improved identification of similar PSE reports. However, combining the structured and unstructured data is not always intuitive, and it can be difficult to interpret how the inclusion of different structured elements impacts search results. These challenges are further exacerbated by noisy structured data, which is particularly prevalent in self-reported PSE data.
We propose a more intuitive and less complex approach to incorporating structured elements alongside the unstructured free-text data, and to evaluating their effects, to enhance the search for similar PSE reports.
2. Methods
Our approach consists of leveraging both the free-text and structured elements in PSE reports to identify and rank related PSE reports. We first discuss three different NLP search methods (topic models, unigrams, and bigrams) and their application to the unstructured free-text in each report. Each method takes a
single PSE report, the base report, and ranks other PSE reports according to their similarity to the base report. We present a unique approach that leverages topic modeling results to find related PSE reports and compare these results with more standard unigram and bigram search techniques. We then discuss how the structured element weights are incorporated with the ranking results.
2.1. Data
PSE reports were collected, through self-report, over a two-year period (January 2013 to January 2015) from a multi-hospital healthcare system in the mid-Atlantic region of the United States. A total of 49,859 reports were collected during this time from the PSRS. Each report has both structured and unstructured fields. For this analysis, we focused on the general event type (GET) and specific event type (SET) structured elements (as they provide the greatest insight about the reports compared to other structured elements) and the unstructured free-text brief factual description of the event. There are 21 GETs, the most common being "medication/fluid," "labs," "falls," and "miscellaneous" (Table 1). Furthermore, each GET category is comprised of several unique SET categories. For example, the "falls" GET category has "from bed," "while ambulating," and "from chair" SET categories, while the "medication/fluid" GET has "adverse drug reaction," "duplicate therapy," and "wrong patient" SET categories. Every PSE report is assigned a single GET and a single SET category by the reporter of the event.

Table 1. Top 10 GETs by percent of total reports.

General event type category         Percent
Medication/fluid                    17
Lab/specimen                        15
Fall                                12
Miscellaneous                       10
Blood bank                           7
Skin/tissue                          5
Diagnosis/treatment                  5
Patient ID/documentation/consent     5
Surgery/procedure                    4
Lines/tubes/drain                    4
2.2. Free-text search methods
We present three free-text search methods below. Punctuation, numbers, and common stop words were removed from the free-text, and words were stemmed prior to analysis. After this preprocessing, the median number of terms (words) in a report was 27, with a standard deviation of 33.
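To make this preprocessing concrete, here is a minimal sketch of such a pipeline; the paper does not name its tools, so NLTK's English stop-word list and Porter stemmer are our assumptions:

```python
import re

from nltk.corpus import stopwords   # assumed stand-ins; the paper does not
from nltk.stem import PorterStemmer  # specify its stop-word list or stemmer

STOP_WORDS = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def preprocess(narrative: str) -> list[str]:
    """Drop punctuation/numbers, remove common stop words, and stem."""
    tokens = re.findall(r"[a-z]+", narrative.lower())  # alphabetic tokens only
    return [STEMMER.stem(t) for t in tokens if t not in STOP_WORDS]

# Example: preprocess("Patient rolled off the bed; side rails were not up.")
# -> ['patient', 'roll', 'bed', 'side', 'rail']
```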
2.2.1. Topic model

Our first method adapted topic modeling techniques to evaluate the similarity, or relevance, between reports. We first used the term frequency–inverse document frequency (tf-idf) statistic to identify important words in each report [8]. We then used this subset of words as inputs to our LDA topic model [10]. Reports were evaluated based on the distance between their topic probability distribution and that of a base report. While LDA is a popular and commonly used topic modeling technique, it is limited, particularly if the underlying topics are not well separated [18]. This is more likely to occur when the informational content in documents is noisy. Unfortunately, PSE reports vary greatly in complexity and length; some reports are brief sentences while others are long, detailed narratives. To address this limitation, we used tf-idf to first identify the
important terms for each report, thereby increasing the likelihood that each report would be associated with a smaller subset of more relevant topics. We calculate the tf-idf value of a term in a given report as follows:

$$\mathrm{tfidf}_{report,term} = \frac{\text{count of term in report}}{\text{total number of terms in report}} \times \log\left(\frac{\text{total number of reports}}{\text{total number of reports with term}}\right)$$
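A direct transcription of this statistic, shown here as a sketch (the helper names are ours, not from the paper):

```python
import math

def tfidf(term, report, corpus):
    """tf-idf of a term in one report, per the formula above.

    report: list of preprocessed terms for one report.
    corpus: list of such term lists, one per report in the PSRS.
    """
    tf = report.count(term) / len(report)
    n_with_term = sum(1 for r in corpus if term in r)
    idf = math.log(len(corpus) / n_with_term)  # assumes the term occurs somewhere
    return tf * idf

def top_tfidf_terms(report, corpus, k=20):
    """Cap a report at its k highest tf-idf terms (the paper uses k = 20)."""
    scores = {t: tfidf(t, report, corpus) for t in set(report)}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

The unigram relevance score in Section 2.2.2 below is then simply the sum of these tf-idf values over the base report's top terms.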
To avoid over-fitting while still capturing enough terms, we limited reports with more than twenty terms to their top twenty terms as determined by tf-idf value. These terms became inputs to our LDA topic model. We evaluated the cohesion, exclusivity, and topic terms of LDA results for topic numbers ranging from 20 to 50. We found that 40 topics generated cleaner topics, balancing the cohesion and exclusivity of the topics [19,20]. After topic modeling, each report had a topic probability distribution. The distance, d, between the base report (the report used for querying), b, and every other report, r, was calculated as follows:

$$d_b(r) = \sum_{t \in T} (r_t - b_t)^2$$

such that the distance, d, between a report, r, and the base report, b, is the sum of the squared differences between their topic probabilities, t, over all topics, T. From the distance score, we calculated a relevance score for each report:

$$\mathrm{relevance}_{tm}(r) = 1/d$$

such that reports with lower d were considered closer, or more relevant, to the base report. Reports with higher relevance were ranked closer to one.
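Given per-report topic distributions (e.g., from gensim's LdaModel, one plausible implementation that the paper does not specify), the distance and relevance computations reduce to a few lines:

```python
import numpy as np

def topic_distance(base, report):
    """d_b(r) = sum over topics of (r_t - b_t)^2, as defined above."""
    base, report = np.asarray(base), np.asarray(report)
    return float(np.sum((report - base) ** 2))

def relevance_tm(base, report):
    """Topic-model relevance: 1/d, so closer reports score higher."""
    d = topic_distance(base, report)
    return 1.0 / d if d > 0 else float("inf")  # d = 0 only for the base report itself
```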
2.2.2. Unigram TF-IDF

The second method used unigrams, or individual words such as "bed," "floor," and "fall," to determine the similarity of reports. Given a base report, we identified the top 20 tf-idf unigrams, similar to the technique described above, and used these terms to search through all the PSE reports. We then calculated the tf-idf values for each search term in every report to determine the relevance score for each report, r:

$$\mathrm{relevance}_{uni}(r) = \sum_{t \in T} \mathrm{tfidf}_t$$

where the tf-idf values for each search term, t, in r are summed over all search terms, T. Reports with greater tf-idf sums were considered more relevant to the base report and ranked higher.

2.2.3. Bigram TF-IDF

In addition to unigram terms, we also used bigrams, or word pairs such as "side rail" and "patient fall," as search terms. Bigrams might provide more insight than unigrams, as there are many word pairs in the healthcare vernacular that are particularly meaningful. For example, "special instruction" is a common term associated with physician medication orders. Similar to unigrams, we calculated bigram tf-idf scores. In addition, each bigram was assigned a pointwise mutual information (PMI) value,

$$\mathrm{pmi}(x, y) = \log \frac{p(x, y)}{p(x)\,p(y)}$$

such that the PMI for an x–y bigram is the log of the probability of the bigram occurring in a report divided by the probability of the terms occurring independently. More interesting bigrams generally have higher PMI values. For example, "radiant warmer" and "bair/bear hugger" each have larger PMI values than "blood bank." This metric was helpful in providing an additional measure of utility for each bigram. We used PMI and tf-idf scores to select the search bigrams for the base report. The matched reports were then scored and ranked by relevance, relevance_bigram, similar to the unigram method.
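A sketch of the PMI computation, estimating the probabilities as fractions of reports containing the bigram or its constituent terms (one plausible reading; the paper does not spell out its estimator):

```python
import math

def pmi(x, y, corpus):
    """PMI of the bigram (x, y) over a corpus of preprocessed term lists."""
    n = len(corpus)
    p_xy = sum(1 for r in corpus if (x, y) in zip(r, r[1:])) / n  # bigram in report
    p_x = sum(1 for r in corpus if x in r) / n
    p_y = sum(1 for r in corpus if y in r) / n
    if p_xy == 0:
        return float("-inf")  # bigram never observed together
    return math.log(p_xy / (p_x * p_y))
```

Under this estimator, bigrams whose constituent words rarely appear apart (such as "radiant warmer") score higher than bigrams of independently common words.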
2.3. Structured elements

Lastly, we evaluated the utility of incorporating structured data elements into the overall ranking results. We extracted the GET and SET elements for each event report, as these structured data elements convey the greatest insight about the report compared to other structured data elements (location, person reporting, etc.). If a search result had the same GET or SET as the base report, the relevance score for the search result was increased by a weighting factor. This resulted in an adjusted relevance score for the search result:

$$\mathrm{relevance}_{tm,adj} = \mathrm{relevance}_{tm} + \mathrm{weight}_{GET}\cdot\{0,1\} + \mathrm{weight}_{SET}\cdot\{0,1\}$$

$$\mathrm{relevance}_{unigram,adj} = \mathrm{relevance}_{unigram} + \mathrm{weight}_{GET}\cdot\{0,1\} + \mathrm{weight}_{SET}\cdot\{0,1\}$$

$$\mathrm{relevance}_{bigram,adj} = \mathrm{relevance}_{bigram} + \mathrm{weight}_{GET}\cdot\{0,1\} + \mathrm{weight}_{SET}\cdot\{0,1\}$$

such that each weight is applied (the {0,1} indicator is 1) only if the corresponding element matches that of the base report. The reports were then reordered by the adjusted relevance score. We evaluated the sensitivity of the report rankings using various GET and SET weight combinations. Combinations of weights that ranked relevant reports higher were considered more favorable.
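The adjustment and re-ranking can be sketched as follows (the dict fields and default weights are illustrative, not from the paper):

```python
def adjusted_relevance(score, report, base, w_get, w_set):
    """Add each weight only when the structured element matches the base report."""
    if report["get"] == base["get"]:
        score += w_get
    if report["set"] == base["set"]:
        score += w_set
    return score

def rerank(results, base, w_get=0.4, w_set=0.4):
    """results: (report, relevance) pairs from any of the free-text methods."""
    return sorted(
        results,
        key=lambda rr: adjusted_relevance(rr[1], rr[0], base, w_get, w_set),
        reverse=True,
    )
```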
2.4. Case study: initial model evaluations
To evaluate the results of these methods, we searched the reports using a base report pertaining to a patient fall from a bed. In this event, a nurse was giving a patient a bed bath when the nurse was interrupted by a resident. While the nurse was talking to the resident, the patient rolled to the side of the bed, where the rails were not up, and rolled off the bed. We first used the free-text search methods to retrieve the top 150 reports (fifty from each search method) from the PSRS, a common method for evaluating search results in information retrieval [21]. We then had three human factors healthcare domain experts, who are accustomed to studying and reviewing PSE reports, examine each of the 150 reports for relevance to the base report. This follows previous methods for evaluating agreement among three or more reviewers [22]. Reviewers provided a binary indicator as to whether each of the resulting search reports was similar to the base case (i.e., yes/no). Reviewers were instructed to consider the role of a patient safety officer who is reviewing a recent report (the base report) and is attempting to find similar reports to determine whether there is a pattern of events. Determining the relevance of search results is challenging and subjective; however, domain experts have an understanding of which reports are relevant, and our method of providing context to the reviewers helps frame the problem more effectively. Although we cannot directly evaluate the precision and recall of the models without a comprehensive review of all the PSE reports, the annotated results from this specific search exercise allow us to evaluate the relative precision of these models as well as the impact of structured elements on rankings.
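For concreteness, a minimal sketch of aggregating the three binary judgments per result (the majority-vote rule is the one applied in Section 3.3; the names are ours):

```python
def aggregate_votes(votes_by_report):
    """votes_by_report: {report_id: (v1, v2, v3)} of True/False relevance judgments.

    Returns majority-vote relevance labels and the fraction of results
    on which all three reviewers agreed.
    """
    labels = {rid: sum(v) >= 2 for rid, v in votes_by_report.items()}
    n_agree = sum(1 for v in votes_by_report.values() if len(set(v)) == 1)
    return labels, n_agree / len(votes_by_report)
```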
3. Results
We present the results from this use case evaluation involving a patient fall PSE report. We first present the top search criteria for each of the three search methods. We then compare the different search results and the sensitivity of these results when incorporating structured element weights.
3.1. Text search criteria
The three free-text search methods resulted in different search criteria being used to query the PSE reports.

3.1.1. Topic modeling search criteria

The topic probability distribution of the base report was mostly captured by three topics, as shown in Table 2. This probability profile was used to query the PSE reports.

Table 2. Topic words and probabilities associated with the most relevant LDA topic assignments.

Topic words                                          Topic probability
Floor, side, head, get, bed, move, sit               0.2
Fall, bathroom, precaut, assist, walk, alarm, got    0.1
Wound, area, pressure, skin, clean, stage            0.1
3.1.2. Unigram and bigram search criteria

The top unigram and bigram search terms are summarized in Table 3. These terms highlight the more important unigram and bigram phrases from the base report.

Table 3. Top unigram and bigram tf-idf terms.

Unigrams (tf-idf): side (0.18), rail (0.16), bath (0.14), roll (0.11), bed (0.08), interrupt (0.07), resid (0.07), curtain (0.07), linen (0.07)
Bigrams (tf-idf): bed bath (0.11), roll from (0.11), curtain patient (0.11), fall injury (0.11), patient roll (0.09), with rail (0.09), roll out (0.08)
3.2. Search results
The only overlapping result across all three free-text search methods was the base report. The base report was included in the search space to provide a baseline for the relevance scores. Besides the base report, the topic model results did not have any overlapping reports with the other methods. On the other hand, there were 18 common reports between the unigram and bigram results. Approximately half of the overlapping reports were scored higher by the unigram model, as shown in Fig. 1. In addition, sample phrases for some of the corresponding reports are included in the figure to provide context.

[Fig. 1. Ranking comparison of overlapping results between the unigram and bigram search methods (for example, the dotted line represents a single PSE report that was ranked 4 by the unigram method and 31 by the bigram method).]
3.3. Relevance
Each reviewer coded all 150 search results. There was complete relevance agreement among all three reviewers for 79% of the results. Where there was not complete agreement, the report was classified as relevant or not based on a majority vote. A noticeable feature of the reports on which reviewers disagreed was that it was sometimes unclear whether the fall could have been prevented by a side rail or by other means. The topic model search method returned the most relevant search
results (29/50), while the unigram and bigram methods each returned slightly less than half that amount (13 and 11, respectively). Fig. 2 shows the relevance of the search results for each model. Each color represents whether the GET and SET associated with a search result match the GET and SET of the base case. Search results fall into one of three categories: (black) has neither the same GET nor the same SET as the base case; (light gray) has the same GET but not the same SET as the base case; or (dark gray) has both the same GET and the same SET as the base case. For example, a result categorized as "fall" and "from chair" will be light gray, since it matches the GET of the base case but not the SET. This figure highlights the ambiguity that can sometimes occur in SET categorization; for example, in addition to "from bed," there is a "from bed-over rails" SET category. The figure also highlights the false positive search results (averaging 32% across the models) that would occur if only the structured elements were considered. Relevant reports from each search method had the same "fall" GET as the base report. However, not all "fall" GET search results were relevant. Relevant reports that shared the same SET were more common in the topic model results as compared to the other
two methods. The majority of the non-relevant reports were from different GET categories, although the topic model and bigram methods tended to find more non-relevant reports from the same GET category.

[Fig. 2. Number of non-relevant (nr) and relevant (r) results for each free-text search method. Shading indicates matches of GET and SET to the base report.]
3.4. Effect of structured elements on rank sensitivity
Lastly, we evaluated the sensitivity of the rankings to different GET and SET weights. We explored combinations of GET weights, w_GET ∈ {0, 0.2, 0.4, 0.6, 1}, and SET weights, w_SET ∈ {0, 0.2, 0.4, 0.6, 1}, as shown in Fig. 3. A zero weight means the structured element has no effect on the relevance score. Overall, the unigram and bigram search results were the most sensitive to the structured element weights. This can be seen in the greater slope changes across different weight combinations. In addition, slope differences reflect the relative precision of the model: the closer the slope is to one, the more likely relevant reports are prioritized and ranked higher by that particular method and weight combination. The models were generally more sensitive to GET weights than to SET weights. Overall, the topic model method with higher GET and SET weights tended to produce the most precise models for retrieving similar events given a base report.

[Fig. 3. Columns indicate different GET weights, w_GET, and line color represents different SET weights, w_SET, for each search method (rows). Each subgraph plots the cumulative number of relevant reports (y-axis) by the number of reports in ranked order (x-axis). Note the different y-axis scales used to highlight weight effects between the search methods.]
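This sensitivity analysis amounts to sweeping the weight grid and building a cumulative-relevance curve for each combination; a sketch reusing the illustrative rerank() function from Section 2.3 (the relevant-report IDs would come from the expert annotations):

```python
from itertools import product

WEIGHT_GRID = [0, 0.2, 0.4, 0.6, 1]  # the w_GET and w_SET values explored

def cumulative_relevant(ranked_reports, relevant_ids):
    """Running count of relevant reports by rank; a slope near one means
    relevant reports are concentrated at the top of the ranking."""
    curve, count = [], 0
    for report in ranked_reports:
        count += report["id"] in relevant_ids
        curve.append(count)
    return curve

def sweep(results, base, relevant_ids):
    """Re-rank under every (w_GET, w_SET) pair and collect the curves."""
    return {
        (w_get, w_set): cumulative_relevant(
            [r for r, _ in rerank(results, base, w_get, w_set)], relevant_ids
        )
        for w_get, w_set in product(WEIGHT_GRID, WEIGHT_GRID)
    }
```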
4. Discussion
The combination of free-text search and structured element weights may be a useful mechanism for patient safety officers and other healthcare staff to identify related PSE reports. These search and ranking methods may be particularly useful for identifying trends in near-miss events, where the classification is not as rigorous as with serious harm events. In this section, we discuss insights and implications of this work as well as some efforts to address challenges and limitations.
4.1. Text search performance
Overall, the topic model search method found the most relevant PSE reports. This could be a reflection of the modeling and sampling techniques involved in LDA. The unigram and bigram models had poorer results, which might be an indication of how the terms in the free-text descriptions were evaluated. Because the unigram and bigram models matched only on key search terms, they tended to favor shorter reports that contained the search terms. For example, "patient bed side rails up when no orders were given" reports were common search results. These reports were not relevant to the base report but had high ranking scores. However, these search results tended not to have the same GET or SET as the base report, which highlights the importance of complementing the free-text data with structured elements. One interesting difference between the unigram and bigram methods was the selection of terms used in each model. The unigram model identified "interrupt" from the base report while the bigram model did not. This is most likely because terms like "interrupt" are less likely to be paired with other words, even though interruption could be an important contributing factor to this event.
4.2. Ranking sensitivity
Unigram and bigram model results were generally more sensitive to GET and SET weights than the topic model results. This may reflect the alignment between the LDA topic results and the GET and SET structured categories. All relevant reports from the search results were from the "fall" GET, the same as the base report. One might conclude that these free-text search techniques are unnecessary and that relying on the GET alone would suffice. However, an expanded review of the top 150 cases from the unigram search model revealed several related reports categorized under the "skin/tissue" GET. In these reports, patients had skin or tissue wounds after falling from the side of their bed without side rails up. In addition, several non-relevant search results also had
the "fall" GET, particularly in the topic model results. Searching with the structured elements alone, or with the free-text alone, will likely result in many false positives or misses, both of which are undesirable and time consuming for users to process and correct.
4.3. Search space
Interestingly, examining the top fifty reports from each search method revealed few overlapping reports. This suggests that each method tended to examine or prioritize the search space differently. Besides the base report, there was no overlap between the unigram and bigram models and the topic model. This highlights a fundamental difference between these search strategies. Although the topic model approach performed best in our use case, it missed relevant reports identified by the other approaches. As a result, it may be valuable to integrate the results from the three different search methods, for example, by returning the top results from each approach.
4.4. Utility of incorporating structured elements into search
Our research suggests the importance of incorporating both structured and unstructured elements into search, as both types of data elements have advantages and limitations. These elements can complement each other, producing a more useful matching approach. However, integrating structured elements with free-text searches must be done carefully, as we have demonstrated the sensitivity of the rankings to various weight combinations. Perhaps an interactive option that allows users to explore various weighting combinations for structured elements would be useful. In addition to GET and SET, there are also other structured elements that would be interesting to explore, such as hospital location, department type, time of day, and day of week.
4.5. Insight and visualization
A primary objective of this work is to provide healthcare providers and patient safety officers with a more robust and efficient method to search and identify similar PSE reports. In addition to developing the search methods, an information visualization dashboard would allow users to more easily interact with the event reports. This visualization should let users select and order key search terms, prioritize search methods, and weight matches on structured elements. We have started developing prototype visualizations and search dashboards aimed at intuitively conveying the results of these search methods. While PSE reports are still too complex for automatic relevance matching, algorithms can be developed to better support users in understanding safety event report data.
4.6. Challenges and limitations
4.6.1. Determining relevance

A challenge we faced when evaluating the search results was that it was difficult to establish ground truth for related reports. Defining relevance often depends on the mental model of the person reviewing the reports. Although we had an inter-rater agreement of 79%, determining the relevance of search results is an area of ongoing work and is critical for developing effective methods that facilitate the analysis of PSE data.
4.6.2. Additional evaluations

While the results and evaluation were insightful, they were limited by the number of search cases and search results examined. Only one search case and the top 50 search results for each model were evaluated. This was largely a result of the limited annotations readily available with the data. It would be important to further understand the performance of these models with additional search cases covering various unsafe conditions. Furthermore, expanding the review of search results can be particularly helpful for identifying more nuanced relevant reports, such as a skin/tissue wound caused by a fall. Some of our current efforts involve developing lists of related PSE reports annotated and validated by subject matter experts. This will greatly improve the development and assessment of search and retrieval methods.

4.6.3. Search criteria

The search methods largely depended on the number of important terms and, for the topic model method, the number of topics considered. Although we used the number of terms as a threshold cutoff, it might be useful to also explore the use of tf-idf values or other criteria as thresholds. In addition, we did not evaluate the performance of LDA with more than 50 topics. This is an important consideration for future work, especially as the number of PSE reports increases.

5. Conclusion

Patient safety event (PSE) reports can provide a wealth of information and insight for healthcare professionals seeking to understand unsafe conditions and prevalent safety hazards. However, given a PSE report, it is often challenging and time consuming for patient safety officers to find related PSE reports. This is in part because of the quantity of reports and the complexity involved in understanding both the unstructured free-text and structured data they contain. We present different search and ranking methods that utilize both the free-text and structured data elements. We explore each of these methods with a patient fall use case and highlight the advantages of and insights from these approaches.

Conflict of interest

The authors declare that there are no conflicts of interest.
Acknowledgment
We would like to acknowledge and thank the human factors and medical experts in our organization for their help and insights.
References
[1] P. Aspden, J.W. Corrigan, S.M. Erickson, Patient safety reporting systems and applications, in: Patient Safety: Achieving a New Standard of Care, National Academy Press, Washington, D.C., 2004, pp. 250–278.
[2] J. Rosenthal, M. Booth, Maximizing the Use of State Adverse Event Data to Improve Patient Safety, Portland, ME, 2005.
[3] J.R. Clarke, How a system for reporting medical errors can and cannot improve patient safety, Am. Surg. 72 (2006) 1088–1091 (discussion 1126–1148).
[4] A.A. Boxwala, M. Dierks, M. Keenan, et al., Organization and representation of patient safety data: current status and issues around generalizability and scalability, JAMIA 11 (2004) 468–478, http://dx.doi.org/10.1197/jamia.M1317.
[5] P.J. Pronovost, D.A. Thompson, C.G. Holzmueller, et al., Toward learning from patient safety reporting systems, J. Clin. Nurs. 21 (2006) 305–315.
[6] A.W. Wu, P.J. Pronovost, L. Morlock, ICU incident reporting systems, J. Crit. Care 17 (2002) 86–94.
[7] J. Baliński, C. Daniłowicz, Re-ranking method based on inter-document distances, Inf. Process. Manage. 41 (2005) 759–775.
[8] G. Salton, C. Buckley, Term-weighting approaches in automatic text retrieval, Inf. Process. Manage. 24 (1988) 513–523.
[9] J. Ramos, Using TF-IDF to determine word relevance in document queries, in: Proceedings of the First Instructional Conference on Machine Learning, 2003.
[10] D. Blei, A. Ng, M. Jordan, Latent Dirichlet allocation, J. Mach. Learn. Res. 3 (2003) 993–1022.
[11] F. Magrabi, M.-S. Ong, W. Runciman, et al., Using FDA reports to inform a classification for health information technology safety problems, J. Am. Med. Inf. Assoc. 19 (2012) 45–53, http://dx.doi.org/10.1136/amiajnl-2011-000369.
[12] M.-S. Ong, F. Magrabi, E. Coiera, Automated identification of extreme-risk events in clinical incident reports, J. Am. Med. Inf. Assoc. 19 (2012) e110–e118, http://dx.doi.org/10.1136/amiajnl-2011-000562.
[13] G. Melton, G. Hripcsak, Automated detection of adverse events using natural language processing of discharge summaries, J. Am. Med. Inf. Assoc. 12 (2005) 448–457.
[14] T. Botsis, M.D. Nguyen, E.J. Woo, et al., Text mining for the vaccine adverse event reporting system: medical text classification using informative feature selection, J. Am. Med. Inf. Assoc. 18 (2011) 631–638, http://dx.doi.org/10.1136/amiajnl-2010-000022.
[15] M. Roberts, B. Stewart, D. Tingley, et al., The structural topic model and applied social science, 2013.
[16] M.E. Roberts, B.M. Stewart, D. Tingley, et al., Structural topic models for open-ended survey responses, Am. J. Pol. Sci. (2014), http://dx.doi.org/10.1111/ajps.12103.
[17] D. Ramage, D. Hall, R. Nallapati, et al., Labeled LDA: a supervised topic model for credit attribution in multi-labeled corpora, in: Conference on Empirical Methods in Natural Language Processing, 2009, pp. 248–256, http://dx.doi.org/10.3115/1699510.1699543.
[18] J. Tang, Z. Meng, X. Nguyen, Q. Mei, M. Zhang, Understanding the limiting factors of topic modeling via posterior contraction analysis, in: International Conference on Machine Learning, 2014, pp. 190–198.
[19] J. Bischof, E. Airoldi, Summarizing topical content with word frequency and exclusivity, in: Proceedings of the 29th International Conference on Machine Learning, 2012, pp. 201–208.
[20] D. Mimno, H.M. Wallach, E. Talley, et al., Optimizing semantic coherence in topic models, in: Conference on Empirical Methods in Natural Language Processing, 2011, pp. 262–272.
[21] C.D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
[22] S. Fahl, M. Harbach, Y. Acar, et al., On the ecological validity of a password study, in: Proceedings of the Ninth Symposium on Usable Privacy and Security – SOUPS '13, 2013, p. 13, http://dx.doi.org/10.1145/2501604.2501617.