Survival prediction algorithms miss significant opportunities for improvement if used for case selection in trauma quality improvement programs


Injury, Int. J. Care Injured 47 (2016) 1960–1965

Contents lists available at ScienceDirect

Injury journal homepage: www.elsevier.com/locate/injury

Catherine Heim a,*, Elaine Cole b, Anita West c, Nigel Tai d, Karim Brohi e

a Department of Anaesthesiology CHUV, 1011 Lausanne, Switzerland
b Centre for Trauma Sciences, Blizard Institute, Queen Mary University of London, London, UK
c Royal London Hospital, Barts and the London NHS Trust, London, UK
d Royal London Hospital, Barts and the London NHS Trust, London, UK
e Centre for Trauma Sciences, Blizard Institute, Queen Mary University of London, London, UK

ARTICLE INFO

Article history: Accepted 28 May 2016

Keywords: Trauma; Quality improvement; Preventable death; Audit filter; Probability of survival; TRISS; Mortality audit

ABSTRACT

Background: Quality improvement (QI) programs have been shown to reduce preventable mortality in trauma care. Detailed review of all trauma deaths is a time- and resource-consuming process, and calculated probability of survival (Ps) has been proposed as an audit filter, with review limited to deaths in patients who were ‘expected to survive’. However, no Ps-based algorithm has been validated and no study has examined elements of preventability associated with deaths classified as ‘expected’. The objective of this study was to examine whether trauma performance review can be streamlined using existing mortality prediction tools without missing important areas for improvement.

Methods: We conducted a retrospective study of all trauma deaths reviewed by our trauma QI program. Deaths were classified as non-preventable, possibly preventable, probably preventable or preventable. Opportunities for improvement (OPIs) involve failure in the process of care and were classified into clinical and system deviations from standards of care. TRISS and PS were used to calculate probability of survival. Peer-review charts were reviewed by a single investigator.

Results: Over 8 years, 626 patients were included. One third showed elements of preventability and 4% were preventable. Preventability occurred across the entire range of calculated Ps. Limiting review to unexpected deaths would have missed over 50% of all preventability issues and a third of preventable deaths. 37% of patients showed opportunities for improvement (OPIs). Neither TRISS nor PS allowed reliable identification of OPIs, and limiting peer review to patients with unexpected deaths would have missed close to 60% of all issues in care.

Conclusions: TRISS and PS fail to identify a significant proportion of avoidable deaths and miss important opportunities for process and system improvement. Based on this, all trauma deaths should be subjected to expert panel review in order to maximise the output of performance improvement programs. © 2016 Elsevier Ltd. All rights reserved.

* Corresponding author. E-mail addresses: [email protected] (C. Heim), [email protected] (E. Cole), [email protected] (A. West), [email protected] (N. Tai), [email protected] (K. Brohi).
http://dx.doi.org/10.1016/j.injury.2016.05.042
0020-1383/© 2016 Elsevier Ltd. All rights reserved.

Introduction

Trauma care is a complex process involving multiple teams of providers delivering time-critical interventions. This operational environment is prone to medical errors and deviations from accepted practices, the effects of which may be severe. Some regions have reported preventable mortality rates as high as 1 in 3 of all injury-related deaths [1–3]. Organised trauma systems have been shown to reduce preventable mortality, in part by the inclusion of an integrated trauma quality improvement (QI) program [4–6]. Trauma QI is recognised as key to improving outcomes and is now integral to trauma system infrastructure around the world [5,7–11].

Mortality reviews aiming at identification of preventable deaths and variances in care are a central part of the QI process, exploring opportunities for clinical and system improvement [2,12]. The analysis of preventable deaths in particular has been critical for improving the delivery of trauma care and remains the only evidence-based item among the variety of performance indicators used worldwide [7]. Detailed review of all deaths is a time-consuming process, however, and can lead to provider apathy and disengagement.

Algorithms that predict probability of survival have been proposed as tools to streamline the QI process and are frequently used as an audit filter. Deaths are reviewed only if they occur in patients who were ‘expected to survive’ according to the prediction algorithm. ‘Expected deaths’ are not subjected to QI, as theoretically these deaths would not be ‘preventable’. The most common survival prediction tools in use are the Trauma Score and Injury Severity Score (TRISS) [13], used widely internationally, and the Probability of Survival score (PS), used primarily in the UK and parts of Europe [14,15]. Many trauma systems incorporate these tools into their QI processes, and studies have described their findings on reviewing the ‘unexpected deaths’ identified by these tools. However, neither TRISS nor PS has been evaluated for its ability to identify preventable elements in all trauma mortality, and specifically no study has examined elements of preventability associated with deaths classified as ‘expected’. The sensitivity and specificity of these tools for trauma QI are therefore unknown.

The overall objective of this study was to examine whether trauma performance review can be streamlined using existing mortality prediction tools without missing important areas for improvement. Our first aim was to examine the association between the predicted probability of survival and the assigned ‘preventability’ in injury-related deaths reviewed by our trauma QI program. Our second aim was to examine any difference in the occurrence of preventable issues in deaths classified as ‘expected’ and ‘unexpected’ using these algorithms. Thirdly, we wished to determine the sensitivity and specificity of TRISS and PS for detecting both all preventability issues and specifically preventable deaths within a trauma quality improvement program.
We conducted a retrospective study of all trauma deaths that were consecutively reviewed by our trauma QI program over an eight-year period.

Methods

Study setting

The Royal London Hospital (RLH) is one of four major trauma centres of the city of London, with a trauma intake of approximately 2800 patients per year, of which 25% present with an Injury Severity Score (ISS) above 15. Weekly mortality audits are a substantial part of the systematic quality improvement program. A multidisciplinary specialist team, comprising surgeons, anaesthetists, pathologists and emergency medicine providers, retrospectively reviews all trauma deaths occurring during the initial hospitalization in the acute care setting. The entire pathway of acute care is subjected to analysis. Autopsy data, obtained from the coroner’s office and commented on by a consultant pathologist, are integrated when available and where appropriate. In order to prevent prejudice, calculated probability of survival values (Ps) are not known to the peer-review committee and are therefore not integrated into the process of auditing.

Deaths were originally classified as non-preventable (NP; outcome would have been the same regardless of actions; no identified areas for improvement), possibly preventable (POSS; identified areas for performance improvement; more likely than not, outcome would have been the same regardless of errors made), probably preventable (PROB; identified areas for performance improvement; more likely than not, death would not have occurred had identified errors been avoided) or preventable (PREV; identified areas for improvement with high impact on outcome; death would not have occurred had the identified errors been avoided) [16]. All patients in any of the preventability categories (POSS, PROB, PREV) were summarized as ANY.

Opportunities for performance improvement (OPIs) involving failure in the process of care were identified according to a predefined list. Clinical and system deviations from standards of care were distinguished. Clinical issues comprise delays to diagnosis and treatment, missed injuries, procedural errors and inappropriate imaging or treatment. System issues comprise resource and transfer problems.

From 2009 on, this categorization was changed from assignment of preventability to opportunities for performance improvement with identification of their impact on mortality: QI score 0 (no identified areas for improvement), QI score 1 (identified OPIs without any impact on outcome), QI score 2 (OPIs with low impact on outcome) and QI score 3 (OPIs with high impact on outcome). For the purpose of this study we transposed the QI scores to preventability categories in order to allow comparisons with the performance improvement literature. We directly assigned QI scores to the previous preventability nomenclature (i.e. QI 0 as NP; QI 1 as POSS; QI 2 as PROB; QI 3 as PREV; and QI 1–3 as ANY).

Subjects

We conducted a retrospective analysis of all deaths occurring over an eight-year period, from 1.1.2006 to 31.12.2013. All trauma deaths which had been subjected to mortality QI review and had the required data to calculate probability of survival according to TRISS and PS methodology were included. Patients dying shortly after ED arrival as well as secondarily transferred-in patients were also included.

Data collection

Demographic data extracted from the hospital-based trauma registry include age, gender, mechanism of injury, injury severity score (ISS) and discharge destination. Anatomic injury was classified according to the Abbreviated Injury Scale (AIS) by an AAAM-certified coder. From 2006–2011 the version AIS 1998 was in use and from February 2012 the AIS version 2005/update 2008.
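The transposition of QI scores to preventability categories described under Study setting amounts to a simple lookup. The sketch below is only an illustration of that mapping; the names `QI_TO_PREVENTABILITY`, `preventability_category` and `has_any_preventability` are hypothetical and do not come from the study.

```python
# Illustrative sketch; all identifiers are hypothetical, only the mapping
# itself (QI 0 -> NP, 1 -> POSS, 2 -> PROB, 3 -> PREV) follows the Methods.
QI_TO_PREVENTABILITY = {
    0: "NP",    # no identified areas for improvement
    1: "POSS",  # OPIs without any impact on outcome
    2: "PROB",  # OPIs with low impact on outcome
    3: "PREV",  # OPIs with high impact on outcome
}


def preventability_category(qi_score: int) -> str:
    """Return the preventability category for a QI score of 0 to 3."""
    if qi_score not in QI_TO_PREVENTABILITY:
        raise ValueError(f"QI score must be 0, 1, 2 or 3, got {qi_score!r}")
    return QI_TO_PREVENTABILITY[qi_score]


def has_any_preventability(qi_score: int) -> bool:
    """True when the death belongs to the ANY group (QI score 1 to 3)."""
    return preventability_category(qi_score) != "NP"
```

Rejecting scores outside 0 to 3 rather than silently defaulting is a design choice of this sketch, so that incomplete review records cannot be counted as non-preventable by accident.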
Calculation of probability of survival according to TRISS and PS was completed using the electronic calculators on www.trauma.org [17] and www.tarn.ac.uk [18]. Peer-review charts of the eight-year period were reviewed by a single investigator for data extraction.

Data analysis

Statistical analysis was performed using JMP® 10.0.0 (2012, SAS Institute Inc.). Percentages were analysed using chi-squared or Fisher’s exact tests and medians were compared using the Wilcoxon/Kruskal-Wallis test. Medians are expressed with interquartile range (IQR; 25% and 75%). A p-value of <0.05 was considered statistically significant. Receiver operating characteristic (ROC) curves were used to analyse the ability of TRISS and PS to identify patients with any elements of preventability and those with clearly preventable deaths. Areas under the curves (AUC) were analysed to compare TRISS and PS with each other.

Results

Over the study period 14,237 trauma calls were admitted to the Royal London Hospital Major Trauma Centre. 798 patients died (5.6%) and were included in the mortality review process of the trauma QI program. 172 were excluded from further analysis for this study: 162 due to incomplete records, 7 awaiting review and consensus, and 3 deaths that were not trauma-related (Fig. 1). Missing data leading to exclusion were mainly parameters needed for TRISS and PS calculation, such as respiratory rate and age. 626 patients, 78.4% of all trauma deaths, were included in this study. The median age was 47 (IQR 27–70) with 74.3% being male. Median injury severity score (ISS) was 29 (IQR 25–38) and 86.9% of deaths were due to blunt trauma.

Fig. 1. Flow chart of patient inclusion.

The QI process determined that 190 patients had some elements of preventability in their process of care (ANY group, 30.4%) and in 25 patients death was deemed to have been preventable (PREV, 4.0%). Non-preventable deaths had a median predicted survival of 37.4% for TRISS and 52.2% for PS. TRISS-values were 17% higher for patients with ANY elements of preventability than for non-preventable deaths (45.3% ANY versus 27.8% NP, p < 0.0001), and PS was 26.1% higher (48.4% versus 22.3%, p < 0.0001) (Fig. 2A & B). Neither TRISS nor PS could differentiate between deaths rated as POSS, PROB or PREV.

Fig. 2. (A) Box and whisker graph shows median and interquartile range for TRISS in categories of preventability. NP: 37.4 (8.6–70), POSS: 56.3 (24.2–88.7), PROB: 84.9 (59.4–94.5), PREV: 88.6 (50.1–96.1). (B) Box and whisker graph shows median and interquartile range for PS in categories of preventability. NP: 52.2 (29.8–72.5), POSS: 65.7 (40.2–86.9), PROB: 81.9 (65.5–90.8), PREV: 88.7 (69.8–94.6).

Preventability issues occurred across the entire range of calculated probability of survival. PS appeared to display a more linear trend in the relationship between predicted survival and the occurrence of preventability issues, and no preventable death occurred in patients with a PS-value at 25% or below (Fig. 3A). When differentiating expected from unexpected deaths with a chosen threshold of 75% of calculated probability of survival, there were more deaths with elements of preventability (ANY) amongst the expected deaths than amongst the unexpected for both the TRISS and PS-score (54.7% and 51.6% respectively). Probably (PROB) and clearly preventable (PREV) deaths were more frequent in unexpected deaths (Fig. 3B). At a 50% threshold, preventability issues occurred more frequently in unexpected deaths for both TRISS and PS-score (61.1% and 75.8% respectively) (Fig. 3C).

Limiting review to unexpected deaths using TRISS or PS as audit filter at a threshold of 75% would have led to a review workload reduction of 70% (Table 1). However, such a strategy would have missed 54.7% (TRISS) and 51.6% (PS) of all preventability issues and 36% (TRISS) or 28% (PS) of preventable deaths. Data for using a threshold of 50% predicted survival as audit filter are shown in Table 1. The overall sensitivity of TRISS in identifying any preventability was 45% at a threshold of 75%, increasing to 61% at a 50% threshold (Table 1).
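The threshold figures quoted above can be re-derived from the counts the paper reports (626 included deaths, 190 in the ANY group, and the reviewed and missed counts in Table 1). The following is a minimal sketch under one assumption that the paper does not spell out: that a death counts as a true positive when it has ANY preventability and its Ps lies above the threshold, and as a true negative when it is non-preventable and falls below the threshold. The function and class names are hypothetical.

```python
from dataclasses import dataclass

TOTAL_DEATHS = 626        # deaths included in the study
ANY_PREVENTABILITY = 190  # deaths with any element of preventability


@dataclass(frozen=True)
class FilterMetrics:
    workload_reduction: float     # % of deaths that would not be reviewed
    missed_preventability: float  # % of ANY deaths left unreviewed
    sensitivity: float
    specificity: float
    npv: float


def audit_filter_metrics(reviewed: int, missed_any: int,
                         total: int = TOTAL_DEATHS,
                         any_prev: int = ANY_PREVENTABILITY) -> FilterMetrics:
    """Derive audit-filter performance (in %) from Table 1 style counts.

    reviewed   -- deaths above the Ps threshold ('unexpected', reviewed)
    missed_any -- ANY-preventability deaths below the threshold (not reviewed)
    """
    not_reviewed = total - reviewed
    non_preventable = total - any_prev
    true_pos = any_prev - missed_any        # preventability that gets reviewed
    false_pos = reviewed - true_pos         # non-preventable but reviewed
    true_neg = non_preventable - false_pos  # non-preventable, not reviewed
    return FilterMetrics(
        workload_reduction=100 * not_reviewed / total,
        missed_preventability=100 * missed_any / any_prev,
        sensitivity=100 * true_pos / any_prev,
        specificity=100 * true_neg / non_preventable,
        npv=100 * true_neg / not_reviewed,
    )


# Counts as printed in Table 1: (label, reviewed n, missed preventability n)
for label, reviewed, missed in [("TRISS >75%", 188, 104), ("PS >75%", 189, 98),
                                ("TRISS >50%", 293, 74), ("PS >50%", 378, 46)]:
    m = audit_filter_metrics(reviewed, missed)
    print(f"{label}: sens {m.sensitivity:.0f}%, spec {m.specificity:.0f}%, "
          f"NPV {m.npv:.0f}%, not reviewed {m.workload_reduction:.0f}%")
```

Rounded to whole percentages, these values agree with the SENS, SPEC and NPV columns of Table 1, which suggests the assumed definition of positives and negatives matches the one used in the study.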
Receiver operating characteristic analysis showed a low capacity of TRISS to detect any preventability, with an area under the curve (AUC) of 0.66, and an AUC of 0.74 for the detection of frankly preventable deaths (Fig. 4A). PS had the same performance as TRISS in detecting patients with any preventability (AUC of 0.66) and was somewhat better than TRISS at identifying clearly preventable deaths, with an AUC of 0.80. This, however, must be interpreted with caution due to the low numbers.

We also examined the number and type of identified opportunities for improvement. For this purpose, a further 19 patients (3.0% of the overall cohort) had to be excluded due to lack of determination of OPIs. Of the remaining 607 patients, 227 (37.4%) had identified issues during their clinical pathway in acute care. In 190 (83.7%) cases the issues were clinical, whereas 75 cases (33%) showed system issues. OPIs occurred more often in patients presenting elements of preventability (72.3% versus 27.8%, p < 0.001). No difference in median Ps-values was found for patients with care issues (63.3% TRISS; 69.3% PS, p = 0.29), either for clinical (63.9% TRISS, 70.5% PS, p = 0.60) or for system issues (67.7% TRISS, 70.7% PS, p = 0.67). At a 75% threshold, OPIs were identified more often in patients with unexpected death (49.5% versus 31.8% for TRISS and 51.9% versus 30.7% for PS, p < 0.001). However, limiting peer review to patients with unexpected deaths would have missed 59.5% (TRISS) and 57.7% (PS) of all issues in care. Even by lowering the review threshold to 50%, up to 40% of OPIs would have been missed. Sensitivity for identification of OPIs was low for both scores (Table 2).

Fig. 3. (A) Preventability issues per calculated probability of survival with TRISS and PS. (B) Preventability issues occurring in expected and unexpected deaths as defined by Ps-threshold of 75% with TRISS and PS. (C) Preventability issues occurring in expected and unexpected deaths as defined by Ps-threshold of 50% with TRISS and PS.

Table 1. Effect of review thresholds.

REVIEW THRESHOLD | n | DEATHS THAT WOULD NOT BE REVIEWED n (%) | MISSED PREVENTABILITY n (%) | SENS % | SPEC % | NPV %
TRISS >75% | 188 | 438 (70) | 104 (55) | 45 | 77 | 76
PS >75% | 189 | 437 (70) | 98 (52) | 48 | 78 | 78
TRISS >50% | 293 | 333 (53) | 74 (39) | 61 | 59 | 78
PS >50% | 378 | 248 (40) | 46 (24) | 76 | 46 | 81

Fig. 4. (A) ROC curve shows the predictive capacity of TRISS for preventable deaths (AUC 0.74, CI: 0.64–0.83) and any preventability (AUC 0.66, CI: 0.62–0.71). (B) ROC curve shows the predictive capacity of PS for preventable deaths (AUC 0.80, CI: 0.72–0.88) and any preventability (AUC 0.66, CI: 0.62–0.72).

Discussion

In this study we have shown that the most widely used audit filters in trauma QI programs fail to identify preventability and opportunities for improvement. Preventability issues occurred across the entire range of calculated probability of survival, and more elements of preventability and variances in care were assigned to deaths with higher calculated probabilities of survival. At a Ps-threshold of 75%, expected deaths revealed more elements of preventability than unexpected deaths, and both TRISS and PS showed a low sensitivity in identifying preventability and OPIs at the 50% and 75% cut-offs. Both opportunities for improvement and preventable deaths occurred even in patients with a TRISS-value below 25%. Only PS at a 25% threshold allowed for exclusion of frankly preventable deaths, but using this as audit filter would have missed 17% of elements of preventability.

Over the last 20 years, several authors have taken up the search for a reliable audit filter for identification of preventable deaths. Two smaller studies analysing TRISS thresholds of 50% and 75% reported similar findings, showing that both cut-offs failed to identify a significant proportion of avoidable deaths [19,20]. A study on preventability in road-traffic deaths found poor agreement between the judgement of a panel of clinicians and probability of survival as calculated by TRISS. Agreement was poorest for preventable deaths and several patients with TRISS 25% revealed elements of preventability [21]. Besides TRISS, various methods have been compared to panel review, such as different ISS cut-offs [12], Trauma-score [12], ASCOT [22] and International Classification of Disease-based Injury Severity Score (ICISS) derived calculated probability of survival [23], but none of them has proven clearly superior to expert panel review.
The consequences of using TRISS and PS for QI case review require consideration, as both demonstrated a low sensitivity for the identification of patients with preventability and variances in care. Our findings underline the need for continuous review of all trauma deaths independently of their calculated probability of survival. Additionally, we have shown that a substantial number of opportunities for improvement can occur even when death is inevitable, and these need to be tackled in a mature performance improvement program. Assessment of hospital performance and ranking systems based on preventable death rates may need to be reviewed unless all trauma deaths have been analysed without Ps-based pre-selection.

Limitations

There are some limitations to this study. During the eight-year study period, several structural changes occurred in the study centre. Most notable was the move away from assigning ‘preventability’ to identifying and grading OPIs and their potential impact on outcome. While we mapped the OPI scores directly onto each prior preventability category, there is overlap between the groups. In particular, QI score 1 issues could potentially have been assigned to the ‘NONE’ group as well as to the ‘PROB’ group. This cannot be known retrospectively outside of the contemporary peer review process. This may have reduced the ‘ANY’ preventability figures, although it would only have done so for mortality. It would not have affected the ‘PREV’ category. For the purposes of this study examining the role of risk scores in selecting patients for peer review, we believe this to have been the most appropriate mapping.

Reviewing a patient’s care pathway and assigning preventability is a subjective process. Despite the system of multidisciplinary panel review, inter-rater variability between panels does occur [24,25]. Although the composition of the peer-review panel in our study was constant, participating individuals changed over the years. However, this does represent the real-world experience of any trauma centre engaging in a long-term performance improvement programme.

Finally, it is not known whether the identified OPIs in each category were modified (or modifiable), and the identification of OPIs does not necessarily reflect the ability to achieve performance improvement and thereby improve outcomes.

Table 2. Identification of opportunities for improvement.

REVIEW THRESHOLD | CLINICAL ISSUES n (%) | SENS % | SPEC % | NPV % | SYSTEM ISSUES n (%) | SENS % | SPEC % | NPV %
TRISS >75% | 109 (57) | 43 | 76 | 45 | 41 (55) | 45 | 72 | 90
PS >75% | 105 (55) | 45 | 76 | 75 | 44 (59) | 41 | 71 | 90
TRISS >50% | 70 (37) | 63 | 60 | 78 | 30 (40) | 60 | 55 | 81
PS >50% | 57 (30) | 70 | 44 | 76 | 17 (23) | 77 | 42 | 93
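As a worked check on Table 2, the sensitivities can be recomputed from the issue counts. This sketch assumes, as the percentages suggest but the table header does not state, that the n (%) columns give the number of clinical or system issues that would be missed below each threshold, out of the 190 patients with clinical issues and 75 with system issues reported in the Results. The helper name `sensitivity_from_missed` is hypothetical.

```python
CLINICAL_TOTAL = 190  # patients with clinical issues (Results)
SYSTEM_TOTAL = 75     # patients with system issues (Results)

# Counts as printed in Table 2, read here as MISSED issues (an assumption):
# threshold -> (clinical n, system n)
MISSED_ISSUES = {
    "TRISS >75%": (109, 41),
    "PS >75%": (105, 44),
    "TRISS >50%": (70, 30),
    "PS >50%": (57, 17),
}


def sensitivity_from_missed(missed: int, total: int) -> float:
    """Sensitivity in %: share of issues that would still reach peer review."""
    if not 0 <= missed <= total:
        raise ValueError("missed must lie between 0 and total")
    return 100 * (total - missed) / total


for threshold, (clinical, system) in MISSED_ISSUES.items():
    print(f"{threshold}: clinical sens "
          f"{sensitivity_from_missed(clinical, CLINICAL_TOTAL):.0f}%, "
          f"system sens {sensitivity_from_missed(system, SYSTEM_TOTAL):.0f}%")
```

Under this reading the rounded results agree with the SENS columns of Table 2, which supports interpreting the n (%) columns as missed issues.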

Conclusions

Audit filters based on probability of survival calculations will result in a substantial reduction of workload for the peer-review committees. However, the mortality prediction tools TRISS and PS fail to identify a significant proportion of avoidable deaths and may miss important opportunities for process and system improvement. For maximal output of performance improvement programs, ideally risk stratification tools should not be used and all trauma deaths should be subjected to an expert panel review.

Conflict of interest

None.

References

[1] Kreis Jr. DJ, Plasencia G, Augenstein D, Davis JH, Echenique M, Vopal J, et al. Preventable trauma deaths: Dade County, Florida. J Trauma 1986;26(7):649–54.
[2] Anderson ID, Woodford M, de Dombal FT, Irving M. Retrospective study of 1000 deaths from injury in England and Wales. Br Med J 1988;296(6632):1305–8.
[3] Stocchetti N, Pagliarini G, Gennari M, Baldi G, Banchini E, Campari M, et al. Trauma care in Italy: evidence of in-hospital preventable deaths. J Trauma 1994;36(3):401–5.
[4] Cales RH. Trauma mortality in Orange County: the effect of implementation of a regional trauma system. Ann Emerg Med 1984;13(1):1–10.
[5] Trauma care systems quality improvement guidelines. American College of Emergency Physician. Ann Emerg Med 1992;21(6):736–9.
[6] Lansink KW, Leenen LP. Do designated trauma systems improve outcome? Curr Opin Crit Care 2007;13(6):686–90.
[7] Stelfox HT, Straus SE, Nathens A, Bobranska-Artiuch B. Evidence for quality indicators to evaluate adult trauma care: a systematic review. Crit Care Med 2011;39(4):846–59.
[8] Mock C. WHO releases guidelines for trauma quality improvement programmes. Inj Prev 2009;15(5):359.
[9] Nathens AB, Cryer HG, Fildes J. The American College of Surgeons Trauma Quality Improvement Program. Surg Clin North Am 2012;92(2):441–54 x–xi.

[10] Committee on Quality of Health Care in America, Institute of Medicine. Crossing the quality chasm. A New Health System for the 21st Century. Washington, DC: National Academy Press; 2001.
[11] Teixeira PG, Inaba K, Hadjizacharia P, Brown C, Salim A, Rhee P, et al. Preventable or potentially preventable mortality at a mature trauma center. J Trauma 2007;63(6):1338–46 discussion 46–7.
[12] Deane SA, Gaudry PL, Woods P, Cass D, Hollands MJ, Cook RJ, et al. The management of injuries: a review of deaths in hospital. Aust N Z J Surg 1988;58(6):463–9.
[13] Boyd CR, Tolson MA, Copes WS. Evaluating trauma care: the TRISS method. Trauma Score and the Injury Severity Score. J Trauma 1987;27(4):370–8.
[14] Bouamra O, Wrotchford A, Hollis S, Vail A, Woodford M, Lecky F. A new approach to outcome prediction in trauma: a comparison with the TRISS model. J Trauma 2006;61(3):701–10.
[15] de Jongh MA, Verhofstad MH, Leenen LP. Accuracy of different survival prediction models in a trauma population. Br J Surg 2010;97(12):1805–13.
[16] MacKenzie EJ. Review of evidence regarding trauma system effectiveness resulting from panel studies. J Trauma 1999;47(Suppl. (3)):34–41.
[17] K. Brohi, TRISS: Trauma-Injury Severity Score 2007 [cited 2014 10 Dec]. Available from: http://www.trauma.org/js/trisscalc.html.
[18] TARN, Ps12 Calculator [cited 2014 10 Dec]. Available from: https://www.tarn.ac.uk/training/Content.aspx?c=38.
[19] Kelly AM, Nicholl J, Turner J. Determining the most effective level of TRISS-derived probability of survival for use as an audit filter. Emerg Med 2002;14(2):146–52.
[20] Shanti CM, Tyburski JG, Rishell KB, Wilson RF, Lozen Y, Seibert C, et al. Correlation of revised trauma score and injury severity score (TRISS) predicted probability of survival with peer-reviewed determination of trauma deaths. Am Surg 2003;69(3):257–60 discussion 60.
[21] McDermott FT, Cordner SM, Tremayne AB. Reproducibility of preventable death judgments and problem identification in 60 consecutive road trauma fatalities in Victoria, Australia. Consultative Committee on Road Traffic Fatalities in Victoria. J Trauma 1997;43(5):831–9.
[22] Sugrue M, Seger M, Sloane D, Compton J, Hillman K, Deane S. Trauma outcomes: a death analysis study. Irish J Med Sci 1996;165(2):99–104.
[23] Kim Y, Jung KY. Utility of the international classification of diseases injury severity score: detecting preventable deaths and comparing the performance of emergency medical centers. J Trauma 2003;54(4):775–80.
[24] MacKenzie EJ, Steinwachs DM, Bone LR, Floccare DJ, Ramzy AI. Inter-rater reliability of preventable death judgments. The Preventable Death Study Group. J Trauma 1992;33(2):292–302 discussion 3.
[25] Goldman RL. The reliability of peer assessments: a meta-analysis. Eval Health Professions 1994;17(1):3–21.