Observational studies using propensity score analysis underestimated the effect sizes in critical care medicine


Journal of Clinical Epidemiology - (2014) -

ORIGINAL ARTICLE

Zhongheng Zhang*, Hongying Ni, Xiao Xu
Department of Critical Care Medicine, Jinhua Municipal Central Hospital, Jinhua Hospital of Zhejiang University, No. 351, Mingyue Street, Jinhua, Zhejiang, 321000, P.R. China
Accepted 21 February 2014; Published online xxxx

Abstract
Background and Objective: Propensity score (PS) analysis has been increasingly used in critical care medicine; however, its validity has not been systematically investigated. The present study aimed to compare effect sizes in PS-based observational studies vs. randomized controlled trials (RCTs) (or meta-analysis of RCTs).
Methods: Critical care observational studies using PS were systematically searched in PubMed from inception to April 2013. Identified PS-based studies were matched to one or more RCTs in terms of population, intervention, comparison, and outcome. The effect sizes of experimental treatments were compared for PS-based studies vs. RCTs (or meta-analysis of RCTs) with the sign test. Furthermore, the ratio of odds ratio (ROR) was calculated from the interaction term of treatment × study type in a logistic regression model. An ROR < 1 indicates greater benefit for experimental treatment in RCTs compared with PS-based studies. The RORs of each comparison were pooled by using a meta-analytic approach with a random-effects model.
Results: A total of 20 PS-based studies were identified and matched to RCTs. Twelve of the 20 comparisons showed greater beneficial effect for experimental treatment in RCTs than in PS-based studies (sign test P = 0.503). The difference was statistically significant in four comparisons. The ROR could be calculated for 13 comparisons, of which four showed significantly greater beneficial effect for experimental treatment in RCTs. The pooled ROR was 0.71 (95% CI: 0.63, 0.79; P = 0.002), suggesting that RCTs (or meta-analysis of RCTs) were more likely to report beneficial effect for the experimental treatment than PS-based studies. The result remained unchanged in sensitivity analysis and meta-regression.
Conclusion: In the critical care literature, PS-based observational studies are likely to report less beneficial effect of experimental treatment compared with RCTs (or meta-analysis of RCTs).
© 2014 Elsevier Inc. All rights reserved.
Keywords: Propensity score; Randomized controlled trial; Critical care; Effect size; Ratio of odds ratio; Observational study

1. Introduction
A well-designed and properly conducted randomized controlled trial (RCT) is one of the most important sources of evidence for clinical decision-making. Randomization balances both measured and unmeasured variables between treated and untreated subjects. An RCT can therefore establish a causal association between intervention and outcome, which is key for clinicians seeking to understand the mechanisms underlying a pathologic condition. However, such experimental studies are often not feasible because of economic and ethical constraints [1]. Thus, clinical evidence is often shaped by observational studies, in which

Conflict of interest: None.
* Corresponding author. Tel./fax: +86-579-82552629. E-mail address: [email protected] (Z. Zhang).
0895-4356/$ - see front matter © 2014 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.jclinepi.2014.02.018

however, the treatment effect is often confounded by many measured and unmeasured factors. Many techniques have been developed to control these confounding factors, including stratification, matching, and multivariable regression analysis [2]. Propensity score (PS) analysis was developed in the 1980s and has been increasingly used in the biomedical field [3–6]. The PS is defined as the conditional probability of receiving a treatment or exposure given a series of predefined covariates [7]. With conventional matching or stratification, only a few covariates can be taken into account, whereas the PS technique can incorporate all measured confounding factors and assigns each subject a score based on the probability that the subject will receive treatment. The PS can be used for adjustment, matching, weighting, and stratification [8]. Critical care studies are especially subject to bias because a long list of baseline characteristics cannot be easily balanced, and there is a large number of interventions


Z. Zhang et al. / Journal of Clinical Epidemiology

What is new?
The present study demonstrates that PS-based observational studies are likely to report less beneficial effect of experimental treatment compared with RCTs in the area of critical care medicine.

other than the experimental treatment being conducted in the intensive care unit (ICU). It is therefore of crucial importance to control confounding factors in observational studies, particularly when administrative data are used for analysis [9]. Accordingly, PS has found its way into the field of critical care medicine, and the number of publications involving PS has increased exponentially in recent years [10]. However, the validity of PS has long been debated, and it is unknown whether results obtained by using PS are comparable with those obtained from RCTs. The present study therefore aimed to compare the treatment effect of experimental interventions in PS-based observational studies vs. RCTs (or meta-analysis of RCTs) in critical care medicine.
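As an illustration of the PS definition given in the Introduction (the conditional probability of treatment given covariates), the following minimal Python sketch estimates the PS for a single binary covariate and uses it for weighting. The cohort counts, the covariate name `severe`, and the helper functions are hypothetical and are not taken from any study analyzed here; with one binary covariate the PS is simply the treated fraction at each covariate level, whereas applied studies typically fit a regression model over many covariates.

```python
from collections import defaultdict

# Hypothetical toy cohort of (severe, treated) pairs; not data from the study.
cohort = (
    [(1, 1)] * 30 + [(1, 0)] * 10    # severe: 30 of 40 treated
    + [(0, 1)] * 10 + [(0, 0)] * 50  # non-severe: 10 of 60 treated
)


def estimate_propensity(records):
    """Estimate P(treated = 1 | severe) as the treated fraction at each covariate level."""
    n_total = defaultdict(int)
    n_treated = defaultdict(int)
    for severe, treated in records:
        n_total[severe] += 1
        n_treated[severe] += treated
    return {level: n_treated[level] / n_total[level] for level in n_total}


def weighted_severe_fraction(records, propensity, arm):
    """Share of severe patients in one arm after inverse-probability-of-treatment weighting."""
    weighted_severe = 0.0
    weighted_all = 0.0
    for severe, treated in records:
        if treated != arm:
            continue
        ps = propensity[severe]
        # Treated subjects are weighted by 1/PS, controls by 1/(1 - PS).
        weight = 1.0 / ps if treated else 1.0 / (1.0 - ps)
        weighted_all += weight
        weighted_severe += weight * severe
    return weighted_severe / weighted_all


propensity = estimate_propensity(cohort)
raw_treated = sum(s for s, t in cohort if t) / sum(t for _, t in cohort)
print(f"PS: severe={propensity[1]:.2f}, non-severe={propensity[0]:.2f}")
print(f"severe share among treated, unweighted: {raw_treated:.2f}")
print(f"severe share after weighting: "
      f"treated={weighted_severe_fraction(cohort, propensity, 1):.2f}, "
      f"control={weighted_severe_fraction(cohort, propensity, 0):.2f}")
```

In this toy cohort the treated arm is enriched for severe patients before weighting, and both arms have the same severe share afterwards. Balance is achieved only on the measured covariate, which is the limitation debated in the text.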

2. Methods
2.1. Study selection
Observational studies using PS in the field of critical care medicine were identified by searching PubMed from inception to April 2013. There was no language restriction. The search strategy consisted of terms related to critical care, PS, and mortality: (((((critically ill[Title/Abstract]) OR critical care[Title/Abstract]) OR intensive care[Title/Abstract]) OR ICU[Title/Abstract]) AND propensity score[Title/Abstract]) AND mortality[Title/Abstract]. Studies were potentially eligible if they (1) were related to critical care medicine; (2) used PS as a technique to adjust for pretreatment variables; (3) involved human subjects; and (4) reported mortality as an end point. Exclusion criteria were (1) studies that investigated risk factors for mortality (not treatment effect); (2) studies in which the intervention was not in the field of critical care medicine (for instance, studies in cardiothoracic surgery were excluded); (3) PS studies that could not be matched to an RCT (details of matching are described below); and (4) studies in which the reported effect size could not be matched to that in the corresponding RCTs (for instance, the PS-based study reported a hazard ratio [HR] but the RCT reported an odds ratio [OR]). Each observational study using PS analysis was matched to one or more RCTs. Although the matching process was inherently subjective, every effort was made to match a PS-based study with RCTs in terms of population, intervention, comparison, and outcome (PICO) [11]. If more than one RCT was identified, the effect sizes were combined by using a meta-analytic approach with a random-effects model [12].
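The search string quoted in Section 2.1 can be assembled programmatically, which makes the term list easier to audit. The sketch below is only an illustration: the article does not say that the search was run through an API, so the NCBI E-utilities ESearch URL, the `retmax` value, and the helper names are assumptions. The date limit (inception to April 2013) is not encoded, and the query is a flattened but logically equivalent form of the nested expression in the text.

```python
from urllib.parse import urlencode

FIELD_TAG = "[Title/Abstract]"
SETTING_TERMS = ("critically ill", "critical care", "intensive care", "ICU")


def tagged(term):
    """Attach the PubMed field tag used in the article's search strategy."""
    return f"{term}{FIELD_TAG}"


def build_query():
    """Critical care setting terms joined by OR, then AND-ed with PS and mortality."""
    setting = " OR ".join(tagged(term) for term in SETTING_TERMS)
    return f"({setting}) AND {tagged('propensity score')} AND {tagged('mortality')}"


query = build_query()

# Assumption: ESearch is used only to show how the query could be submitted;
# no network request is made here.
esearch_url = (
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
    + urlencode({"db": "pubmed", "term": query, "retmax": 200})
)

print(query)
print(esearch_url)
```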


Data on the following aspects were abstracted from the PS-based observational studies: name of the first author, year of publication, sample size, the number of RCTs being matched, study design (eg, prospective or retrospective), techniques for using PS (eg, matching, weighting, adjustment, and stratification), the number of covariates used to obtain PS, type of the effect size, and topic area. If data were not explicitly reported, we tried to contact the contributing author for detailed information.

2.2. Statistical analysis
The reported effect sizes included OR, relative risk (RR), and HR. If a study did not report OR and it used the PS-matching technique, OR was calculated in the matched cohort. The effect sizes were compared between the PS study and the RCT (or meta-analysis of RCTs) by using the binomial (sign) test to see whether one type of study design was more likely to report beneficial effect than the other. For studies that reported the number of survivors and nonsurvivors, we established a logistic regression model to calculate the relative effect size (ratio of OR, ROR) and the associated 95% confidence interval (CI) [13]. The model was based on the equation:

\[ \operatorname{logit}(p) = \beta_0 + \beta_1 I_t + \beta_2 I_{tp} + \beta_3 I_p \]

where $p$ is the probability that an event is observed; $I_t$, $I_{tp}$, and $I_p$ are variables denoting the effect of treatment ($I_t = 1$ in treated subjects, 0 otherwise), the treatment–PS interaction ($I_{tp} = 1$ in treated subjects in studies using PS, 0 otherwise), and the effect of PS design ($I_p = 1$ in studies using PS, 0 otherwise); and the $\beta$s are the parameters of the logistic regression model. The ROR can be obtained from the estimated $\beta_2$. An ROR was obtained for each pair of matched PS study and RCTs. An ROR < 1 indicates a greater benefit for experimental treatment in RCTs (or meta-analysis of RCTs) compared with the PS-based study; conversely, an ROR > 1 suggests a greater benefit for experimental treatment in the PS-based study.
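Because the logistic model above contains only two binary factors and their interaction, it is saturated when fitted to one PS study and one RCT, so the estimate of the interaction coefficient equals the difference between the two log odds ratios. The following Python sketch computes that quantity and a Wald 95% CI directly from two 2×2 tables rather than by fitting the regression. The counts and function names are hypothetical. Under the indicator coding stated in the text, the exponentiated coefficient compares the PS study's OR with the RCT's OR; which direction corresponds to "greater benefit" depends on whether the modeled event is death or survival, which is treated here as an open assumption.

```python
import math


def log_odds_ratio(events_trt, nonevents_trt, events_ctl, nonevents_ctl):
    """Log odds ratio (treated vs. control) and its large-sample (Woolf) variance."""
    log_or = math.log((events_trt * nonevents_ctl) / (nonevents_trt * events_ctl))
    variance = 1 / events_trt + 1 / nonevents_trt + 1 / events_ctl + 1 / nonevents_ctl
    return log_or, variance


def ratio_of_odds_ratios(ps_table, rct_table, z=1.96):
    """Return exp(beta_2) = OR_PS / OR_RCT with a Wald confidence interval."""
    log_or_ps, var_ps = log_odds_ratio(*ps_table)
    log_or_rct, var_rct = log_odds_ratio(*rct_table)
    beta2 = log_or_ps - log_or_rct      # interaction coefficient of the saturated model
    se = math.sqrt(var_ps + var_rct)    # the two studies are independent samples
    return math.exp(beta2), math.exp(beta2 - z * se), math.exp(beta2 + z * se)


# Hypothetical counts: (events treated, non-events treated, events control, non-events control)
ps_table = (40, 160, 50, 150)
rct_table = (30, 170, 50, 150)

ror, lower, upper = ratio_of_odds_ratios(ps_table, rct_table)
print(f"ROR = {ror:.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```

Zero cells would make the logarithm and the variance undefined; a continuity correction would be needed in that case and is not shown.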
Finally, RORs of matched pairs were combined by using a meta-analytic approach with a random-effects model. We predefined the RCT with the largest sample size as the "gold standard" for the real treatment effect, and a sensitivity analysis was performed by restricting to RCTs with the largest sample size. All statistical analyses were performed using the software StataSE 11.2 (StataCorp, College Station, TX, USA). A two-tailed P < 0.05 was considered statistically significant.
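The random-effects pooling of RORs can be sketched as follows. This is a minimal Python illustration that assumes the DerSimonian–Laird moment estimator of the between-study variance, in the spirit of reference [12]; the article does not state which estimator the Stata routine used, and the three RORs with their confidence limits are hypothetical. Standard errors are recovered from the 95% CIs on the log scale.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error of a log ratio recovered from its confidence limits."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def dersimonian_laird(log_effects, variances, z=1.96):
    """Pool log effect sizes with DerSimonian-Laird random-effects weights."""
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, log_effects)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_effects))
    df = len(log_effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, log_effects)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))
    return {"pooled": pooled, "se": se, "tau2": tau2,
            "lower": pooled - z * se, "upper": pooled + z * se}


# Hypothetical RORs with 95% confidence limits: (ROR, lower, upper)
comparisons = [(0.60, 0.40, 0.90), (0.85, 0.60, 1.20), (1.10, 0.70, 1.73)]
log_rors = [math.log(ror) for ror, _, _ in comparisons]
variances = [se_from_ci(lo, hi) ** 2 for _, lo, hi in comparisons]

result = dersimonian_laird(log_rors, variances)
print(f"pooled ROR = {math.exp(result['pooled']):.2f} "
      f"(95% CI: {math.exp(result['lower']):.2f}, {math.exp(result['upper']):.2f}); "
      f"tau^2 = {result['tau2']:.3f}")
```

When the estimated between-study variance is zero, the weights reduce to the fixed-effect inverse-variance weights.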

3. Results
3.1. Study selection and characteristics
Fig. 1 shows the flow chart of study selection. Our initial search identified 161 potential studies. Among them, 109 studies were excluded because they were not critical care studies, investigated risk factors, or did not involve human subjects. The remaining 52 studies using PS were matched


Fig. 1. Flow chart of study selection. PS, propensity score; RCT, randomized controlled trial.

to potential RCTs. A total of 32 studies were excluded because they could not be matched to an RCT. Finally, a total of 20 studies [14–33] were matched to at least one RCT, and they were included in subsequent analysis (see Appendix at www.jclinepi.com). Characteristics of the included observational studies using PS are displayed in Table 1. The number of matched RCTs ranged from one to as many as 20 (median: 6.5; interquartile range: 3–10.5). Four studies [14,15,22,29] were matched to one RCT, and another five [18,21,28,32,33] were matched to more than 10 RCTs. Observational studies using PS had larger sample sizes than RCTs (433 [181–873] vs. 150 [100–345]; P = 0.03). Nine studies [14,17,21–24,27–29] were retrospective in design, and the remaining 11 [15,16,18–20,25,26,30–33] were

Table 1. Characteristics of included PS studies and matched RCTs

Characteristic                               Value
Number of topics (PS studies)                20
Study design (prospective/retrospective)     11/9
Covariates used for establishing PS          11 (8–24)
Techniques for using PS (N, %)
  Weighting                                  1 (5)
  Matching                                   14 (70)
  Adjustment                                 5 (25)
Sample size of PS studies                    433 (181–873)
Matched RCTs per topic                       6.5 (3–10.5)
Sample size of RCTs                          150 (100–345)

Abbreviations: PS, propensity score; RCT, randomized controlled trial.

prospective. Three techniques were used in the included studies: 14 [15,17–22,24–27,30,32,33] used matching, five [16,23,28,29,31] used adjustment, and one [14] used weighting. A total of 18 studies [14,16–30,32,33] reported OR; the remaining two [15,31] used HR and RR, respectively. The included observational studies using PS covered a variety of critical care topics, including continuous renal replacement therapy, mechanical ventilation, sedation, sepsis, cardiac arrest, resuscitation, and hemodynamic monitoring.

3.2. Comparison of effect size
Effect sizes of PS studies and matched RCTs are displayed in Fig. 2. PS-based observational studies showed a greater benefit for the experimental treatment than the matched RCTs in eight of the 20 comparisons (sign test P = 0.503), and the remaining 12 comparisons showed greater beneficial effect for experimental treatment in RCTs. The difference between PS-based observational studies and RCTs was statistically significant in four comparisons (Fig. 3) [18,20,27,32]. In most topics the effect sizes were similar between PS-based studies and RCTs. A total of 13 studies reported OR and used the matching technique, and they could be incorporated into the logistic regression model. The RORs derived from each matched pair are displayed in Fig. 3. RORs were significantly less than one in four topics [18,20,27,32], suggesting that these RCTs had greater benefit for experimental treatments than


Fig. 2. Effect sizes of propensity score-based studies and matched RCTs. Each comparison has two lines to represent the effect sizes and corresponding 95% confidence intervals. The first line represents the propensity score-based observational study and the second line represents the RCT(s). ES, effect size; CI, confidence interval; RCT, randomized controlled trial.

PS-based studies. Six comparisons [19,22,24–26,33] showed greater benefit for experimental treatments in RCTs compared with PS-based studies, but statistical significance was not reached. The remaining three comparisons [16,21,30] showed a trend toward greater benefit for experimental treatment in PS-based studies compared with RCTs, but statistical significance was not reached. In combination, the pooled ROR was 0.71 (95% CI: 0.63, 0.79; P = 0.002), suggesting that RCTs were more likely to report beneficial effect for the experimental treatment than PS-based studies. Sensitivity analysis restricting to RCTs of the largest sample size showed similar results (ROR: 0.76; 95% CI: 0.52, 0.99; Fig. 4). Furthermore, a meta-regression analysis was performed that incorporated study design (prospective vs. retrospective), the number of matched RCTs, and the number of covariates used for developing PS as confounding factors. These characteristics of PS studies were not associated with the outcomes (data not shown).
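The sign test reported in Section 3.2 (eight of 20 comparisons favoring PS-based studies) can be checked with a short exact binomial calculation. The Python sketch below assumes the two-sided form that doubles the smaller tail under a success probability of 0.5; the function name is illustrative, and the Stata routine used by the authors may compute the two-sided value differently, although for a symmetric null the results coincide.

```python
from math import comb


def sign_test_p(successes, n):
    """Two-sided exact binomial P value against a success probability of 0.5."""
    k = min(successes, n - successes)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


p_value = sign_test_p(8, 20)
print(round(p_value, 3))  # 0.503
```

The result agrees with the P value of 0.503 given in the text.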

4. Discussion
Our study showed that in critical care medicine, RCTs (or meta-analysis of RCTs) are more likely to report beneficial effect for experimental treatment than observational studies using PS. The result remains unchanged after adjustment for potential confounding factors such as the design of PS-based studies and the number of covariates used to derive PS. With important advances in technology and the Internet, an increasing number of clinical databases capturing patients with critical illness are becoming available to investigators [34]. Studies based on such databases are usually observational in nature, and thus PS-based analysis


Fig. 3. Ratio of odds ratio (ROR) of 13 comparisons and their combined effect size. The pooled ROR was 0.71 (95% CI: 0.63, 0.79; P = 0.002), suggesting that RCTs were more likely to report beneficial effect for the experimental treatment than PS-based studies. CI, confidence interval; PS, propensity score; RCT, randomized controlled trial.

has become more and more popular in the critical care literature. In a systematic review, Gayat et al. [10] showed that the number of published articles using PS increased 10-fold from 1998 to 2008. However, the validity of PS-based analysis has long been debated. Some contend that PS analysis can replicate the randomization process by creating a control group that is equivalent to the treatment group on known pretreatment factors, whereas others argue that it is impossible to include all confounding factors and that missing some of them will result in significant bias. The treated and untreated arms may still be imbalanced on unmeasured characteristics [35,36]. By comparing with RCTs (or meta-analysis of RCTs), we found that PS-based analysis was less likely to report a beneficial effect. RCTs are conducted with strict study protocols, and they usually lack external validity. The fact that PS studies include "real life" patients and situations might per se explain the difference in the results.

We propose several potential explanations for the present findings. First, the study population is often highly selected in RCTs because of strict inclusion/exclusion criteria, and the experimental treatment may be delivered under a stricter protocol than the interventions used in routine clinical settings. Such a strict design may increase the power of an RCT to detect a true beneficial effect. The strict inclusion/exclusion criteria in RCTs may also amplify the effect size, and when the interventions are applied to "real life" situations, as in observational studies, these effect sizes may shrink to their real extent. For instance, in the RCT investigating the efficacy of the PAC (pulmonary artery catheter) in high-risk surgical patients [37], the authors included only patients aged 60 years and above with American Society of Anesthesiologists class III or IV, whereas in the observational study by Schwann et al. [20], the inclusion/exclusion criteria were not so strict.

Second, it is common in observational studies that patients in the treatment arm are more critically ill than those in the control arm. Although the aim of PS analysis is to balance pretreatment characteristics between experimental and control arms, the PS cannot account for covariates that are not measured. For instance, patients who received albumin were more critically ill at baseline, which partly explains why patients receiving albumin were more likely to die than those who did not [32].

Third, RCTs often use intention-to-treat analysis, whereas observational studies perform per-treated analysis. In intention-to-treat analysis, subjects initially assigned to the control group may cross over to the treatment arm, but they are still analyzed as if they were in the control arm [38]. If the experimental treatment is potentially harmful to patients, crossing over of control subjects to the experimental arm will attenuate the harmful effect of the experimental intervention in RCTs. As a result, the RCT will report a more beneficial effect for the experimental intervention. This can be seen in studies examining the efficacy of the pulmonary artery catheter in cardiac patients [20].

Additionally, the disparity between PS studies and RCTs (or meta-analysis of several RCTs) might be attributable to the different estimands obtained from the two approaches. An RCT aims to assess the benefit of an intervention in the overall population, that is, what would be the


Fig. 4. Sensitivity analysis restricting to RCTs of the largest sample sizes with emphasis on different areas of discipline. The result shows that the pooled ROR is 0.76, with a 95% confidence interval of 0.52, 0.99. CI, confidence interval; PS, propensity score; RCT, randomized controlled trial; ROR, ratio of odds ratio.

difference in the observed outcome if all the subjects in the cohort were allocated to the treatment vs. the same subjects allocated to the control? In contrast, observational studies using PS matching aim to estimate the average treatment effect on the treated population, that is, what would have been the outcome in the treated if they had been untreated? As demonstrated by Pirracchio et al. [39], PS matching for the average treatment effect on the treated and approaches for the average treatment effect in the entire population can yield substantially different treatment estimates. Dahabreh et al. [40] conducted a systematic review in cardiology comparing the treatment effects in observational studies using PS vs. RCTs. In contrast to the present study, they showed that PS-based observational studies were more likely to report beneficial effect for experimental treatment than RCTs. The reasons for this difference cannot be determined from the current analysis. However, the most important difference between the two studies lies in the spectrum of the study population. Dahabreh et al. restricted their study subjects to patients with acute coronary syndromes. This homogeneity reduces between-study variance, making the beneficial effect of experimental treatment more likely to be detected.
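The distinction between the two estimands can be made concrete with a small arithmetic example. The Python sketch below uses hypothetical strata in which the treatment effect (expressed as a risk difference) is larger in severe patients, who are also more likely to be treated; none of the numbers or names come from the studies analyzed here. The average treatment effect in the whole population weights each stratum by its size, whereas the average treatment effect on the treated weights it by the number of treated subjects.

```python
# Hypothetical strata: size, number treated, and stratum-specific risk difference.
strata = [
    {"name": "severe", "n": 40, "n_treated": 30, "risk_difference": -0.20},
    {"name": "non-severe", "n": 60, "n_treated": 10, "risk_difference": -0.05},
]


def weighted_mean_effect(groups, weight_key):
    """Average the stratum-specific effects using the chosen weights."""
    total_weight = sum(group[weight_key] for group in groups)
    weighted_sum = sum(group[weight_key] * group["risk_difference"] for group in groups)
    return weighted_sum / total_weight


ate = weighted_mean_effect(strata, "n")          # whole-population estimand
att = weighted_mean_effect(strata, "n_treated")  # estimand among the treated

print(f"ATE = {ate:.4f}, ATT = {att:.4f}")
```

With these assumed inputs the two estimands differ even though no confounding remains within strata, which illustrates why a PS-matched estimate and an RCT estimate need not coincide when effects are heterogeneous.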

The strengths of the present study include that the analysis was restricted to component studies with mortality as the outcome of interest. Mortality is a solid outcome that minimizes the possibility of outcome misclassification. In contrast, other outcomes such as length of ICU stay are more likely to be biased by institutional policy and availability of ICU beds. Second, the study is the first to compare treatment effects of interventions in PS-based observational studies vs. RCTs (or meta-analysis of RCTs) in critical care medicine.

However, several limitations should be kept in mind in interpreting our findings. First, our study included heterogeneous study populations, and the true treatment effect varied substantially across different topics. As shown in Fig. 4, areas including high-frequency oscillatory ventilation, duration of antibiotics, corticosteroid for sepsis, and activated protein C showed better effect size in RCTs than in PS studies, whereas areas such as PAC in surgical patients, transfusion in cardiac surgery, fish oil, and timing of tracheostomy showed better effect size in PS studies than in RCTs. Other areas of discipline showed similar effect sizes. Caution is warranted in interpreting our result, which was obtained by merging RORs from heterogeneous studies. To address this issue, we accounted for the between-study variance by using the random-effects model to combine the relative effect sizes, assuming that the true relative effect sizes are distributed across a certain range. Second, the matching process was inherently subjective, and PS-based observational studies cannot be matched to RCTs exactly in terms of PICO. For instance, although mortality is a relatively solid outcome, it can be defined as ICU mortality, hospital mortality, or 28-day mortality. However, the matching algorithm is generally consistent with that performed in systematic reviews and meta-analyses. Third, the quality of PS studies is important in obtaining reliable results. Given that RCTs are difficult to perform and very expensive, most of them are now designed with great care and with input from experts in methodology. On the other hand, PS methods are now widely available in standard statistical software and can be used (sometimes badly) by non-experts. There is no validated method for assessing the quality of PS methods in a single dimension, as is done for RCTs with the Jadad score. In the present analysis, we included various characteristics in the meta-regression to see whether the combined ROR depends on the quality of PS studies. Fourth, our results may be biased by "gray literature," which may affect PS studies and RCTs differently. For instance, RCTs with negative findings are more likely to be reported because of their rigorous methodology, whereas observational studies are thought to be biased when the findings are non-significant. However, gray literature is difficult to evaluate in the current analysis. Finally, because the majority of PS-based studies were matched to more than one RCT, we combined RCTs of the same discipline by using a meta-analytic approach. However, this approach is flawed in that it may overestimate the treatment effect when the sample size of the component studies is small [41,42], or when the meta-analysis includes several studies stopped early for benefit [43].

In aggregate, our study demonstrates that PS-based observational studies are likely to underestimate the beneficial effect of experimental treatment compared with RCTs (or meta-analysis of RCTs). However, the analysis is limited by the heterogeneity in disciplines of the included component studies, and the matching process is inherently subjective. With increasing publications of both PS-based observational studies and RCTs in critical care medicine, further epidemiologic study should focus on more homogeneous topics.

Acknowledgments
Z.Z. helped design the study, conduct the study, analyze the data, and write the manuscript and has seen the original study data, reviewed the analysis of the data, and approved the final manuscript. H.N. helped design the study and analyze the data and has seen the original study data, approved the final manuscript, and is the author responsible for archiving the study files. X.X. helped design the study and write the manuscript and has seen the original study data and approved the final manuscript.

Appendix
Supplementary data
Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.jclinepi.2014.02.018.

References
[1] Trojano M, Pellegrini F, Paolicelli D, Fuiani A, Di Renzo V. Observational studies: propensity score analysis of non-randomized data. Int MS J 2009;16(3):90–7.
[2] Bradbury BD, Gilbertson DT, Brookhart MA, Kilpatrick RD. Confounding and control of confounding in nonexperimental studies of medications in patients with CKD. Adv Chronic Kidney Dis 2012;19(1):19–26.
[3] Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983;70:41–55.
[4] Austin PC. A critical appraisal of propensity-score matching in the medical literature between 1996 and 2003. Stat Med 2008;27(12):2037–49.
[5] Austin PC. Propensity-score matching in the cardiovascular surgery literature from 2004 to 2006: a systematic review and suggestions for improvement. J Thorac Cardiovasc Surg 2007;134:1128–35.
[6] Gayat E, Thabut G, Christie JD, Mebazaa A, Mary JY, Porcher R. Within-center matching performed better when using propensity score matching to analyze multicenter survival data: empirical and Monte Carlo studies. J Clin Epidemiol 2013;66:1029–37.
[7] Heinze G, Jüni P. An overview of the objectives of and the approaches to propensity score analyses. Eur Heart J 2011;32:1704–8.
[8] Austin PC. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behav Res 2011;46(3):399–424.
[9] Suissa S, Garbe E. Primer: administrative health databases in observational studies of drug effects - advantages and disadvantages. Nat Clin Pract Rheumatol 2007;3(12):725–32.
[10] Gayat E, Pirracchio R, Resche-Rigon M, Mebazaa A, Mary JY, Porcher R. Propensity scores in intensive care and anaesthesiology literature: a systematic review. Intensive Care Med 2010;36(12):1993–2003.
[11] Schulz KF, Altman DG, Moher D, CONSORT Group. CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. Trials 2010;11:32. http://dx.doi.org/10.1186/1745-6215-11-32.
[12] DerSimonian R, Kacker R. Random-effects model for meta-analysis of clinical trials: an update. Contemp Clin Trials 2007;28:105–14.
[13] Sterne JA, Jüni P, Schulz KF, Altman DG, Bartlett C, Egger M. Statistical methods for assessing the influence of study characteristics on treatment effects in 'meta-epidemiological' research. Stat Med 2002;21:1513–24.
[14] Leite TT, Macedo E, Pereira SM, Bandeira SR, Pontes PH, Garcia AS, et al. Timing of renal replacement therapy initiation by AKIN classification system. Crit Care 2013;17(2):R62.
[15] Jung B, Clavieras N, Nougaret S, Molinari N, Roquilly A, Cisse M, et al. Effects of etomidate on complications related to intubation and on mortality in septic shock patients treated with hydrocortisone: a propensity score analysis. Crit Care 2012;16(6):R224.
[16] Bajwa EK, Malhotra CK, Thompson BT, Christiani DC, Gong MN. Statin therapy as prevention against development of acute respiratory distress syndrome: an observational study. Crit Care Med 2012;40(5):1470–7. http://dx.doi.org/10.1097/CCM.0b013e3182416d7a.
[17] Bojan M, Gioanni S, Mauriat P, Pouard P. High-frequency oscillatory ventilation and short-term outcome in neonates and infants undergoing cardiac surgery: a propensity score analysis. Crit Care 2011;15(5):R259.
[18] Sadique MZ, Grieve R, Harrison DA, Cuthbertson BH, Rowan KM. Is Drotrecogin alfa (activated) for adults with severe sepsis, cost-effective in routine clinical practice? Crit Care 2011;15(5):R228.
[19] Choudhury G, Mandal P, Singanayagam A, Akram AR, Chalmers JD, Hill AT. Seven-day antibiotic courses have similar efficacy to prolonged courses in severe community-acquired pneumonia - a propensity-adjusted analysis. Clin Microbiol Infect 2011;17(12):1852–8.
[20] Schwann NM, Hillel Z, Hoeft A, Barash P, Möhnle P, Miao Y, et al. Lack of effectiveness of the pulmonary artery catheter in cardiac surgery. Anesth Analg 2011;113(5):994–1002.
[21] Torgersen C, Luckner G, Schröder DC, Schmittinger CA, Rex C, Ulmer H, et al. Concomitant arginine-vasopressin and hydrocortisone therapy in severe septic shock: association with mortality. Intensive Care Med 2011;37(9):1432–7.
[22] Argalious MY, Dalton JE, Mascha EJ, Cywinski JB, Clair DG. Association of red blood cell transfusion and postoperative outcomes after endovascular aortic repair. Semin Cardiothorac Vasc Anesth 2011;15(1–2):49–55.
[23] van der Wal G, Brinkman S, Bisschops LL, Hoedemaekers CW, van der Hoeven JG, de Lange DW, et al. Influence of mild therapeutic hypothermia after cardiac arrest on hospital mortality. Crit Care Med 2011;39(1):84–8.
[24] Wohlmuth C, Dünser MW, Wurzinger B, Deutinger M, Ulmer H, Torgersen C, et al. Early fish oil supplementation and organ failure in patients with septic shock from abdominal infections: a propensity-matched cohort study. JPEN J Parenter Enteral Nutr 2010;34(4):431–7.
[25] Khalid I, Doshi P, DiGiovine B. Early enteral nutrition and outcomes of critically ill patients treated with vasopressors and mechanical ventilation. Am J Crit Care 2010;19(3):261–8.
[26] Mounier R, Adrie C, Français A, Garrouste-Orgeas M, Cheval C, Allaouchiche B, et al, OUTCOMEREA Study Group. Study of prone positioning to reduce ventilator-associated pneumonia in hypoxaemic patients. Eur Respir J 2010;35(4):795–804.
[27] Scales DC, Thiruchelvam D, Kiss A, Redelmeier DA. The effect of tracheostomy timing during critical illness on long-term survival. Crit Care Med 2008;36(9):2547–57.
[28] Moubarak P, Zilker S, Wolf H, Hofner B, Kneib T, Küchenhoff H, et al. Activity-guided antithrombin III therapy in severe surgical sepsis: efficacy and safety according to a retrospective data analysis. Shock 2008;30(6):634–41.
[29] Raurich JM, Llompart-Pou JA, Ibañez J, Frontera G, Perez O, García L, et al. Low-dose steroid therapy does not affect hemodynamic response in septic shock patients. J Crit Care 2007;22(4):324–9.
[30] Dhainaut JF, Payet S, Vallet B, França LR, Annane D, Bollaert PE, et al, PREMISS Study Group. Cost-effectiveness of activated protein C in real-life clinical practice. Crit Care 2007;11(5):R99.
[31] Cho KC, Himmelfarb J, Paganini E, Ikizler TA, Soroko SH, Mehta RL, et al. Survival by dialysis modality in critically ill patients with acute kidney injury. J Am Soc Nephrol 2006;17(11):3132–8.
[32] Vincent JL, Sakr Y, Reinhart K, Sprung CL, Gerlach H, Ranieri VM, 'Sepsis Occurrence in Acutely Ill Patients' Investigators. Is albumin administration in the acutely ill associated with increased mortality? Results of the SOAP study. Crit Care 2005;9(6):R745–54.
[33] Sakr Y, Vincent JL, Reinhart K, Payen D, Wiedermann CJ, Zandstra DF, et al, Sepsis Occurrence in Acutely Ill Patients Investigators. Use of the pulmonary artery catheter is not associated with worse outcome in the ICU. Chest 2005;128:2722–31.
[34] Cooke CR, Iwashyna TJ. Using existing data to address important clinical questions in critical care. Crit Care Med 2013;41(3):886–96.
[35] Kuss O, Legler T, Börgermann J. Treatments effects from randomized trials and propensity score analyses were similar in similar populations in an example from cardiac surgery. J Clin Epidemiol 2011;64:1076–84.
[36] Austin PC, Mamdani MM, Stukel TA, Anderson GM, Tu JV. The use of the propensity score for estimating treatment effects: administrative versus clinical data. Stat Med 2005;24:1563–78.
[37] Sandham JD, Hull RD, Brant RF, Knox L, Pineo GF, Doig CJ, et al, Canadian Critical Care Clinical Trials Group. A randomized, controlled trial of the use of pulmonary-artery catheters in high-risk surgical patients. N Engl J Med 2003;348:5–14.
[38] Hollis S, Campbell F. What is meant by intention to treat analysis? Survey of published randomised controlled trials. BMJ 1999;319:670–4.
[39] Pirracchio R, Carone M, Rigon MR, Caruana E, Mebazaa A, Chevret S. Propensity score estimators for the average treatment effect and the average treatment effect on the treated may yield very different estimates. Stat Methods Med Res 2013 Nov 6. [Epub ahead of print].
[40] Dahabreh IJ, Sheldrick RC, Paulus JK, Chung M, Varvarigou V, Jafri H, et al. Do observational studies using propensity score methods agree with randomized trials? A systematic comparison of studies on acute coronary syndromes. Eur Heart J 2012;33:1893–901.
[41] Thorlund K, Imberger G, Walsh M, Chu R, Gluud C, Wetterslev J, et al. The number of patients and events required to limit the risk of overestimation of intervention effects in meta-analysis - a simulation study. PLoS One 2011;6(10):e25491.
[42] Zhang Z, Xu X, Ni H. Small studies may overestimate the effect sizes in critical care meta-analyses: a meta-epidemiological study. Crit Care 2013;17(1):R2.
[43] Bassler D, Montori VM, Briel M, Glasziou P, Walter SD, Ramsay T, et al. Reflections on meta-analyses involving trials stopped early for benefit: is there a problem and if so, what is it? Stat Methods Med Res 2013;22(2):159–68.